Building Big Data Service using Containers

Businesses cannot ignore the immense growth of cloud and mobile adoption, nor the IoT and business-intelligence workloads that generate data at unprecedented volume, velocity, variety and distribution. Traditional IT silos, often bound to specific hardware and its utilization, do not adapt well to the rapid, DevOps-driven change that has altered how applications are defined, deployed and managed throughout their lifecycles. Cloud adoption is no longer open to debate; the challenge is how to run applications both on premises and in the cloud, i.e., a hybrid cloud implementation. The ability to tailor infrastructure to the needs of modern applications is limited, and while point solutions address individual challenges, no single coherent solution makes it all work in concert.

The complexity of clustered and distributed applications such as Hadoop, Cassandra or MongoDB makes them difficult for IT to provision, run and optimize, often due to fragmented management; infrastructure-management software layered over the hardware silos adds to this complexity. Given the lack of automation, teams rely heavily on scripting, which requires never-ending maintenance as the underlying components change.

The ROBIN Hyper-Converged Kubernetes Platform unifies these software-defined components into a coherent, application-defined infrastructure layer optimized for container technology. It was built to deliver big data as a service using containers, with the following design goals:

Improve productivity through simplified operations and user experience

Reduce cost through guaranteed performance, even in shared multi-tenant environments, enabling hardware consolidation

Reduce risk through repeatable, automated processes such as 1-click cluster provisioning

Increase agility through full application lifecycle management


These design goals target four gaps in today's infrastructure:

Infrastructure and application automation are misaligned: Teams spend endless cycles coordinating the provisioning and alignment of infrastructure resources with each application's requirements. Continuous Integration (CI) and Continuous Deployment (CD) automation is still evolving, and this limits business agility.

Software-defined APIs differ across the application IO path: Both the infrastructure and application teams manually script workflows specific to each use case, and they need separate tools and skill sets for each infrastructure technology. These gaps burden business agility and drive up costs further.

Applications are bound to specific resources: Because there is no way to guarantee IOPS in a multi-tenant shared environment, utilization is typically kept low in anticipation of peak loads, noisy neighbors or denial of service. Workflows remain app-specific manual scripts.
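One common way to enforce a per-tenant IOPS guarantee in a shared environment is a token bucket per tenant, so one noisy neighbor can exhaust only its own budget. The sketch below is a minimal illustration of that technique in plain Python, not the ROBIN implementation; the tenant names and limits are invented for the example.

```python
import time

class TokenBucket:
    """Token-bucket limiter: caps a tenant's IO operations per second
    while allowing short bursts up to `burst` outstanding IOs."""

    def __init__(self, iops_limit, burst):
        self.rate = iops_limit        # tokens (IOs) replenished per second
        self.capacity = burst         # maximum burst size
        self.tokens = burst           # start with a full bucket
        self.last = time.monotonic()

    def try_io(self, n=1):
        """Return True if n IOs may proceed now, False to throttle."""
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

# Each tenant gets its own bucket; a burst from "analytics" cannot
# consume the budget reserved for "reporting".
tenants = {"analytics": TokenBucket(iops_limit=500, burst=100),
           "reporting": TokenBucket(iops_limit=200, burst=50)}
```

In a real storage path the `False` branch would queue or delay the IO rather than reject it; the bookkeeping above is the core of the isolation guarantee.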

Application lifecycle management is broken across tiers and teams: For example, there is no easy way to clone an entire application. The DB admin must identify which tables to clone, the application admin must then freeze the application state, and the storage admin must identify the relevant volumes and reserve space for the clones. Nothing is automated, resulting in an uncoordinated mess.
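The hand-offs above can collapse into a single operation once the platform knows which volumes belong to an application. The sketch below models that workflow in plain Python; the registry and helper functions are illustrative stand-ins, not the ROBIN API, and freezing, snapshotting and cloning are simulated as dictionary bookkeeping.

```python
def freeze_app(app):
    app["state"] = "frozen"        # quiesce writes so snapshots are consistent

def thaw_app(app):
    app["state"] = "running"

def snapshot_volume(volume):
    # Stand-in for a point-in-time storage snapshot of one volume.
    return {"source": volume, "kind": "snapshot"}

def clone_from_snapshot(snap):
    # Stand-in for materializing a writable clone from a snapshot.
    return {"source": snap["source"], "kind": "clone"}

def clone_application(app):
    """Freeze, snapshot every volume, thaw, then materialize clones --
    one call instead of a hand-off between DB, app and storage admins."""
    freeze_app(app)
    try:
        snaps = [snapshot_volume(v) for v in app["volumes"]]
    finally:
        thaw_app(app)              # source resumes even if a snapshot fails
    return [clone_from_snapshot(s) for s in snaps]

# Hypothetical application record: name, state and its volume set.
db = {"name": "cassandra-prod", "state": "running",
      "volumes": ["vol-data-0", "vol-data-1", "vol-commitlog"]}
clones = clone_application(db)
```

The key design point is that the freeze window covers only the snapshot step, so the source application resumes immediately while clones are materialized in the background.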

Building Big Data as a Service Using Containers white paper