For many IT folks, containers are merely another flavor of virtualization, and they often make the mistake of assuming that storage for containers is no different from storage for virtual machines (VMs). But that is not the case. To understand the different requirements containers and VMs have for storage, one must fully comprehend the fundamental differences between these two architectures.
Virtualization refers to the act of creating a virtual (rather than actual) version of something, including virtual computer hardware platforms, operating systems, storage devices, and computer network resources. (Source: Wikipedia)
Virtual machines are neither nimble nor simple enough to support applications across large numbers of nodes. Hence the storage solutions that were built for virtual machines continue to be monolithic in nature, unable to support scale and lacking any understanding of the application running on top of them.
On the other hand, a container consists of an entire runtime environment: an application along with all of its dependencies (the libraries, other binaries, and configuration files needed to run it), bundled into one package.
As a result, containers are much lighter than virtual machines and can be instantiated and scaled very quickly. One of the primary reasons for the popularity of container technology (e.g., Docker) is the ability to provision and scale applications across hundreds of nodes. The storage requirements of this approach are very different from those of virtualization.
Containers require a distributed storage system that is natively elastic and can create and delete storage volumes extremely quickly. Traditional storage systems are simply not designed to operate at the rate at which containers scale.
When designing the capabilities for the storage layer within the ROBIN Hyper-Converged Kubernetes Platform, we made sure to build in a number of container-aware critical features:
Application Data Placement
Distributed applications (such as NoSQL and Big Data workloads) require Rack-, Node-, Disk-, and Subnet-level isolation of the data stored by each application service. When running within containers, different partitions/shards of a NoSQL database should not coexist on the same disks or nodes. Conversely, to minimize network load, applications such as Hadoop/HBase require data to be collocated on the same node as compute; it is also best to run MapReduce jobs on the nodes where the data resides.
Traditional storage products cannot understand these application-level policies, so they cannot enforce them. Robin's application-aware storage stack exposes several fine-grained controls that enforce these policies based on the type of application being deployed.
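To make the placement idea concrete, here is a minimal, hypothetical sketch of rack- and node-level anti-affinity placement for database shards. The function and field names are illustrative assumptions, not Robin's actual API:

```python
# Hypothetical sketch (not Robin's API): place each shard of a database
# into a distinct fault domain at the requested isolation level.

def place_shards(shards, nodes, level="node"):
    """Assign each shard to a node such that no two shards share the
    same fault domain (node or rack) at the requested isolation level."""
    used_domains = set()
    placement = {}
    for shard in shards:
        for node in nodes:
            domain = node["name"] if level == "node" else node["rack"]
            if domain not in used_domains:
                placement[shard] = node["name"]
                used_domains.add(domain)
                break
        else:
            raise RuntimeError(f"no {level}-isolated capacity left for {shard}")
    return placement

nodes = [
    {"name": "n1", "rack": "r1"},
    {"name": "n2", "rack": "r1"},
    {"name": "n3", "rack": "r2"},
]

# Rack-level isolation skips n2 because it shares rack r1 with n1.
print(place_shards(["shard-a", "shard-b"], nodes, level="rack"))
# → {'shard-a': 'n1', 'shard-b': 'n3'}
```

The same constraint, inverted, expresses collocation: a Hadoop-style policy would instead pin compute to the node already holding the data.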
Application Lifecycle Management
Snapshots, clones, and patching of multi-tiered, distributed applications require complex coordination between the containers, the network, and the storage stack. Consistency at the storage-volume level alone is not sufficient to guarantee application consistency, and traditional storage products can only create individual volume-level snapshots and clones. Robin's storage layer is application-aware: it fully understands an application's topology, so with a single click one can create application-consistent snapshots and clones of any application, even those without built-in snapshotting and cloning capabilities. This capability also extends to safely patching an application and recovering from any failures via our unique "application time travel" feature.
Here is a demo of Cassandra snapshot, clone, and time travel.
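The coordination behind an application-consistent snapshot can be sketched as below. This is an illustrative outline only; the quiesce/snapshot/resume step names are assumptions, not Robin's implementation:

```python
# Illustrative sketch of application-consistent snapshotting (not Robin's
# actual code): pause every service in the application topology, snapshot
# all volumes at the same logical instant, then resume.

def app_consistent_snapshot(app):
    log = []
    # 1. Quiesce all services so in-flight writes are flushed to disk.
    for svc in app["services"]:
        log.append(f"quiesce {svc['name']}")
    # 2. Snapshot every volume while the whole application is paused,
    #    so all tiers are captured at the same point in time.
    snaps = []
    for svc in app["services"]:
        for vol in svc["volumes"]:
            snaps.append(f"{vol}@t0")
            log.append(f"snapshot {vol}")
    # 3. Resume services in reverse order.
    for svc in reversed(app["services"]):
        log.append(f"resume {svc['name']}")
    return snaps, log

cassandra = {"services": [
    {"name": "cassandra-0", "volumes": ["data-0"]},
    {"name": "cassandra-1", "volumes": ["data-1"]},
]}
snaps, log = app_consistent_snapshot(cassandra)
print(snaps)  # ['data-0@t0', 'data-1@t0']
```

Snapshotting each volume independently, without the surrounding quiesce/resume steps, is exactly what volume-level-only products do, and it is why their snapshots cannot guarantee application consistency.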
Application Performance Management
The ROBIN Hyper-Converged Kubernetes Platform is container-aware and tailor-made for consolidating disparate applications. Bringing performance predictability to a consolidated environment requires end-to-end enforcement of QoS policies. Most incumbent storage products were designed in the monolithic era, when application consolidation was uncommon, so their QoS guarantees are either missing or too rudimentary to function in the container era. Robin is the only product that can guarantee both MAX and MIN IOPS at the container level, not just at the storage-volume level. The MAX IOPS cap prevents a single application from hogging resources, while the MIN IOPS guarantee ensures predictable application performance.
Because Robin's storage is application-aware, it can distinguish IO traffic between original and cloned volumes and automatically enforce different priorities for each. Hence any application can be quickly thin-cloned without adversely affecting the performance of the original application.
Here is a demo of Robin QoS control.
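A MAX-IOPS cap is commonly built on a token-bucket-style admission scheme. The sketch below shows that general idea only; Robin's actual QoS implementation is not public, and the class and method names here are invented for illustration:

```python
# Illustrative per-container MAX-IOPS cap using a simple refill scheme
# (not Robin's implementation). Each second the container earns max_iops
# admission tokens; IOs beyond that are throttled until the next refill.

class IopsLimiter:
    def __init__(self, max_iops):
        self.max_iops = max_iops
        self.tokens = max_iops

    def tick(self):
        """Refill once per second, capped at max_iops (no bursting)."""
        self.tokens = self.max_iops

    def admit(self, n_ios):
        """Admit as many of n_ios as the remaining budget allows."""
        granted = min(n_ios, self.tokens)
        self.tokens -= granted
        return granted

limiter = IopsLimiter(max_iops=500)
print(limiter.admit(800))  # 500 admitted; the other 300 are throttled
limiter.tick()             # next one-second window
print(limiter.admit(200))  # 200
```

A MIN guarantee is the complementary mechanism: the scheduler reserves that many IOPS of backend capacity for the container before sharing the remainder.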
End-to-End IO Insights
Troubleshooting IO-related problems requires end-to-end insight into the IO pipeline, all the way from the application, where IO originates, to the disks, where data resides. No storage product on the market today offers this insight, because those products have no presence on the compute side of the IO pipeline. Robin is the only infrastructure product that provides complete bi-directional insight. Users can map each IO all the way from the container running an application, through the physical host on which the container runs and the network ports used to reach the storage node, down to the disk on that node where the data is persisted. In the reverse direction, each disk provides an up-to-date view of which applications have what type of data stored on it, and how much of it is currently in use. Together, these insights help quickly identify IO hotspots and generate chargeback and compliance reports.
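The bi-directional mapping described above can be pictured as queries over one path table. The data model below is a hypothetical illustration, not Robin's schema:

```python
# Illustrative sketch (not Robin's data model) of bi-directional IO mapping:
# forward from a container down to disks, and reverse from a disk back up
# to the applications storing data on it.
from collections import defaultdict

io_paths = [
    # (container, host, network_port, storage_node, disk, gb_used)
    ("cassandra-0", "host-1", "eth0", "snode-1", "disk-3", 120),
    ("cassandra-1", "host-2", "eth1", "snode-1", "disk-4", 110),
    ("mysql-0",     "host-1", "eth0", "snode-2", "disk-3",  40),
]

def forward(container):
    """Trace a container's IO path: host -> port -> storage node -> disk."""
    return [p[1:5] for p in io_paths if p[0] == container]

def reverse(disk):
    """Per-disk view: which applications store how much data on it."""
    usage = defaultdict(int)
    for container, _, _, _, d, gb in io_paths:
        if d == disk:
            usage[container] += gb
    return dict(usage)  # the raw material for chargeback reports

print(forward("cassandra-0"))  # [('host-1', 'eth0', 'snode-1', 'disk-3')]
print(reverse("disk-3"))       # {'cassandra-0': 120, 'mysql-0': 40}
```

The forward query is what locates an IO hotspot; the reverse query is what feeds chargeback and compliance reporting.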
Acknowledgement: Thanks to Partha Seetala, CTO, Robin Systems, for his inputs on this blog post.