The Need for Data Management on Kubernetes

Kubernetes is gaining rapid adoption, and enterprise customers are demanding the ability to run a broader set of workloads, including stateful applications. Running stateful applications such as PostgreSQL, MySQL, MongoDB, Elastic Stack, Kafka, and MariaDB requires advanced data management capabilities in order to:

  • Release new products and features faster: Automated lifecycle management for app+data (not just the storage) is required to save valuable time at each stage of the lifecycle.
  • Collaborate quickly across teams: Multiple teams (Dev/Test/Ops) need a mechanism to collaborate without procedural delays. CI/CD pipelines solve part of the problem by automating collaboration on code changes, but data is usually left out.
  • Recover from system failures and user errors: App+data protection capabilities such as point-in-time snapshots, backup, and restore are required to return both the application and its data to a known good state.
  • Avoid infrastructure lock-in: The ability to migrate from on-prem to cloud and vice versa, as well as between public clouds, is needed to avoid infrastructure lock-in.
  • Deliver predictable performance: To guarantee QoS and to ensure that high-priority applications do not miss their SLAs, you need the ability to set IOPS limits per app.
  • Eliminate security vulnerabilities: Enterprise-grade security, including authentication and encryption, is required to keep your data safe.

Defining and Managing an Application

Kubernetes provides many useful constructs, such as Pods, Controllers, and PersistentVolumes, to help you manage your applications. However, there is no construct for an “Application”, i.e., a single entity that consists of all the resources that form an application. Users have to manually map resources to an application and manage each resource individually for every lifecycle operation. The lack of a proper Application construct in Kubernetes becomes a problem for any operation that spans a group of resources.

Frameworks such as Helm and Operators try to solve this problem by packaging resources together, but they do not solve it beyond the initial deployment. For example, how would one snapshot, clone, or back up an entire Helm release that spans PersistentVolumeClaims, Secrets, ConfigMaps, StatefulSets, Pods, Services, etc.? Or how about snapshotting a web tier, app tier, and database tier, each deployed separately using three different kubectl manifest files?
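To make the manual mapping concrete, the following is a minimal sketch of how a user might reconstruct an "application" from individual resources today. It assumes every resource carries the `app.kubernetes.io/part-of` label from the Kubernetes recommended-labels convention, and it works on plain manifest dictionaries rather than a live cluster. The resource names (`web`, `api`, `db`, `shop`, and so on) and the helper `group_by_application` are hypothetical and used only for illustration.

```python
from collections import defaultdict

# Label from the Kubernetes "recommended labels" convention. This sketch
# ASSUMES every resource belonging to an application has it set.
PART_OF = "app.kubernetes.io/part-of"


def group_by_application(resources):
    """Group manifest dicts by their part-of label (hypothetical helper).

    Returns {app_name: [(kind, name), ...]}. Resources without the label are
    collected under the key None so they are not silently dropped.
    """
    apps = defaultdict(list)
    for res in resources:
        meta = res.get("metadata", {})
        labels = meta.get("labels") or {}
        apps[labels.get(PART_OF)].append((res["kind"], meta.get("name")))
    return dict(apps)


# Hypothetical resources from a web tier, app tier, and database tier that
# were deployed with separate manifest files.
resources = [
    {"kind": "Deployment", "metadata": {"name": "web", "labels": {PART_OF: "shop"}}},
    {"kind": "Service", "metadata": {"name": "web-svc", "labels": {PART_OF: "shop"}}},
    {"kind": "Deployment", "metadata": {"name": "api", "labels": {PART_OF: "shop"}}},
    {"kind": "StatefulSet", "metadata": {"name": "db", "labels": {PART_OF: "shop"}}},
    {"kind": "PersistentVolumeClaim", "metadata": {"name": "data-db-0", "labels": {PART_OF: "shop"}}},
    {"kind": "Secret", "metadata": {"name": "db-creds", "labels": {PART_OF: "shop"}}},
    {"kind": "ConfigMap", "metadata": {"name": "unrelated"}},
]

apps = group_by_application(resources)
for kind, name in apps["shop"]:
    print(kind, name)
```

Note that this grouping is only a read-only view that depends on every team labeling resources consistently. It does not by itself snapshot, clone, or back up anything; each of those operations would still have to be carried out resource by resource.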

Advanced Data Management for Kubernetes White Paper