ROBIN in Action

Delivering Big Data, Databases and AI/ML as a Service

Scenario one: Deploying a complex application stack on Kubernetes, e.g. Hortonworks (HDP)

You love the agility of the containerized infrastructure. You are already containerizing your stateless applications and microservices. You love Kubernetes. You are wondering whether you can use Kubernetes to bring the same agility to distributed data-intensive applications such as big data, databases, and AI/ML.

Without ROBIN, it seems like an impossible dream. You have to scope the deployment, create IT tickets for provisioning hardware, install a Kubernetes cluster, find or build the correct containers for each service in the stack, deploy each container separately, and make sure these containers work seamlessly together. And even if you can pull all this off, you are only at day zero: deployment. What if you want to snapshot, back up, clone, scale, upgrade, patch and migrate your big data and database applications?

Once you have the ROBIN hyper-converged Kubernetes platform, deploying big data, databases, and AI/ML on Kubernetes is a breeze. Simply log in to be greeted by an app-store experience, click on your favorite application, adjust the compute and storage resources if you want (you don’t have to) and relax. Your application will be deployed and ready to use within minutes.

Scenario two: Application time-travel on Kubernetes

Life with Kubernetes is wonderful. Then you realize you made a mistake and deleted important application data. You wish you could go back in time. Once you have ROBIN, you can.

Without ROBIN, you may not be able to recover from your mistake. If you are using third-party storage solutions, you hope they provide storage-level snapshots. If you are lucky, they might, but you are left to figure out how to restore the application state from those storage-level snapshots.

Once you have ROBIN, you can create a snapshot of your entire application’s state, not just its data, with one click. ROBIN uses the redirect-on-write method, which means snapshots are created in under a second and they don’t eat into your storage capacity.
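To see why a redirect-on-write snapshot can be this quick and this cheap, here is a minimal toy sketch in Python. It is not ROBIN’s implementation; the class `RowVolume` and all of its method names are hypothetical and exist only to illustrate the general technique: new writes go to fresh blocks, so a snapshot only has to remember which blocks were live at that moment.

```python
class RowVolume:
    """Toy redirect-on-write volume (illustrative only, not a real storage API)."""

    def __init__(self):
        self._physical = {}    # physical block id -> bytes, never overwritten in place
        self._live_map = {}    # logical block number -> physical block id
        self._snapshots = {}   # snapshot name -> frozen copy of the block map
        self._next_id = 0

    def write(self, logical_block, data):
        # Redirect the write to a fresh physical block instead of overwriting,
        # so any snapshot that references the old block is left intact.
        pid = self._next_id
        self._next_id += 1
        self._physical[pid] = data
        self._live_map[logical_block] = pid

    def read(self, logical_block):
        return self._physical[self._live_map[logical_block]]

    def snapshot(self, name):
        # Only the block map (metadata) is copied; no data blocks are duplicated.
        self._snapshots[name] = dict(self._live_map)

    def restore(self, name):
        # Point the live map back at the blocks the snapshot references.
        self._live_map = dict(self._snapshots[name])

    def physical_block_count(self):
        return len(self._physical)


vol = RowVolume()
vol.write(0, b"orders-v1")
vol.snapshot("before-change")
vol.write(0, b"orders-v2")      # lands in a new block; the old one is untouched
vol.restore("before-change")
print(vol.read(0))              # b'orders-v1'
```

In this toy model, taking the snapshot stores no additional data blocks; extra capacity is consumed only later, when new writes are redirected to fresh blocks.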

Snapshots allow you to restore your application’s state to a point in time. So if you make a mistake, you can simply undo it by restoring a snapshot. With ROBIN you can seamlessly travel between different application states (backward and forward), even if the application’s topology and configuration have changed over time. With ROBIN’s application time-travel, your developers are free to run what-if analyses quickly and collaborate freely with other teams at the push of a button.

Scenario three: Scaling up Applications

You have a smooth-running big data application in production, let’s say Hortonworks. But lately you have observed that the NameNode is always at over 90% utilization. You want to do something about it. With ROBIN, it is as easy as adjusting the brightness of your smartphone.

Without ROBIN, you’ll first have to bring down your Hortonworks cluster for maintenance, recreate your NameNode configuration in another container, and copy data from the original NameNode to the new one. Then you manually modify the Hortonworks configuration to remove the old NameNode and add the new one, and finally bring your Hortonworks cluster back up. All this while, your developers don’t have access to their Hortonworks cluster.

With ROBIN, you can simply move a slider to increase or decrease the CPU, memory and storage IOPS allocated to any container, such as the under-duress NameNode in this example, using 1-click operations. Life couldn’t be simpler! Just as easily as you can increase resources while the application is running, you can also decrease them when the need goes down.

Scenario four: Scaling out Applications

You like to start small and add resources as you need for new application deployments. Your Hortonworks cluster needs to grow due to expanding usage. You want to add more DataNodes. With ROBIN, it is as easy as adjusting the brightness of your smartphone.

Without ROBIN, you’ll have to bring up new containers with the DataNode image and attach storage volumes to them. Then you write scripts to register the new containers as DataNodes with Ambari so YARN and HDFS are aware of the new addition. But wait, what if your DataNode requires data locality (i.e., the storage volumes should reside on the same node where the container is brought up, to minimize network hops and increase performance)? How would you ensure this in Kubernetes, which doesn’t understand and orchestrate storage? And what if you placed your original DataNodes across multiple racks to tolerate full rack failures? You’ll have to write complex label selectors in Kubernetes to ensure that this constraint is not violated when you add the new DataNode. This will take you hours, if not days. And that’s for adding one DataNode. Now imagine the pain if you had to grow your Hortonworks cluster by adding a full rack’s worth of new DataNodes at a time!

With ROBIN, you can simply add more worker nodes using 1-click operations. Because ROBIN understands your application, it will learn the data-locality, affinity and anti-affinity constraints from the running Hortonworks cluster and automatically bring up the new DataNodes in compliance with those policies. ROBIN also automatically registers the new DataNodes with your running Hortonworks cluster. All with a single click. Life couldn’t be simpler!
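To make the placement constraints in this scenario concrete, here is a small Python sketch of the kind of decision that has to be made for every new DataNode. It is an illustration under stated assumptions, not ROBIN’s scheduler or a Kubernetes API: the `Node` type, the `pick_node_for_datanode` function and the three rules are hypothetical and simply encode data locality, node anti-affinity and rack spreading as described above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    rack: str
    free_local_disks: int   # disks physically attached to this node


def pick_node_for_datanode(candidates, existing_datanode_nodes):
    """Pick a host for one new DataNode, or return None if no node qualifies.

    Illustrative rules:
      1. Data locality: the node must have at least one free local disk.
      2. Anti-affinity: never share a node with an existing DataNode.
      3. Rack spreading: prefer the rack that hosts the fewest DataNodes.
    """
    used_names = {n.name for n in existing_datanode_nodes}
    per_rack = Counter(n.rack for n in existing_datanode_nodes)
    eligible = [
        n for n in candidates
        if n.free_local_disks > 0 and n.name not in used_names
    ]
    if not eligible:
        return None
    # Least-loaded rack first; ties go to the node with more free disks, then by name.
    return min(eligible, key=lambda n: (per_rack[n.rack], -n.free_local_disks, n.name))


existing = [Node("n1", "rack-a", 2), Node("n2", "rack-a", 2), Node("n3", "rack-b", 1)]
candidates = existing + [
    Node("n4", "rack-a", 4),
    Node("n5", "rack-b", 1),
    Node("n6", "rack-c", 0),   # no local disk, so it cannot satisfy data locality
]
print(pick_node_for_datanode(candidates, existing).name)   # n5
```

Here `n5` wins because rack-b holds fewer DataNodes than rack-a, and `n6` is skipped because it has no local disk. Doing this by hand for a full rack of new DataNodes means repeating the same reasoning for every container.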

Scenario five: Storage Persistency

You are a big believer in shared architecture and multi-tenancy. You run multiple data-intensive applications on shared infrastructure resources. It saves you significant costs but creates a few challenges, as some applications are more important than others.

Without ROBIN, running multiple data-heavy applications on the same physical hardware means you suffer from noisy neighbors, which means your performance cannot be guaranteed. One user running a large query in one database can severely degrade the performance of other databases that might be running on the same physical host. Because Kubernetes can’t guarantee storage IOPS isolation, there is no way to work around this.

With ROBIN, it is as easy as setting consumption quotas for each application or user. Simply go to the QoS screen of the applications, and set minimum and/or maximum IOPS for each application. ROBIN will take care of the rest. And the best thing is that this can be done while the application is running. No application downtime to dial up or down the performance of your application. Now that is nirvana!
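To picture what minimum and maximum IOPS settings mean on shared hardware, consider the following Python sketch. It is one simple way to divide a device’s IOPS budget, not a description of how ROBIN enforces QoS; the function `allocate_iops`, the application names and the numbers are all hypothetical.

```python
def allocate_iops(capacity, policies):
    """Split `capacity` IOPS across applications.

    `policies` maps an application name to (min_iops, max_iops).
    Every application first receives its minimum; leftover capacity is then
    handed out in equal shares without pushing anyone past its maximum.
    """
    total_min = sum(lo for lo, _ in policies.values())
    if total_min > capacity:
        raise ValueError("minimum IOPS guarantees exceed device capacity")

    grant = {app: lo for app, (lo, _) in policies.items()}
    spare = capacity - total_min
    # Applications that can still accept more IOPS.
    open_apps = [app for app, (lo, hi) in policies.items() if hi > lo]

    while spare > 0 and open_apps:
        share = max(spare // len(open_apps), 1)
        for app in list(open_apps):
            headroom = policies[app][1] - grant[app]
            extra = min(share, headroom, spare)
            grant[app] += extra
            spare -= extra
            if grant[app] == policies[app][1]:
                open_apps.remove(app)   # this application has reached its cap
    return grant


policies = {
    "oltp": (4000, 8000),        # important database: high guaranteed floor
    "analytics": (1000, 3000),   # large queries are capped so they cannot starve others
    "dev": (0, 1000),
}
print(allocate_iops(10000, policies))
# {'oltp': 6000, 'analytics': 3000, 'dev': 1000}
```

The minimum protects the important application from noisy neighbors, while the maximum keeps any single application from taking over the device. Changing a policy and recomputing the grants needs no restart in this model, which matches the idea of tuning QoS while the application is running.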