Share data across two Cloudera clusters

Robin Systems Videos

In this demo, we show how to share data across two Cloudera clusters with Robin Hyper-Converged Kubernetes Platform.

Agile Provisioning

  • Simplify cluster deployment using the application-aware manager: provision an entire operational data pipeline within minutes
  • Deploy container-based “virtual clusters” running across commodity servers
  • Automate tasks – create, schedule and operate virtual application clusters
  • Scale-up or scale-out instantaneously to meet application performance demands

Share data – Robin eliminates cluster sprawl by deploying a data pipeline on shared hardware. This also results in better hardware utilization. The key to successful multi-tenancy is the ability to provide performance isolation and dynamic performance controls. The Robin application-aware manager equips each virtual cluster with dynamic QoS controls for every resource that it depends on – CPU, memory, network, and storage. This creates a truly elastic infrastructure that delivers CPU, memory, network and storage resources – both capacity and performance – to an application exactly at the instant it is needed.

Cluster Consolidation and QoS

  • Eliminate cluster sprawl with data pipeline components on the same shared hardware
  • Enable multi-tenancy with performance isolation and dynamic performance controls
  • Leverage dynamic QoS controls for every resource – CPU, memory, network and storage

Robin provides out-of-the-box support for application time travel. Cluster-level distributed snapshots taken at predefined intervals make it possible to restore the entire pipeline, or parts of it, if anything goes wrong. Robin recommends that admins take a snapshot before making any major change, whether upgrading the software version or changing the configuration. If anything goes wrong, the entire cluster can be restored to the last known snapshot in a matter of minutes.

Application Time Travel

  • Take unlimited cluster snapshots
  • Restore or refresh a cluster to any point-in-time using snapshots

Controlling IOPS in a Shared Environment

In this video, we demonstrate how easily IOPS from an application can be throttled to address the noisy neighbor problem with Robin Hyper-Converged Kubernetes Platform.
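As a rough illustration of what throttling IOPS means, the sketch below uses a token bucket, a common general-purpose rate-limiting technique. It is not taken from the Robin platform; the class name `TokenBucket` and its parameters are hypothetical, and the clock is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Generic token-bucket limiter (illustrative only, not a Robin API)."""

    def __init__(self, rate_iops: float, burst: float) -> None:
        self.rate = rate_iops    # tokens (I/O operations) refilled per second
        self.capacity = burst    # maximum number of tokens that can accumulate
        self.tokens = burst      # start with a full bucket
        self.last = 0.0          # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if one I/O operation may proceed at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill according to elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate_iops=2.0, burst=2.0)
# Three requests arrive at t=0 s, one more at t=0.5 s.
print([bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.5)])
# [True, True, False, True]
```

The third request is rejected because the burst allowance is used up; half a second later one token has been refilled, so the next request proceeds. A noisy neighbor limited this way cannot exceed its configured rate for long.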

Input/output operations per second (IOPS, pronounced eye-ops) is an input/output performance measurement used to characterize computer storage devices like hard disk drives (HDD), solid state drives (SSD), and storage area networks (SAN). Like benchmarks, IOPS numbers published by storage device manufacturers do not directly relate to real-world application performance.[1][2]

Controlling IOPS – Background

To meaningfully describe the performance characteristics of any storage device, it is necessary to specify a minimum of three metrics simultaneously: IOPS, response time, and (application) workload. Absent simultaneous specifications of response time and workload, IOPS are essentially meaningless. In isolation, IOPS can be considered analogous to the “revolutions per minute” of an automobile engine: knowing that an engine can spin at 10,000 RPM with its transmission in neutral conveys nothing of value, whereas knowing the torque and horsepower it develops at a given RPM fully describes its capabilities.

In 1999, recognizing the confusion created by industry abuse of IOPS numbers following Intel's release of Iometer, a performance benchmarking tool, the Storage Performance Council developed an industry-standard, peer-reviewed and audited benchmark that has been widely recognized as the only meaningful measurement of storage device IO performance: the SPC-1 benchmark suite[citation needed]. The SPC-1 requires storage vendors to fully characterize their products against a standardized workload closely modeled on “real-world” applications, reporting both IOPS and response times, with explicit prohibitions and safeguards against “cheating” and “benchmark specials”. As such, an SPC-1 benchmark result provides users with complete information about IOPS, response times, sustainability of performance over time, and data integrity checks. Moreover, SPC-1 audit rules require vendors to submit a complete bill of materials, including pricing of all components used in the benchmark, to facilitate SPC-1 “Cost-per-IOPS” comparisons among vendor submissions.
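The “Cost-per-IOPS” comparison mentioned above is simple arithmetic: total price of the tested configuration divided by the measured IOPS. A minimal sketch follows; the function name and the figures are made up for illustration and are not real SPC-1 results.

```python
def cost_per_iops(total_price: float, measured_iops: float) -> float:
    """Price of the tested configuration divided by the IOPS it delivered."""
    if measured_iops <= 0:
        raise ValueError("measured_iops must be positive")
    return total_price / measured_iops


# Hypothetical submission: a 100,000 (currency units) bill of materials
# that delivered 50,000 IOPS under the benchmark workload.
print(cost_per_iops(100_000, 50_000))  # 2.0
```

Because the bill of materials is part of the audited submission, two vendors' results can be compared on this ratio rather than on raw IOPS alone.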

Single-dimension IOPS tools such as Iometer (originally developed by Intel), IOzone, and FIO,[3] created explicitly by and for benchmarketers, have frequently been used to grossly exaggerate IOPS. A notable example is Sun (now Oracle) promoting its F5100 Flash array as capable of delivering “1 million IOPS in 1 RU” (rack unit). When subsequently tested on the SPC-1, the same storage device delivered only 30% of the Iometer value.[4][5]

The specific number of IOPS possible in any system configuration will vary greatly, depending upon the variables the tester enters into the program, including the balance of read and write operations, the mix of sequential and random access patterns, the number of worker threads and queue depth, as well as the data block sizes.[1] There are other factors which can also affect the IOPS results including the system setup, storage drivers, OS background operations etc. Also, when testing SSDs in particular, there are preconditioning considerations that must be taken into account.[6]

Performance characteristics and Controlling IOPS

Random access compared to sequential access.

The most common performance characteristics measured are sequential and random operations. Sequential operations access locations on the storage device in a contiguous manner and are generally associated with large data transfer sizes, e.g. 128 kB. Random operations access locations on the storage device in a non-contiguous manner and are generally associated with small data transfer sizes, e.g. 4 kB.

The most common performance characteristics are as follows:

Measurement              Description
Total IOPS               Total number of I/O operations per second (when performing a mix of read and write tests)
Random Read IOPS         Average number of random read I/O operations per second
Random Write IOPS        Average number of random write I/O operations per second
Sequential Read IOPS     Average number of sequential read I/O operations per second
Sequential Write IOPS    Average number of sequential write I/O operations per second

For HDDs and similar electromechanical storage devices, the random IOPS numbers are primarily dependent upon the storage device’s random seek time, whereas, for SSDs and similar solid state storage devices, the random IOPS numbers are primarily dependent upon the storage device’s internal controller and memory interface speeds. On both types of storage devices, the sequential IOPS numbers (especially when using a large block size) typically indicate the maximum sustained bandwidth that the storage device can handle.[1] Often sequential IOPS are reported as a simple MB/s number as follows:

$\text{IOPS} \times \text{TransferSizeInBytes} = \text{BytesPerSec}$ (with the answer typically converted to megabytes per second)
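A worked instance of this conversion, as a minimal sketch. The function name is illustrative, and it assumes decimal units (1 MB = 1,000,000 bytes and 128 kB = 128,000 bytes); tools that use binary units will report slightly different figures.

```python
def iops_to_mb_per_sec(iops: float, transfer_size_bytes: int) -> float:
    """Convert an IOPS figure to throughput in MB/s (1 MB = 1,000,000 bytes)."""
    bytes_per_sec = iops * transfer_size_bytes
    return bytes_per_sec / 1_000_000


# 1,000 sequential operations per second at 128 kB each
print(iops_to_mb_per_sec(1_000, 128_000))  # 128.0

# 10,000 random operations per second at 4 kB each
print(iops_to_mb_per_sec(10_000, 4_000))   # 40.0
```

The second line shows why a high random IOPS number can still correspond to modest bandwidth: small transfers move little data per operation.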

Some HDDs will improve in performance as the number of outstanding IOs (i.e. queue depth) increases. This is usually the result of more advanced controller logic on the drive performing command queuing and reordering, commonly called either Tagged Command Queuing (TCQ) or Native Command Queuing (NCQ). Most commodity SATA drives either cannot do this, or their implementation is so poor that no performance benefit can be seen.[citation needed] Enterprise-class SATA drives, such as the Western Digital Raptor and Seagate Barracuda NL, will improve by nearly 100% with deep queues.[7] High-end SCSI drives, more commonly found in servers, generally show much greater improvement, with the Seagate Savvio exceeding 400 IOPS, more than doubling its performance.[citation needed]
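The dependence of HDD random IOPS on seek time can be made concrete with a common back-of-envelope estimate: one random operation costs roughly an average seek plus half a platter rotation. The sketch below assumes a queue depth of 1 (so it ignores the command-queuing gains described above) and ignores transfer time; the drive figures are illustrative, not measurements of any particular model.

```python
def estimate_hdd_random_iops(avg_seek_ms: float, rpm: int) -> float:
    """Rough random-IOPS estimate for a spinning disk at queue depth 1."""
    # On average the platter must turn half a revolution to reach the sector.
    rotational_latency_ms = 0.5 * 60_000 / rpm
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1_000 / service_time_ms


# Illustrative 7,200 RPM drive with an 8.5 ms average seek time
print(round(estimate_hdd_random_iops(avg_seek_ms=8.5, rpm=7200)))  # 79
```

The estimate shows why faster spindles and shorter seeks raise random IOPS, and why reordering queued commands to shorten head movement can improve on the queue-depth-1 figure.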

While traditional HDDs have about the same IOPS for read and write operations, most NAND flash-based SSDs are much slower at writing than reading because they cannot rewrite directly into a previously written location, which forces a procedure called garbage collection.[8][9][10] This has caused hardware test sites to start to provide independently measured results when testing IOPS performance.

Newer flash SSDs, such as the Intel X25-E, have much higher IOPS than traditional HDDs. In a test done by Xssist using Iometer (4 KB random transfers, 70/30 read/write ratio, queue depth 4), the IOPS delivered by the Intel X25-E 64GB G1 started around 10,000, dropped sharply after 8 minutes to 4,000, and continued to decrease gradually for the next 42 minutes. From around the 50th minute onwards, IOPS varied between 3,000 and 4,000 for the rest of the 8+ hour test run.[11] Even with the drop in random IOPS after the 50th minute, the X25-E still has much higher IOPS than traditional hard disk drives. Some SSDs, including the OCZ RevoDrive 3 x2 PCIe using the SandForce controller, have shown much higher sustained write performance that more closely matches the read speed.[12]

Controlling IOPS in an Oracle Database with Robin

Managing IOPS with Robin Hyper-Converged Kubernetes Platform

Allocate the right amount of IOPS to each application in your data center. Make sure no single application hogs all, or even a majority, of the IOPS. Set minimum and maximum IOPS for each application and change them dynamically with Robin Hyper-Converged Kubernetes Platform for big data and databases.
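The idea of per-application minimum and maximum IOPS can be sketched as a simple clamp. This is an illustration of the concept only: `IopsPolicy` and its fields are hypothetical names, not part of any Robin interface, and a real platform would enforce the limits in the storage path rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class IopsPolicy:
    """Hypothetical per-application IOPS bounds (not a Robin API)."""
    min_iops: int
    max_iops: int

    def __post_init__(self) -> None:
        if not 0 <= self.min_iops <= self.max_iops:
            raise ValueError("require 0 <= min_iops <= max_iops")

    def grant(self, requested_iops: int) -> int:
        """Clamp a request into the [min_iops, max_iops] range."""
        return max(self.min_iops, min(requested_iops, self.max_iops))


policy = IopsPolicy(min_iops=500, max_iops=2_000)
print(policy.grant(10_000))  # 2000: a noisy neighbor is capped at the maximum
print(policy.grant(100))     # 500: the minimum stays reserved for the application

policy.max_iops = 5_000      # limits can be raised without recreating the policy
print(policy.grant(10_000))  # 5000
```

The maximum keeps one application from starving the others, while the minimum guarantees each application a floor even when its neighbors are busy.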

DataStax Cassandra: Provision and Scale Out

1-click, rapid, self-service Cassandra deployment with Robin Hyper-Converged Kubernetes Platform

  • Build elastic infrastructure that provides all resources to each application as needed
  • Create single-click clone of entire data pipeline
  • Get out-of-the-box 2-way or 3-way replication
  • Create thin clones on the fly without affecting data in production
  • Achieve data sharing by pointing the HDFS of one cluster to another

Applications need to scale up or out as demand for resources spikes, and scale back when demand returns to normal. Robin lets you scale up with a single click by allocating more resources to the application, and scale out easily when you need to add nodes. It also helps you clone parts of your data when you need to give data to developers and analysts for analytics, upgrade testing, change testing, or integration testing.

Cassandra: Snapshot, Clone and Time Travel

  • Take unlimited cluster snapshots
  • Restore or refresh a cluster to any point-in-time using snapshots

Robin Hyper-Converged Kubernetes Platform provides out-of-the-box support for application time travel. Cluster-level distributed snapshots taken at predefined intervals make it possible to restore the entire pipeline, or parts of it, if anything goes wrong.

Robin Systems recommends that admins take a snapshot before making any major change, whether upgrading the software version or changing the configuration. If anything goes wrong, the entire cluster can be restored to the last known snapshot in a matter of minutes.

Cassandra: Quality of Service

Robin Hyper-Converged Kubernetes Platform for NoSQL databases such as Cassandra

Guaranteed Availability and Performance

  • Eliminate cluster sprawl with data pipeline components on the same shared hardware
  • Enable multi-tenancy with performance isolation and dynamic performance controls
  • Leverage dynamic QoS controls for every resource – CPU, memory, network and storage

Robin eliminates cluster sprawl by deploying a data pipeline on shared hardware. This also results in better hardware utilization. The key to successful multi-tenancy is the ability to provide performance isolation and dynamic performance controls. The Robin application-aware fabric controller equips each virtual cluster with dynamic QoS controls for every resource that it depends on – CPU, memory, network, and storage. This creates a truly elastic infrastructure that delivers CPU, memory, network and storage resources – both capacity and performance – to an application exactly at the instant it is needed.
