Provisioning, scaling, cloning and time travel an ELK cluster demo video

Provisioning, scaling, cloning and time travel an ELK cluster

Robin Systems Videos

In this demo video we show three key lifecycle management operations with the Robin Hyper-Converged Kubernetes Platform on an ELK (Elasticsearch, Logstash, Kibana) cluster: provisioning, followed by scaling, cloning, and time travel.


The Open Source Elastic Stack

Reliably and securely take data from any source, in any format, and
search, analyze, and visualize it in real time.

ELK Cluster – Elasticsearch, Kibana, Logstash

Elasticsearch

Elasticsearch is a distributed, JSON-based search and analytics engine designed for horizontal scalability, maximum reliability, and easy management.

The Heart of the Elastic Stack

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

SPEED

Elasticsearch Is Fast.
Really, Really Fast.

When you get answers instantly, your relationship with your data changes. You can afford to iterate and cover more ground.

Being this fast isn’t easy. We’ve implemented inverted indices with finite state transducers for full-text querying, BKD trees for storing numeric and geo data, and a column store for analytics.
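To give a feel for the first of those data structures, the sketch below is a deliberately simplified Python illustration of an inverted index: each term maps to the set of documents containing it. It is not Lucene's actual implementation, which layers finite state transducers, term statistics, and positional data on top of this idea.

```python
from collections import defaultdict

# Toy inverted index: term -> set of document IDs containing it.
# Real Lucene indexes add positions, scoring statistics, and
# FST-based term dictionaries on top of this basic structure.
def build_inverted_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, term):
    # Return the sorted IDs of documents containing the term.
    return sorted(index.get(term.lower(), set()))

docs = {1: "fast search engine", 2: "search and analytics", 3: "fast analytics"}
idx = build_inverted_index(docs)
print(search(idx, "search"))  # IDs of documents containing "search"
```

Because lookups touch only the posting list for the queried term rather than scanning every document, query time stays roughly proportional to the number of matches, which is the core of why full-text search at this scale is fast.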

And since everything is indexed, you’re never left with index envy. You can leverage and access all of your data at ludicrously awesome speeds.

Kibana

Kibana gives shape to your data and is the extensible user interface for configuring and managing all aspects of the Elastic Stack.

Your Window into
the Elastic Stack

Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack, so you can do anything from learning why you’re getting paged at 2:00 a.m. to understanding the impact rain might have on your quarterly numbers.

Logstash


Ingest any data, from any source, in any format.

Logstash is a dynamic data collection pipeline with an extensible plugin ecosystem and strong Elasticsearch synergy.

Centralize, Transform & Stash Your Data

Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite “stash.”

QUERY

Be Curious. Ask Your Data Questions of All Kinds.

Elasticsearch lets you perform and combine many types of searches — structured, unstructured, geo, metric — any way you want. Start simple with one question and see where it takes you.
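As a concrete sketch of combining search types, a single request body can mix a full-text match with structured filters. The index and field names below (`message`, `status`, `response_ms`) are hypothetical; the dict mirrors the JSON you would POST to an Elasticsearch `_search` endpoint.

```python
import json

# Hypothetical fields: "message" (full text), "status" (structured),
# "response_ms" (metric). A bool query combines them in one request:
# "must" clauses contribute to scoring, "filter" clauses just restrict.
query = {
    "query": {
        "bool": {
            "must": [{"match": {"message": "connection timeout"}}],
            "filter": [
                {"term": {"status": "error"}},
                {"range": {"response_ms": {"gte": 500}}},
            ],
        }
    }
}
print(json.dumps(query, indent=2))  # request body for POST /logs/_search
```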

ANALYZE

Step Back and Understand the Bigger Picture.

It’s one thing to find the 10 best documents to match your query. But how do you make sense of, say, a billion log lines? Elasticsearch aggregations let you zoom out to explore trends and patterns in your data.
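To make that concrete, an aggregation request asks Elasticsearch to summarize matching documents rather than return them. The field names here are hypothetical; setting `"size": 0` skips the individual hits and returns only the buckets.

```python
import json

# Hypothetical log index: bucket log lines by severity, then compute the
# average response time inside each bucket. "size": 0 suppresses the hits
# themselves so only the aggregation results come back.
agg_body = {
    "size": 0,
    "aggs": {
        "by_severity": {
            "terms": {"field": "severity"},
            "aggs": {"avg_response_ms": {"avg": {"field": "response_ms"}}},
        }
    },
}
print(json.dumps(agg_body, indent=2))  # request body for POST /logs/_search
```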

Share data across two Cloudera clusters


Robin Systems Videos

In this demo, we show how to share data across two Cloudera clusters with the Robin Hyper-Converged Kubernetes Platform.

Agile Provisioning

  • Simplify cluster deployment using the application-aware manager: provision an entire operational data pipeline within minutes
  • Deploy container-based “virtual clusters” running across commodity servers
  • Automate tasks – create, schedule and operate virtual application clusters
  • Scale-up or scale-out instantaneously to meet application performance demands

Share data – Robin eliminates cluster sprawl by deploying a data pipeline on shared hardware. This also results in better hardware utilization. The key to successful multi-tenancy is the ability to provide performance isolation and dynamic performance controls. The Robin application-aware manager equips each virtual cluster with dynamic QoS controls for every resource that it depends on – CPU, memory, network, and storage. This creates a truly elastic infrastructure that delivers CPU, memory, network and storage resources – both capacity and performance – to an application exactly at the instant it is needed.

Cluster Consolidation and QoS

  • Eliminate cluster sprawl with data pipeline components on the same shared hardware
  • Enable multi-tenancy with performance isolation and dynamic performance controls
  • Leverage dynamic QoS controls for every resource – CPU, memory, network and storage

Robin provides out-of-the-box support for application time travel. Cluster-level distributed snapshots taken at pre-defined intervals can be used to restore the entire pipeline, or parts of it, if anything goes wrong. Robin recommends that admins take snapshots before making any major changes: whether you are upgrading the software version or making a configuration change, make sure you have a snapshot. If anything goes wrong, the entire cluster can be restored to the last known snapshot in a matter of minutes.

Application Time Travel

  • Take unlimited cluster snapshots
  • Restore or refresh a cluster to any point-in-time using snapshots

Robin for Big Data

Setting up a Hadoop cluster in the cloud

Robin Videos

Controlling IOPS in a Shared Environment


Robin Systems Videos

In this video, we demonstrate how easily we can throttle IOPS from an application to address the noisy-neighbor problem with the Robin Hyper-Converged Kubernetes Platform.


Input/output operations per second (IOPS, pronounced eye-ops) is an input/output performance measurement used to characterize computer storage devices such as hard disk drives (HDD), solid-state drives (SSD), and storage area networks (SAN). Like benchmarks, IOPS numbers published by storage device manufacturers do not directly relate to real-world application performance.[1][2]

Controlling IOPS – Background

To meaningfully describe the performance characteristics of any storage device, it is necessary to specify a minimum of three metrics simultaneously: IOPS, response time, and (application) workload. Absent simultaneous specification of response time and workload, IOPS figures are essentially meaningless. In isolation, IOPS is analogous to the “revolutions per minute” of an automobile engine: an engine capable of spinning at 10,000 RPM with its transmission in neutral conveys nothing of value, whereas an engine capable of developing a specified torque and horsepower at a given RPM fully describes its capabilities.

In 1999, recognizing the confusion created by industry abuse of IOPS numbers following Intel's release of IOmeter, a performance benchmarking tool, the Storage Performance Council developed an industry-standard, peer-reviewed, and audited benchmark that has been widely recognized as the only meaningful measurement of storage device I/O performance: the SPC-1 benchmark suite. SPC-1 requires storage vendors to fully characterize their products against a standardized workload closely modeled on real-world applications, reporting both IOPS and response times, with explicit prohibitions and safeguards against cheating and “benchmark specials”. As such, an SPC-1 benchmark result provides users with complete information about IOPS, response times, sustainability of performance over time, and data integrity. Moreover, SPC-1 audit rules require vendors to submit a complete bill of materials, including pricing of all components used in the benchmark, to facilitate “cost-per-IOPS” comparisons among vendor submissions.

Single-dimension IOPS tools such as Iometer (originally developed by Intel), IOzone, and FIO[3] have frequently been used to grossly exaggerate IOPS. A notable example is Sun (now Oracle) promoting its F5100 flash array as purportedly capable of delivering “1 million IOPS in 1 RU” (rack unit); when subsequently tested on the SPC-1, the same storage device delivered only 30% of the IOmeter value.[4][5]

The specific number of IOPS possible in any system configuration will vary greatly depending on the variables the tester enters into the program, including the balance of read and write operations, the mix of sequential and random access patterns, the number of worker threads and the queue depth, and the data block sizes.[1] Other factors can also affect the results, including the system setup, storage drivers, and OS background operations. When testing SSDs in particular, there are also preconditioning considerations that must be taken into account.[6]

Performance characteristics and Controlling IOPS

Random access compared to sequential access.

The most common performance characteristics measured are sequential and random operations. Sequential operations access locations on the storage device in a contiguous manner and are generally associated with large data transfer sizes, e.g. 128 kB. Random operations access locations in a non-contiguous manner and are generally associated with small data transfer sizes, e.g. 4 kB.

The most common performance characteristics are as follows:

  • Total IOPS – Total number of I/O operations per second (when performing a mix of read and write tests)
  • Random Read IOPS – Average number of random read I/O operations per second
  • Random Write IOPS – Average number of random write I/O operations per second
  • Sequential Read IOPS – Average number of sequential read I/O operations per second
  • Sequential Write IOPS – Average number of sequential write I/O operations per second

For HDDs and similar electromechanical storage devices, the random IOPS numbers are primarily dependent upon the storage device’s random seek time, whereas for SSDs and similar solid-state storage devices, the random IOPS numbers are primarily dependent upon the storage device’s internal controller and memory interface speeds. On both types of storage devices, the sequential IOPS numbers (especially when using a large block size) typically indicate the maximum sustained bandwidth that the storage device can handle.[1] Often, sequential IOPS are reported as a simple MB/s figure, as follows:

IOPS × TransferSizeInBytes = BytesPerSec (with the answer typically converted to MB/s)
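Worked numerically, the conversion is a single multiplication. The short sketch below uses decimal units (1 MB = 1,000,000 bytes); vendors sometimes report binary units instead, so treat the exact divisor as a convention.

```python
# Convert sequential IOPS at a given transfer size into sustained bandwidth.
# Uses decimal units: 1 MB = 1,000,000 bytes.
def iops_to_mb_per_sec(iops, transfer_size_bytes):
    return iops * transfer_size_bytes / 1_000_000

# e.g. a drive sustaining 2,000 sequential IOPS at 128 kB per operation
print(iops_to_mb_per_sec(2000, 128 * 1000))  # prints 256.0
```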

Some HDDs will improve in performance as the number of outstanding I/Os (i.e. the queue depth) increases. This is usually the result of more advanced controller logic on the drive performing command queuing and reordering, commonly called Tagged Command Queuing (TCQ) or Native Command Queuing (NCQ). Most commodity SATA drives either cannot do this, or their implementation is so poor that no performance benefit is seen. Enterprise-class SATA drives, such as the Western Digital Raptor and Seagate Barracuda NL, will improve by nearly 100% with deep queues.[7] High-end SCSI drives more commonly found in servers generally show much greater improvement, with the Seagate Savvio exceeding 400 IOPS, more than doubling its performance.

While traditional HDDs have about the same IOPS for read and write operations, most NAND-flash-based SSDs are much slower at writing than reading, because a previously written location cannot be rewritten directly, forcing a procedure called garbage collection.[8][9][10] This has led hardware test sites to provide independently measured read and write results when testing IOPS performance.

Newer flash SSDs, such as the Intel X25-E, have much higher IOPS than traditional HDDs. In a test done by Xssist using IOmeter (4 kB random transfers, 70/30 read/write ratio, queue depth 4), the IOPS delivered by the Intel X25-E 64 GB G1 started around 10,000, dropped sharply after 8 minutes to 4,000, and continued to decrease gradually for the next 42 minutes; IOPS varied between 3,000 and 4,000 from around the 50th minute onwards for the rest of the 8+ hour test run.[11] Even with the drop in random IOPS after the 50th minute, the X25-E still has much higher IOPS than traditional hard disk drives. Some SSDs, including the OCZ RevoDrive 3 X2 PCIe using the SandForce controller, have shown much higher sustained write performance that more closely matches the read speed.[12]

Controlling IOPS in an Oracle Database with Robin

Oracle as a Service on Kubernetes Solution Brief

More Robin Hyper-Converged Kubernetes Platform Demos and Videos

Controlling IOPS Oracle Database

Relational Databases

No Compromise Database Consolidation

Learn More: /solutions/relational-databases/


Managing IOPS with Robin Hyper-Converged Kubernetes Platform

Learn More – Robin Hyper-Converged Kubernetes Platform for big data & databases

Managing IOPS with Robin Systems

Managing IOPS with Robin Hyper-Converged Kubernetes Platform for Big Data & Databases

Allocate the right amount of IOPS to each application in your data center. Make sure one application does not hog all, or the majority, of the IOPS. Set minimum and maximum IOPS for each application and change them dynamically with the Robin Hyper-Converged Kubernetes Platform for big data and databases.

DataStax Cassandra: Provision and Scale Out

DataStax Cassandra Provision and Scale Out

1-click, rapid, self-service Cassandra deployment with Robin Hyper-Converged Kubernetes Platform

  • Build elastic infrastructure that provides all resources to each application as needed
  • Create single-click clone of entire data pipeline
  • Get out-of-the-box 2-way or 3-way replication
  • Create thin clones on the fly without affecting data in production
  • Achieve data sharing by pointing the HDFS of one cluster to another

It is often necessary to scale up or scale out as demand for resources spikes and then returns to normal. Robin enables you to scale up with a single click by allocating more resources to the application, and to scale out easily when you need to add nodes. It also helps you clone parts of your data when you need to give data to developers and analysts for analytics, upgrade testing, change validation, or integration testing.

More Robin Hyper-Converged Kubernetes Platform Videos and Demos

Cassandra: Snapshot, Clone and Time Travel

More Robin Hyper-Converged Kubernetes Platform demo videos

Cassandra: Snapshot, Clone, and Time Travel

Cassandra Snapshot, Clone, Time-travel

  • Take unlimited cluster snapshots
  • Restore or refresh a cluster to any point-in-time using snapshots

The Robin Hyper-Converged Kubernetes Platform provides out-of-the-box support for application time travel. Cluster-level distributed snapshots taken at pre-defined intervals can be used to restore the entire pipeline, or parts of it, if anything goes wrong.

Robin Systems recommends that admins take snapshots before making any major changes: whether you are upgrading the software version or making a configuration change, make sure you have a snapshot. If anything goes wrong, the entire cluster can be restored to the last known snapshot in a matter of minutes.

View Demo – Cassandra: Snapshot, Clone and Time Travel

Cassandra: Quality of Service

Robin Hyper-Converged Kubernetes Platform for NoSQL databases such as Cassandra

More Robin Hyper-Converged Kubernetes Platform Demos and Videos

Cassandra: Quality of Service Control

Cassandra QoS – Quality of Service

Guaranteed Availability and Performance

  • Eliminate cluster sprawl with data pipeline components on the same shared hardware
  • Enable multi-tenancy with performance isolation and dynamic performance controls
  • Leverage dynamic QoS controls for every resource – CPU, memory, network and storage

Robin eliminates cluster sprawl by deploying a data pipeline on shared hardware. This also results in better hardware utilization. The key to successful multi-tenancy is the ability to provide performance isolation and dynamic performance controls. The Robin application-aware fabric controller equips each virtual cluster with dynamic QoS controls for every resource that it depends on – CPU, memory, network, and storage. This creates a truly elastic infrastructure that delivers CPU, memory, network and storage resources – both capacity and performance – to an application exactly at the instant it is needed.

More Robin Hyper-Converged Kubernetes Platform Demos and Videos