Running Apache Cassandra on Kubernetes

As Kubernetes becomes the de facto solution for container orchestration, more and more developers (and enterprises) want to run Apache Cassandra databases on Kubernetes. It's easy to get started, especially given the capabilities that Kubernetes' StatefulSets bring to the table. Kubernetes, though, certainly has room to improve when it comes to storing stateful data and understanding how different databases work.

For example, Kubernetes doesn't know if you're writing to a leader or a follower database, to a multi-sharded leader infrastructure, or to a single database instance. StatefulSets (workload API objects used to manage stateful applications) offer the building blocks: stable, unique network identifiers; stable persistent storage; ordered, graceful deployment, scaling, deletion, and termination; and automated rolling updates. However, while getting started with Cassandra on Kubernetes might be easy, it can still be a challenge to run and manage.
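
To make "stable, unique network identifiers" concrete, here is a minimal sketch of the naming scheme Kubernetes applies to StatefulSet pods: each pod is named after the StatefulSet plus an ordinal, and gets a DNS entry under the StatefulSet's headless service. The StatefulSet, service, and namespace names below are illustrative assumptions, and `cluster.local` is only the usual default cluster domain.

```python
def stateful_pod_identities(statefulset, service, namespace, replicas,
                            cluster_domain="cluster.local"):
    """Return (pod_name, dns_name) pairs for each replica of a StatefulSet.

    Pod names follow <statefulset>-<ordinal>; DNS names follow
    <pod>.<headless-service>.<namespace>.svc.<cluster-domain>.
    """
    identities = []
    for ordinal in range(replicas):
        pod = f"{statefulset}-{ordinal}"
        dns = f"{pod}.{service}.{namespace}.svc.{cluster_domain}"
        identities.append((pod, dns))
    return identities


# Illustrative names only: a three-node set called "cassandra" in "default".
for pod, dns in stateful_pod_identities("cassandra", "cassandra", "default", 3):
    print(pod, dns)
```

Because these names survive rescheduling, a Cassandra node keeps the same address even when its pod is recreated on another host.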

To overcome some of these hurdles, we decided to build a Cassandra operator that runs and operates Cassandra within Kubernetes; you can think of it as Cassandra-as-a-Service on top of Kubernetes. The operator is open source and freely available on GitHub. It remains a work in progress by our Instaclustr team and our partner contributors, but it is functional and ready for use. The Docker images the operator uses are also open source and available from the project's GitHub repository.

The Cassandra operator is designed to provide "operations-free" Cassandra: it takes care of deployment and allows users to manage and run Cassandra—in a safe way—within Kubernetes environments. It also makes it simple to utilize consistent and reproducible environments.

While it's possible for developers to build scripts for managing and running Cassandra on Kubernetes, the Cassandra operator offers the advantage of a consistent, reproducible environment and a consistent, reproducible set of operations across different production clusters (and across development, staging, and QA environments as well). Also, because best practices are already built into the operator, development teams are spared operational concerns and can focus on their core capabilities.

What is a Kubernetes operator?

A Kubernetes operator consists of two components: a controller and a custom resource definition (CRD). The CRD allows devs to create Cassandra objects in Kubernetes. It's an extension of Kubernetes that lets us define custom objects or resources, which our controller can then watch for changes. Devs can define an object in Kubernetes that contains configuration options for Cassandra, such as cluster name, node count, and JVM tuning options: all the information you want to give Kubernetes about how to deploy Cassandra.
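
As a rough illustration of what such an object might carry, the sketch below builds a custom resource as a plain Python dictionary. The API group, kind, and field names (`example.com/v1alpha1`, `CassandraCluster`, `nodeCount`, `jvmOptions`) are hypothetical and chosen for this example; they are not taken from the operator's actual CRD schema.

```python
from dataclasses import dataclass, field


@dataclass
class CassandraClusterSpec:
    """Options named in the text: cluster name, node count, JVM tuning."""
    cluster_name: str
    node_count: int
    jvm_options: list = field(default_factory=list)


def to_custom_resource(spec, namespace="default"):
    """Render the spec as a Kubernetes-style custom resource (a dict)."""
    if spec.node_count < 1:
        raise ValueError("node_count must be at least 1")
    return {
        "apiVersion": "example.com/v1alpha1",  # hypothetical group/version
        "kind": "CassandraCluster",            # hypothetical kind
        "metadata": {"name": spec.cluster_name, "namespace": namespace},
        "spec": {
            "nodeCount": spec.node_count,           # hypothetical field name
            "jvmOptions": list(spec.jvm_options),   # hypothetical field name
        },
    }


resource = to_custom_resource(
    CassandraClusterSpec("demo-cluster", 3, ["-Xms2G", "-Xmx2G"])
)
print(resource["kind"], resource["spec"]["nodeCount"])
```

The point is only the shape: the developer declares what the cluster should look like, and the controller is responsible for making it so.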

You can isolate the Cassandra operator to a specific Kubernetes namespace, define what kinds of persistent volumes it should use, and more. The Cassandra operator's controller listens to state changes on the Cassandra CRD and will create its own StatefulSets to match those requirements. It will also manage those operations and can ensure repairs, backups, and safe scaling as specified via the CRD. In this way, it leverages the Kubernetes concept of building controllers upon other controllers in order to achieve intelligent and helpful behaviors.
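
The step of turning those requirements into a StatefulSet could look something like the sketch below. It uses the standard `apps/v1` StatefulSet fields, but the function name, labels, container image, and default storage size are assumptions made for illustration, not the operator's real output.

```python
def build_statefulset(cluster_name, namespace, node_count, storage_class,
                      storage_size="10Gi"):
    """Build a StatefulSet manifest (as a dict) for a desired cluster."""
    labels = {"app": "cassandra", "cluster": cluster_name}  # assumed labels
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": cluster_name, "namespace": namespace,
                     "labels": labels},
        "spec": {
            "serviceName": cluster_name,  # headless service, assumed same name
            "replicas": node_count,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": "cassandra",
                    "image": "example/cassandra:latest",  # placeholder image
                    "volumeMounts": [{"name": "data",
                                      "mountPath": "/var/lib/cassandra"}],
                }]},
            },
            # One persistent volume claim per pod, using the requested class.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class,
                    "resources": {"requests": {"storage": storage_size}},
                },
            }],
        },
    }


manifest = build_statefulset("demo-cluster", "cassandra-ns", 3, "fast-ssd")
print(manifest["spec"]["replicas"])
```

Here the namespace and persistent volume class flow straight from the custom resource into the generated StatefulSet, which is the "controller built on another controller" idea in miniature.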

So, how does it work?

Architecturally, the Cassandra controller connects to the API server on the Kubernetes master. It listens for state changes and manipulates pod definitions and CRDs accordingly. It then deploys them, waits for the changes to take effect, and repeats until the cluster matches the requested state.
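
This listen, change, wait, repeat cycle is the usual controller reconcile loop. The sketch below simulates it in memory with plain integers standing in for the desired and running node counts; the function names and the one-node-per-pass policy are assumptions for illustration, and a real controller would talk to the Kubernetes API instead.

```python
def plan_step(desired_nodes, running_nodes):
    """Decide the single next action, or None when nothing is left to do."""
    if running_nodes < desired_nodes:
        return "add-node"
    if running_nodes > desired_nodes:
        return "remove-node"
    return None


def reconcile(desired_nodes, running_nodes, max_passes=100):
    """Repeat observe -> plan -> act until observed state matches desired."""
    actions = []
    for _ in range(max_passes):
        step = plan_step(desired_nodes, running_nodes)
        if step is None:
            break  # converged: nothing more to change
        actions.append(step)
        # Simulate the change taking effect before the next observation.
        running_nodes += 1 if step == "add-node" else -1
    return running_nodes, actions


final_count, actions = reconcile(desired_nodes=3, running_nodes=1)
print(final_count, actions)
```

Changing one node per pass keeps every intermediate state valid, which matters for a database that must rebalance data after each membership change.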

The Cassandra controller can, of course, perform operations within the Cassandra cluster. For example, want to scale down your Cassandra cluster? Instead of editing the StatefulSet directly, you lower the node count in the Cassandra resource (say, from six to five). The controller sees that state change and first runs a decommission operation on the Cassandra node that will be removed. This ensures that the Cassandra node stops gracefully and redistributes the data it holds across the remaining nodes. Once the Cassandra controller sees this has happened successfully, it modifies the StatefulSet definition so that Kubernetes can terminate that pod. Thus, the Cassandra controller brings needed intelligence to the Kubernetes environment to run Cassandra properly and ensure smoother operations.
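
The ordering is the important part: decommission first, shrink the StatefulSet second. A minimal sketch of that sequence follows, using a hypothetical in-memory `FakeCluster` in place of real Cassandra and Kubernetes calls. It assumes the pod removed is the one with the highest ordinal, which is the order in which StatefulSets scale down.

```python
class FakeCluster:
    """Hypothetical in-memory stand-in for a Cassandra StatefulSet."""

    def __init__(self, name, replicas):
        self.name = name
        self.replicas = replicas
        self.decommissioned = set()
        self.log = []

    def decommission(self, pod):
        # A real controller would trigger Cassandra's decommission here and
        # wait for the node to finish streaming its data to its peers.
        self.log.append(f"decommission {pod}")
        self.decommissioned.add(pod)

    def set_replicas(self, replicas):
        # A real controller would update the StatefulSet definition here.
        self.log.append(f"set replicas {replicas}")
        self.replicas = replicas


def scale_down(cluster, desired):
    """Remove one node at a time, never shrinking before decommissioning."""
    while cluster.replicas > desired:
        pod = f"{cluster.name}-{cluster.replicas - 1}"  # highest ordinal
        cluster.decommission(pod)
        if pod not in cluster.decommissioned:
            raise RuntimeError(f"{pod} did not decommission; aborting")
        cluster.set_replicas(cluster.replicas - 1)
    return cluster.log


cluster = FakeCluster("cassandra", 6)
for entry in scale_down(cluster, 5):
    print(entry)
```

If the decommission step fails, the replica count is left untouched, so Kubernetes never deletes a pod that still owns data.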

As we continue this project and iterate on the Cassandra operator, our goal is to add new components that will continue to expand the tool's features and value. A good example is Cassandra SideCar (shown in the diagram above), which can take responsibility for tasks like backups and repairs. Current and future features of the project can be viewed on GitHub. Our goal for the Cassandra operator is to give devs a powerful, open source option for running Cassandra on Kubernetes with a simplicity and grace that has not yet been all that easy to achieve.