Elastisys Engineering: How to set up Rook with Ceph on Kubernetes

Stateful data is stored on Persistent Volumes in Kubernetes. These can be either statically or dynamically provisioned. But only if a suitable Storage Class has been defined. This is a no-brainer in most public clouds, because they offer a block storage service such as AWS EBS. But what can you do to support e.g. bare metal deployments of Kubernetes, where no such block storage service is available?

Enter Rook. Rook is a storage service that provides a Kubernetes-compatible API. It sits as an abstraction layer between Kubernetes and an underlying storage service. In this blog post, we will set up virtual machines that will run Kubernetes and also run the Ceph storage service. Ceph is then made available to Kubernetes via Rook.

The result? A fully functioning Kubernetes cluster that can dynamically provision Persistent Volumes. Please note that although we use a cloud environment here to start the virtual machines, everything in this article works just as well on bare metal servers.

Overview

It is assumed that you can use cloud-init to configure your nodes.

Our cluster setup is as following:

One control plane node: 2GB RAM, 2 vCPU, 50GB local storage.
Three worker nodes: 8GB RAM, 4 vCPU, 100GB local storage each.

All nodes are running Ubuntu 20.04 LTS.
The cluster is running Kubernetes v1.18.10 and is installed using Kubespray 2.14.2.

Infrastructure preparation

Before deploying Kubernetes and Rook Ceph, we have to decide how to provide the storage for the Rook Ceph cluster, and prepare the nodes accordingly.

Choosing local storage option

Let’s start by looking at the Rook and the Ceph prerequisites.
In our case, what we need to decide on is which local storage option to provide:

Raw devices (no partitions or formatted filesystems)
Raw partitions (no formatted filesystem)
PVs available from a storage class in block mode

On the cloud provider used for this example, it is only possible to use one device per node – the boot disk.
Using a separate disk for Rook Ceph is therefore not an option.

A raw partition could be created during boot using cloud-init.
Considering that Rook Ceph clusters can discover raw partitions by itself, and we would have to create block mode PVs (PersistentVolumes) ourselves in order to use them, we will go with raw partitions.

Implementing local storage option

For simplicity, we will provide Ceph storage on all worker nodes.
Further configuration to specify which nodes to consider for storage discovery can be done in the Ceph Cluster CRD.

To create the raw partition on the worker nodes, they should have the following cloud-init config:

#cloud-config
bootcmd:
- [ cloud-init-per, once, move-second-header, sgdisk, --move-second-header, /dev/vda ]
- [ cloud-init-per, once, create-ceph-part, parted, --script, /dev/vda, 'mkpart 2 50GB -1' ]

In case you struggle to understand the commands above, fear not! We explained partitioning via cloud-init in our previous blog post.

Note the mkpart command.
It will create a raw partition with start at 50GB, stretching until next partition or end of disk.
Change start from 50GB to something larger if you want to reserve more space for other partitions, such as the root partition.

Verify that the nodes have an empty partition:

# On the worker nodes
$ sudo parted -l
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 107GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
14      1049kB  5243kB  4194kB                     bios_grub
15      5243kB  116MB   111MB   fat32              boot, esp
 1      116MB   50.0GB  49.9GB  ext4
 2      50.0GB  107GB   57.4GB               2

See partition 2. It does not have a file system, and is leaving ~50GB for other partitions, as desired from our cloud-init configuration.

Now that the nodes are prepared, it is time to deploy the Kubernetes cluster.
The Kubernetes installation will not be covered in detail.
We will install a vanilla Kubernetes cluster using Kubespray, but most installers should do.

Deploying Rook

Now that the worker nodes are prepared with a raw partition and Kubernetes is deployed, it is time to deploy the Rook Operator.
At this point it is totally fine to follow the Ceph Quickstart, but we will use the Ceph Operator Helm chart instead.
The examples are based on using Helm v3.

Create the namespace for Rook and deploy the Helm chart:

helm repo add rook-release https://charts.rook.io/release

kubectl create namespace rook-ceph

helm install rook-ceph rook-release/rook-ceph \
    --namespace rook-ceph  \
    --version v1.5.3

See that the operator pod starts successfully:

$ kubectl --namespace rook-ceph get pods
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-operator-664d8997f-lttxz   1/1     Running   0          33s

The Rook repository provides some example manifests for Ceph clusters and StorageClasses.
In this case, we will deploy the sample production Ceph cluster cluster.yaml.
Note that this requires at least three worker nodes – if you have fewer nodes in your cluster, use cluster-test.yaml (NOT RECOMMENDED FOR PRODUCTION).

For storage class, we will go with the sample RBD storageclass.yaml.
Note the less demanding but also less reliable storageclass-test.yaml, if you are only testing this out.

Deploy the Ceph cluster and the storage class:

kubectl --namespace rook-ceph apply -f cluster.yaml
kubectl --namespace rook-ceph apply -f storageclass.yaml

Give the Ceph cluster a few minutes to get ready.
You can check the status of the cluster by running:

$ kubectl --namespace rook-ceph get cephclusters.ceph.rook.io
NAME       ...    PHASE   MESSAGE                        HEALTH
rook-ceph  ...    Ready   Cluster created successfully   HEALTH_OK

At this stage, we had an issue where the Ceph cluster failed to reach a healthy state.
After some debugging we came to the conclusion that our issue was caused by an incomplete deployment of cert-manager, causing the Kubernetes API server to be unable to respond to requests from the Rook Operator.
Make sure your Kubernetes clusters are in a healthy state before deploying Rook!

Creating and consuming a PersistentVolumeClaim

Once the Ceph cluster is ready, we can create the sample PersistentVolumeClaim (PVC) and see that Rook Ceph creates a PersistentVolume (PV) for it to bind to:

$ kubectl create -f pvc.yaml
persistentvolumeclaim/rbd-pvc created

$ kubectl get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
rbd-pvc   Bound    pvc-c3af7bd1-d277-4475-8e03-d87beb719e75   1Gi        RWO            rook-ceph-block   113s

and finally consume the PVC with a Pod:

$ kubectl create -f pod.yaml
pod/csirbd-demo-pod created

$ kubectl get pods
NAME              READY   STATUS    RESTARTS   AGE
csirbd-demo-pod   1/1     Running   0          21s

By describing the pod, we see that it is using the PVC:

$ kubectl describe pod csirbd-demo-pod
Containers:
  web-server:
    ...
    Mounts:
      /var/lib/www/html from mypvc (rw)
...
Volumes:
  mypvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rbd-pvc
    ReadOnly:   false
...

Further tweaking

These manifests are only sample manifests, and not guaranteed to fit your infrastructure perfectly.
Look through the different storage types, the Ceph cluster configuration and the storage class configuration to see what would fit your use case best.

Monitoring

Rook Ceph comes with Prometheus support.
This requires Prometheus Operator to be deployed, which we will not cover here.
Once Prometheus Operator is installed, monitoring can be enabled per Rook Ceph cluster by setting spec.monitoring.enabled=true in the CephCluster CR (cluster.yaml in our example).
The manifest can be safely reapplied after changing this value, and the Rook Operator will create the corresponding ServiceMonitor.

There are some Grafana dashboards referred to in the Rook documentation utilizing the metrics exposed by the Rook Ceph clusters, created by @galexrt:

Cleanup

Since Rook Ceph expects raw devices on the nodes it runs on, redeploying a cluster is not entirely straightforward (unless you can throw away and recreate the worker nodes).
For more detail, see the Rook Ceph Cleanup documentation.

Author: Elastisys

Elastisys is a Swedish cloud-native company on a mission to safeguard the digital backbone of society. Founded in 2011 and built on decades of research and industry expertise, we help organizations run software critical to society – securely, reliably, and in full regulatory compliance. Through our application platform, Welkin by Elastisys, and a suite of expert services, we enable teams in critical sectors to innovate with confidence. Trusted by industry leaders and rooted in European values, Elastisys is shaping the future of secure, sovereign digital infrastructure.