Stateful data in Kubernetes is stored on Persistent Volumes, which can be provisioned either statically or dynamically, but dynamic provisioning works only if a suitable Storage Class has been defined. This is a no-brainer in most public clouds, which offer a block storage service such as AWS EBS. But what can you do to support e.g. bare metal deployments of Kubernetes, where no such block storage service is available?
Enter Rook. Rook is a storage orchestrator that exposes storage to Kubernetes through Kubernetes-native APIs. It sits as an abstraction layer between Kubernetes and an underlying storage service. In this blog post, we will set up virtual machines that run Kubernetes and also run the Ceph storage service. Ceph is then made available to Kubernetes via Rook.
The result? A fully functioning Kubernetes cluster that can dynamically provision Persistent Volumes. Please note that although we use a cloud environment here to start the virtual machines, everything in this article works just as well on bare metal servers.
It is assumed that you can use cloud-init to configure your nodes.
Our cluster setup is as follows:
- One control plane node: 2GB RAM, 2 vCPU, 50GB local storage.
- Three worker nodes: 8GB RAM, 4 vCPU, 100GB local storage each.
All nodes are running Ubuntu 20.04 LTS.
The cluster is running Kubernetes v1.18.10 and is installed using Kubespray 2.14.2.
Before deploying Kubernetes and Rook Ceph, we have to decide how to provide the storage for the Rook Ceph cluster, and prepare the nodes accordingly.
Choosing a local storage option

Rook Ceph requires one of the following types of local storage:
- Raw devices (no partitions or formatted filesystems)
- Raw partitions (no formatted filesystem)
- PVs available from a storage class in block mode
On the cloud provider used for this example, only one device per node is available: the boot disk.
Using a separate disk for Rook Ceph is therefore not an option.
A raw partition can, however, be created on the boot disk during first boot using cloud-init.
Since Rook Ceph can discover raw partitions by itself, whereas block mode PVs (PersistentVolumes) would have to be created by us before they can be used, we will go with raw partitions.
Implementing the local storage option
For simplicity, we will provide Ceph storage on all worker nodes.
Further configuration to specify which nodes to consider for storage discovery can be done in the Ceph Cluster CRD.
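For example, restricting storage discovery to specific nodes could look roughly like the following fragment of the CephCluster spec (the node names here are hypothetical; see the Cluster CRD documentation for the full schema):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  # ...
  storage:
    useAllNodes: false    # do not consider every node for storage
    useAllDevices: false  # do not consume every available device
    nodes:
      # hypothetical names; must match the Kubernetes node names
      - name: worker-1
      - name: worker-2
      - name: worker-3
```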
To create the raw partition on the worker nodes, they should have the following cloud-init config:
```yaml
#cloud-config
bootcmd:
  - [ cloud-init-per, once, move-second-header, sgdisk, --move-second-header, /dev/vda ]
  - [ cloud-init-per, once, create-ceph-part, parted, --script, /dev/vda, 'mkpart 2 50GB -1' ]
```
In case you struggle to understand the commands above, fear not! We explained partitioning via cloud-init in our previous blog post.
This creates a raw partition starting at 50GB and stretching until the next partition or the end of the disk.
Change start from 50GB to something larger if you want to reserve more space for other partitions, such as the root partition.
Verify that the nodes have an empty partition:
```
# On the worker nodes
$ sudo parted -l
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 107GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
14      1049kB  5243kB  4194kB                     bios_grub
15      5243kB  116MB   111MB   fat32              boot, esp
 1      116MB   50.0GB  49.9GB  ext4
 2      50.0GB  107GB   57.4GB               2
```
Note partition 2: it has no file system and leaves ~50GB for other partitions, as intended by our cloud-init configuration.
Now that the nodes are prepared, it is time to deploy the Kubernetes cluster.
The Kubernetes installation will not be covered in detail.
We will install a vanilla Kubernetes cluster using Kubespray, but most installers should do.
Now that the worker nodes are prepared with a raw partition and Kubernetes is deployed, it is time to deploy the Rook Operator.
At this point it is totally fine to follow the Ceph Quickstart, but we will use the Ceph Operator Helm chart instead.
The examples are based on using Helm v3.
Create the namespace for Rook and deploy the Helm chart:
```
helm repo add rook-release https://charts.rook.io/release
kubectl create namespace rook-ceph
helm install rook-ceph rook-release/rook-ceph \
  --namespace rook-ceph \
  --version v1.5.3
```
See that the operator pod starts successfully:
```
$ kubectl --namespace rook-ceph get pods
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-operator-664d8997f-lttxz   1/1     Running   0          33s
```
The Rook repository provides some example manifests for Ceph clusters and StorageClasses.
In this case, we will deploy the sample production Ceph cluster from cluster.yaml. Note that this requires at least three worker nodes; if you have fewer nodes in your cluster, use cluster-test.yaml instead (not recommended for production).
Deploy the Ceph cluster and the storage class:
```
kubectl --namespace rook-ceph apply -f cluster.yaml
kubectl --namespace rook-ceph apply -f storageclass.yaml
```
Give the Ceph cluster a few minutes to get ready.
You can check the status of the cluster by running:
```
$ kubectl --namespace rook-ceph get cephclusters.ceph.rook.io
NAME        ...   PHASE   MESSAGE                        HEALTH
rook-ceph   ...   Ready   Cluster created successfully   HEALTH_OK
```
At this stage, we had an issue where the Ceph cluster failed to reach a healthy state.
After some debugging we came to the conclusion that our issue was caused by an incomplete deployment of cert-manager, causing the Kubernetes API server to be unable to respond to requests from the Rook Operator.
Make sure your Kubernetes clusters are in a healthy state before deploying Rook!
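If you need to dig deeper into Ceph's health, the Rook toolbox (toolbox.yaml among the example manifests) gives you the Ceph CLI inside the cluster. A quick health check might look roughly like this (assuming the toolbox deployment is named rook-ceph-tools, as in the Rook examples):

```shell
# Deploy the toolbox pod from the Rook example manifests
kubectl --namespace rook-ceph apply -f toolbox.yaml

# Run ceph status inside the toolbox deployment
kubectl --namespace rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
```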
Creating and consuming a PersistentVolumeClaim
Once the Ceph cluster is ready, we can create the sample PersistentVolumeClaim (PVC) and see that Rook Ceph creates a PersistentVolume (PV) for it to bind to:
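For reference, the sample pvc.yaml (from the CSI RBD examples in the Rook repository) looks roughly like this; details may differ between Rook versions:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # the storage class created by storageclass.yaml above
  storageClassName: rook-ceph-block
```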
```
$ kubectl create -f pvc.yaml
persistentvolumeclaim/rbd-pvc created

$ kubectl get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
rbd-pvc   Bound    pvc-c3af7bd1-d277-4475-8e03-d87beb719e75   1Gi        RWO            rook-ceph-block   113s
```
and finally consume the PVC with a Pod:
```
$ kubectl create -f pod.yaml
pod/csirbd-demo-pod created

$ kubectl get pods
NAME              READY   STATUS    RESTARTS   AGE
csirbd-demo-pod   1/1     Running   0          21s
```
By describing the pod, we see that it is using the PVC:
```
$ kubectl describe pod csirbd-demo-pod
...
Containers:
  web-server:
    ...
    Mounts:
      /var/lib/www/html from mypvc (rw)
    ...
Volumes:
  mypvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rbd-pvc
    ReadOnly:   false
...
```
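A pod manifest matching this output would look roughly like the following (the Rook sample uses an nginx-based web server image; treat the image name as an assumption):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: csirbd-demo-pod
spec:
  containers:
    - name: web-server
      image: nginx
      volumeMounts:
        - name: mypvc
          mountPath: /var/lib/www/html
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: rbd-pvc
        readOnly: false
```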
These manifests are only sample manifests, and not guaranteed to fit your infrastructure perfectly.
Look through the different storage types, the Ceph cluster configuration and the storage class configuration to see what would fit your use case best.
Rook Ceph comes with Prometheus support.
This requires Prometheus Operator to be deployed, which we will not cover here.
Once Prometheus Operator is installed, monitoring can be enabled per Rook Ceph cluster by setting spec.monitoring.enabled=true in the CephCluster CR (cluster.yaml in our example).
The manifest can be safely reapplied after changing this value, and the Rook Operator will create the corresponding ServiceMonitor.
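In cluster.yaml, that boils down to a fragment like this:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  # ...
  monitoring:
    # requires Prometheus Operator to be deployed beforehand
    enabled: true
```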
Since Rook Ceph expects raw devices on the nodes it runs on, redeploying a cluster is not entirely straightforward (unless you can throw away and recreate the worker nodes).
For more detail, see the Rook Ceph Cleanup documentation.
Read more of our engineering blog posts
This blog post is part of our engineering blog post series. Experience and expertise, straight from our engineering team. Always with a focus on technical, hands-on HOWTO content with copy-pasteable code or CLI commands.