On-prem k8s storage: Rook vs Longhorn

Published: Aug 31, 2022 by Isaac Johnson

Rook and Longhorn are two CNCF-backed projects for providing storage to Kubernetes. Rook is a way to add storage via Ceph (or NFS) to a Kubernetes cluster. Longhorn is similarly a StorageClass provider, but it focuses on providing distributed block storage replicated across the cluster.

We’ll try to set up both and see how they compare.

Rook : Attempt Numero Uno

In this attempt, I’ll use a two-node cluster with SSD drives plus one attached USB drive that gets formatted and mounted (with an eye toward NFS).

Before we set up Rook, let’s check our cluster configuration.

I can see that my cluster presently just has a 500GB SSD drive available

$ sudo fdisk -l  | grep Disk | grep -v loop
Disk /dev/sda: 465.94 GiB, 500277790720 bytes, 977105060 sectors
Disk model: APPLE SSD SM512E
Disklabel type: gpt
Disk identifier: 35568E63-232C-4A36-97AB-F9022D0E462B

I add a thumb drive that was available

$ sudo fdisk -l  | grep Disk | grep -v loop
Disk /dev/sda: 465.94 GiB, 500277790720 bytes, 977105060 sectors
Disk model: APPLE SSD SM512E
Disklabel type: gpt
Disk identifier: 35568E63-232C-4A36-97AB-F9022D0E462B
Disk /dev/sdc: 29.3 GiB, 31457280000 bytes, 61440000 sectors
Disk model: USB DISK 2.0
Disklabel type: dos
Disk identifier: 0x6e52f796

This is now ‘/dev/sdc’. However, it is not mounted

$ df -h | grep /dev | grep -v loop
udev            3.8G     0  3.8G   0% /dev
/dev/sda2       458G   27G  408G   7% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
/dev/sda1       511M  5.3M  506M   2% /boot/efi

Format and mount

Wipe then format as ext4

$ sudo wipefs -a /dev/sdc
[sudo] password for builder:
/dev/sdc: 2 bytes were erased at offset 0x000001fe (dos): 55 aa
/dev/sdc: calling ioctl to re-read partition table: Success

# Create a partition
builder@anna-MacBookAir:~$ sudo fdisk /dev/sdc

Welcome to fdisk (util-linux 2.34).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

The old ext4 signature will be removed by a write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0x0c172cd2.

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-61439999, default 2048):
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-61439999, default 61439999):

Created a new partition 1 of type 'Linux' and of size 29.3 GiB.

Command (m for help): p
Disk /dev/sdc: 29.3 GiB, 31457280000 bytes, 61440000 sectors
Disk model: USB DISK 2.0
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0c172cd2

Device     Boot Start      End  Sectors  Size Id Type
/dev/sdc1        2048 61439999 61437952 29.3G 83 Linux

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

# this takes a while
~$ sudo mkfs.ext4 /dev/sdc1
mke2fs 1.45.5 (07-Jan-2020)
Creating filesystem with 7679744 4k blocks and 1921360 inodes
Filesystem UUID: f1c4e426-8e9a-4dde-88fa-412abd73b73b
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

Now we can mount this so we can use it with Rook.
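
The mount itself isn’t shown above, but it amounts to something like this (the mount point and UUID match the fstab entry that shows up later in this post; treat it as a sketch):

$ sudo mkdir -p /mnt/thumbdrive
$ sudo mount /dev/sdc1 /mnt/thumbdrive

# optionally persist it (UUID from the mkfs.ext4 output above)
$ echo 'UUID=f1c4e426-8e9a-4dde-88fa-412abd73b73b /mnt/thumbdrive ext4 defaults,errors=remount-ro 0 1' | sudo tee -a /etc/fstab
$ sudo mount -a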

Install Rook with Helm

Let’s first install the Rook Operator with helm
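
If the rook-release repo isn’t already set up locally, it gets added first (this is the chart repo Rook documents):

$ helm repo add rook-release https://charts.rook.io/release
$ helm repo update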

$ helm install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph
W0825 11:52:12.866453    4930 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0825 11:52:13.548116    4930 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: rook-ceph
LAST DEPLOYED: Thu Aug 25 11:52:12 2022
NAMESPACE: rook-ceph
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Rook Operator has been installed. Check its status by running:
  kubectl --namespace rook-ceph get pods -l "app=rook-ceph-operator"

Visit https://rook.io/docs/rook/latest for instructions on how to create and configure Rook clusters

Important Notes:
- You must customize the 'CephCluster' resource in the sample manifests for your cluster.
- Each CephCluster must be deployed to its own namespace, the samples use `rook-ceph` for the namespace.
- The sample manifests assume you also installed the rook-ceph operator in the `rook-ceph` namespace.
- The helm chart includes all the RBAC required to create a CephCluster CRD in the same namespace.
- Any disk devices you add to the cluster in the 'CephCluster' must be empty (no filesystem and no partitions).


$ helm list --all-namespaces
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
rook-ceph       rook-ceph       1               2022-08-25 11:52:12.462213486 -0500 CDT deployed        rook-ceph-v1.9.9        v1.9.9
traefik         kube-system     1               2022-08-25 14:55:37.746738083 +0000 UTC deployed        traefik-10.19.300       2.6.2
traefik-crd     kube-system     1               2022-08-25 14:55:34.323550464 +0000 UTC deployed        traefik-crd-10.19.300

Now install the Cluster helm chart

$ helm install --create-namespace --namespace rook-ceph rook-ceph-cluster --set operatorNamespace=rook-ceph rook-release/rook-ceph-cluster
NAME: rook-ceph-cluster
LAST DEPLOYED: Thu Aug 25 11:55:39 2022
NAMESPACE: rook-ceph
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Ceph Cluster has been installed. Check its status by running:
  kubectl --namespace rook-ceph get cephcluster

Visit https://rook.io/docs/rook/latest/CRDs/ceph-cluster-crd/ for more information about the Ceph CRD.

Important Notes:
- You can only deploy a single cluster per namespace
- If you wish to delete this cluster and start fresh, you will also have to wipe the OSD disks using `sfdisk`

The Rook setup starts and stops some containers during initial startup

$ kubectl get pods -n rook-ceph
NAME                                           READY   STATUS              RESTARTS   AGE
rook-ceph-operator-948ff69b5-v7gqm             1/1     Running             0          5m3s
csi-rbdplugin-lj2qx                            0/2     ContainerCreating   0          20s
csi-rbdplugin-jkq8m                            0/2     ContainerCreating   0          20s
csi-rbdplugin-provisioner-7bcc69755c-ws84s     0/5     ContainerCreating   0          20s
csi-rbdplugin-provisioner-7bcc69755c-6gpdw     0/5     ContainerCreating   0          20s
csi-cephfsplugin-nh984                         0/2     ContainerCreating   0          20s
csi-cephfsplugin-4c6j8                         0/2     ContainerCreating   0          20s
csi-cephfsplugin-provisioner-8556f8746-nz75b   0/5     ContainerCreating   0          20s
csi-cephfsplugin-provisioner-8556f8746-4sbk7   0/5     ContainerCreating   0          20s
rook-ceph-csi-detect-version-s85pg             1/1     Terminating         0          96s

$ kubectl get pods -n rook-ceph
NAME                                           READY   STATUS    RESTARTS   AGE
rook-ceph-operator-948ff69b5-v7gqm             1/1     Running   0          5m34s
csi-cephfsplugin-4c6j8                         2/2     Running   0          51s
csi-rbdplugin-jkq8m                            2/2     Running   0          51s
csi-cephfsplugin-nh984                         2/2     Running   0          51s
csi-cephfsplugin-provisioner-8556f8746-4sbk7   5/5     Running   0          51s
csi-rbdplugin-provisioner-7bcc69755c-ws84s     5/5     Running   0          51s
csi-rbdplugin-lj2qx                            2/2     Running   0          51s
csi-cephfsplugin-provisioner-8556f8746-nz75b   5/5     Running   0          51s
csi-rbdplugin-provisioner-7bcc69755c-6gpdw     5/5     Running   0          51s

Now we can see the new StorageClasses

$ kubectl get sc
NAME                   PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path (default)   rancher.io/local-path           Delete          WaitForFirstConsumer   false                  123m
ceph-bucket            rook-ceph.ceph.rook.io/bucket   Delete          Immediate              false                  2m37s
ceph-filesystem        rook-ceph.cephfs.csi.ceph.com   Delete          Immediate              true                   2m37s
ceph-block (default)   rook-ceph.rbd.csi.ceph.com      Delete          Immediate              true                   2m37s

Testing

Let’s use this with MongoDB

I’ll first create a 5Gi PVC using the Rook ceph-block StorageClass

$ cat testingRook.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-pvc
spec:
  storageClassName: ceph-block
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
$ kubectl apply -f testingRook.yaml
persistentvolumeclaim/mongo-pvc created
$ kubectl get pvc
NAME        STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mongo-pvc   Pending                                      ceph-block     32s

This failed because I only had two nodes; the chart’s default CephCluster expects three (three mons, each on its own node), so the cluster never got healthy enough to bind the claim.
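
To confirm the why, checks along these lines point at the CephCluster rather than the PVC itself (just the commands I would run; output omitted):

$ kubectl describe pvc mongo-pvc | tail -n 5
$ kubectl -n rook-ceph get cephcluster
$ kubectl -n rook-ceph get pods | grep -E 'mon|osd'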

Attempting a pivot and redo with YAML

I’ll try to redo this using YAML.

$ kubectl delete -f testingRook.yaml
persistentvolumeclaim "mongo-pvc" deleted
$ helm delete rook-ceph-cluster -n rook-ceph
release "rook-ceph-cluster" uninstalled

I’ll get the example code

builder@DESKTOP-QADGF36:~/Workspaces$ git clone --single-branch --branch master https://github.com/rook/rook.git
Cloning into 'rook'...
remote: Enumerating objects: 81890, done.
remote: Counting objects: 100% (278/278), done.
remote: Compressing objects: 100% (181/181), done.
remote: Total 81890 (delta 134), reused 207 (delta 95), pack-reused 81612
Receiving objects: 100% (81890/81890), 45.39 MiB | 27.43 MiB/s, done.
Resolving deltas: 100% (57204/57204), done.

Then attempt to use the “Test” cluster

builder@DESKTOP-QADGF36:~/Workspaces/rook/deploy/examples$ kubectl apply -f cluster-test.yaml
configmap/rook-config-override created
cephcluster.ceph.rook.io/my-cluster created
cephblockpool.ceph.rook.io/builtin-mgr created

And try to remove the earlier Helm release

builder@DESKTOP-QADGF36:~/Workspaces/rook/deploy/examples$ helm delete rook-ceph -n rook-ceph
W0825 12:13:40.446278    6088 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
These resources were kept due to the resource policy:
[CustomResourceDefinition] cephblockpoolradosnamespaces.ceph.rook.io
[CustomResourceDefinition] cephblockpools.ceph.rook.io
[CustomResourceDefinition] cephbucketnotifications.ceph.rook.io
[CustomResourceDefinition] cephbuckettopics.ceph.rook.io
[CustomResourceDefinition] cephclients.ceph.rook.io
[CustomResourceDefinition] cephclusters.ceph.rook.io
[CustomResourceDefinition] cephfilesystemmirrors.ceph.rook.io
[CustomResourceDefinition] cephfilesystems.ceph.rook.io
[CustomResourceDefinition] cephfilesystemsubvolumegroups.ceph.rook.io
[CustomResourceDefinition] cephnfses.ceph.rook.io
[CustomResourceDefinition] cephobjectrealms.ceph.rook.io
[CustomResourceDefinition] cephobjectstores.ceph.rook.io
[CustomResourceDefinition] cephobjectstoreusers.ceph.rook.io
[CustomResourceDefinition] cephobjectzonegroups.ceph.rook.io
[CustomResourceDefinition] cephobjectzones.ceph.rook.io
[CustomResourceDefinition] cephrbdmirrors.ceph.rook.io
[CustomResourceDefinition] objectbucketclaims.objectbucket.io
[CustomResourceDefinition] objectbuckets.objectbucket.io

release "rook-ceph" uninstalled

Now apply the CRDs and common resources with YAML

builder@DESKTOP-QADGF36:~/Workspaces/rook/deploy/examples$ kubectl apply -f crds.yaml -f common.yaml
Warning: resource customresourcedefinitions/cephblockpoolradosnamespaces.ceph.rook.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/cephblockpoolradosnamespaces.ceph.rook.io configured
Warning: resource customresourcedefinitions/cephblockpools.ceph.rook.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/cephblockpools.ceph.rook.io configured
Warning: resource customresourcedefinitions/cephbucketnotifications.ceph.rook.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/cephbucketnotifications.ceph.rook.io configured
Warning: resource customresourcedefinitions/cephbuckettopics.ceph.rook.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/cephbuckettopics.ceph.rook.io configured
Warning: resource customresourcedefinitions/cephclients.ceph.rook.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/cephclients.ceph.rook.io configured
Warning: resource customresourcedefinitions/cephclusters.ceph.rook.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/cephclusters.ceph.rook.io configured
Warning: resource customresourcedefinitions/cephfilesystemmirrors.ceph.rook.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/cephfilesystemmirrors.ceph.rook.io configured
Warning: resource customresourcedefinitions/cephfilesystems.ceph.rook.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/cephfilesystems.ceph.rook.io configured
Warning: resource customresourcedefinitions/cephfilesystemsubvolumegroups.ceph.rook.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/cephfilesystemsubvolumegroups.ceph.rook.io configured
Warning: resource customresourcedefinitions/cephnfses.ceph.rook.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/cephnfses.ceph.rook.io configured
Warning: resource customresourcedefinitions/cephobjectrealms.ceph.rook.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/cephobjectrealms.ceph.rook.io configured
Warning: resource customresourcedefinitions/cephobjectstores.ceph.rook.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/cephobjectstores.ceph.rook.io configured
Warning: resource customresourcedefinitions/cephobjectstoreusers.ceph.rook.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/cephobjectstoreusers.ceph.rook.io configured
Warning: resource customresourcedefinitions/cephobjectzonegroups.ceph.rook.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/cephobjectzonegroups.ceph.rook.io configured
Warning: resource customresourcedefinitions/cephobjectzones.ceph.rook.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/cephobjectzones.ceph.rook.io configured
Warning: resource customresourcedefinitions/cephrbdmirrors.ceph.rook.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/cephrbdmirrors.ceph.rook.io configured
Warning: resource customresourcedefinitions/objectbucketclaims.objectbucket.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/objectbucketclaims.objectbucket.io configured
Warning: resource customresourcedefinitions/objectbuckets.objectbucket.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
customresourcedefinition.apiextensions.k8s.io/objectbuckets.objectbucket.io configured
Warning: resource namespaces/rook-ceph is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
namespace/rook-ceph configured
clusterrole.rbac.authorization.k8s.io/cephfs-csi-nodeplugin created
clusterrole.rbac.authorization.k8s.io/cephfs-external-provisioner-runner created
clusterrole.rbac.authorization.k8s.io/psp:rook created
clusterrole.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
clusterrole.rbac.authorization.k8s.io/rbd-external-provisioner-runner created
clusterrole.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
clusterrole.rbac.authorization.k8s.io/rook-ceph-global created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-system created
clusterrole.rbac.authorization.k8s.io/rook-ceph-object-bucket created
clusterrole.rbac.authorization.k8s.io/rook-ceph-osd created
clusterrole.rbac.authorization.k8s.io/rook-ceph-system created
clusterrolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role created
clusterrolebinding.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
clusterrolebinding.rbac.authorization.k8s.io/rbd-csi-provisioner-role created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-global created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-object-bucket created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-system created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-system-psp created
clusterrolebinding.rbac.authorization.k8s.io/rook-csi-cephfs-plugin-sa-psp created
clusterrolebinding.rbac.authorization.k8s.io/rook-csi-cephfs-provisioner-sa-psp created
clusterrolebinding.rbac.authorization.k8s.io/rook-csi-rbd-plugin-sa-psp created
clusterrolebinding.rbac.authorization.k8s.io/rook-csi-rbd-provisioner-sa-psp created
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/00-rook-privileged created
role.rbac.authorization.k8s.io/cephfs-external-provisioner-cfg created
role.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
role.rbac.authorization.k8s.io/rbd-external-provisioner-cfg created
role.rbac.authorization.k8s.io/rook-ceph-cmd-reporter created
role.rbac.authorization.k8s.io/rook-ceph-mgr created
role.rbac.authorization.k8s.io/rook-ceph-osd created
role.rbac.authorization.k8s.io/rook-ceph-purge-osd created
role.rbac.authorization.k8s.io/rook-ceph-rgw created
role.rbac.authorization.k8s.io/rook-ceph-system created
rolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role-cfg created
rolebinding.rbac.authorization.k8s.io/rbd-csi-nodeplugin-role-cfg created
rolebinding.rbac.authorization.k8s.io/rbd-csi-provisioner-role-cfg created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cmd-reporter created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cmd-reporter-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-default-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-system created
rolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-osd-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-purge-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-purge-osd-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-rgw created
rolebinding.rbac.authorization.k8s.io/rook-ceph-rgw-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-system created
serviceaccount/rook-ceph-cmd-reporter created
serviceaccount/rook-ceph-mgr created
serviceaccount/rook-ceph-osd created
serviceaccount/rook-ceph-purge-osd created
serviceaccount/rook-ceph-rgw created
serviceaccount/rook-ceph-system created
serviceaccount/rook-csi-cephfs-plugin-sa created
serviceaccount/rook-csi-cephfs-provisioner-sa created
serviceaccount/rook-csi-rbd-plugin-sa created
serviceaccount/rook-csi-rbd-provisioner-sa created

Then the Operator

builder@DESKTOP-QADGF36:~/Workspaces/rook/deploy/examples$ kubectl apply -f operator.yaml
configmap/rook-ceph-operator-config created
deployment.apps/rook-ceph-operator created

We wait for the operator to come up

$ kubectl get pods -n rook-ceph
NAME                                 READY   STATUS              RESTARTS   AGE
rook-ceph-operator-b5c96c99b-9qzjq   0/1     ContainerCreating   0          28s
$ kubectl get pods -n rook-ceph
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-operator-b5c96c99b-9qzjq   1/1     Running   0          89s
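
Rather than re-running kubectl get pods, waiting on the rollout works just as well:

$ kubectl -n rook-ceph rollout status deploy/rook-ceph-operator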

Now let’s relaunch the test cluster. First, delete the former one

# delete the former
builder@DESKTOP-QADGF36:~/Workspaces/rook/deploy/examples$ kubectl delete -f cluster-test.yaml
configmap "rook-config-override" deleted
cephcluster.ceph.rook.io "my-cluster" deleted
cephblockpool.ceph.rook.io "builtin-mgr" deleted

I can see they are stuck in deletion; the block-pool controller just keeps reconciling them

$ kubectl describe cephblockpool.ceph.rook.io/builtin-mgr -n rook-ceph | tail -n5
Events:
  Type    Reason              Age                   From                             Message
  ----    ------              ----                  ----                             -------
  Normal  ReconcileSucceeded  7m58s (x14 over 10m)  rook-ceph-block-pool-controller  successfully configured CephBlockPool "rook-ceph/builtin-mgr"
  Normal  ReconcileSucceeded  7s (x32 over 5m7s)    rook-ceph-block-pool-controller  successfully configured CephBlockPool "rook-ceph/builtin-mgr"

Add/Confirm LVM2, which is a pre-req for Rook, to both nodes

builder@anna-MacBookAir:~$ sudo apt-get install lvm2
[sudo] password for builder:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libfprint-2-tod1 libfwupdplugin1 libllvm10 libllvm11 shim
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  dmeventd libaio1 libdevmapper-event1.02.1 liblvm2cmd2.03 libreadline5 thin-provisioning-tools
The following NEW packages will be installed:
  dmeventd libaio1 libdevmapper-event1.02.1 liblvm2cmd2.03 libreadline5 lvm2 thin-provisioning-tools
0 upgraded, 7 newly installed, 0 to remove and 1 not upgraded.
Need to get 2,255 kB of archives.
After this operation, 8,919 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
...
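
To hit both nodes without hopping between shells, a small loop over them works too (the hostnames and user are just mine from above, and it assumes SSH access to each node; sudo may prompt on each):

$ for h in anna-MacBookAir builder-MacBookPro2; do ssh builder@$h 'sudo apt-get install -y lvm2'; done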

Download rook

$ git clone --single-branch --branch release-1.9 https://github.com/rook/rook.git
Cloning into 'rook'...
remote: Enumerating objects: 80478, done.
remote: Counting objects: 100% (86/86), done.
remote: Compressing objects: 100% (75/75), done.
remote: Total 80478 (delta 13), reused 58 (delta 9), pack-reused 80392
Receiving objects: 100% (80478/80478), 44.82 MiB | 5.54 MiB/s, done.
Resolving deltas: 100% (56224/56224), done.

We now need the common resources

builder@DESKTOP-72D2D9T:~/Workspaces/rook/deploy/examples$ kubectl create -f common.yaml
namespace/rook-ceph created
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
role.rbac.authorization.k8s.io/cephfs-external-provisioner-cfg created
role.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
role.rbac.authorization.k8s.io/rbd-external-provisioner-cfg created
role.rbac.authorization.k8s.io/rook-ceph-cmd-reporter created
role.rbac.authorization.k8s.io/rook-ceph-mgr created
role.rbac.authorization.k8s.io/rook-ceph-osd created
role.rbac.authorization.k8s.io/rook-ceph-purge-osd created
role.rbac.authorization.k8s.io/rook-ceph-rgw created
role.rbac.authorization.k8s.io/rook-ceph-system created
rolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role-cfg created
rolebinding.rbac.authorization.k8s.io/rbd-csi-nodeplugin-role-cfg created
rolebinding.rbac.authorization.k8s.io/rbd-csi-provisioner-role-cfg created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cmd-reporter created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cmd-reporter-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-default-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-system created
rolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-osd-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-purge-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-purge-osd-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-rgw created
rolebinding.rbac.authorization.k8s.io/rook-ceph-rgw-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-system created
serviceaccount/rook-ceph-cmd-reporter created
serviceaccount/rook-ceph-mgr created
serviceaccount/rook-ceph-osd created
serviceaccount/rook-ceph-purge-osd created
serviceaccount/rook-ceph-rgw created
serviceaccount/rook-ceph-system created
serviceaccount/rook-csi-cephfs-plugin-sa created
serviceaccount/rook-csi-cephfs-provisioner-sa created
serviceaccount/rook-csi-rbd-plugin-sa created
serviceaccount/rook-csi-rbd-provisioner-sa created
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "cephfs-csi-nodeplugin" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "cephfs-external-provisioner-runner" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "psp:rook" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rbd-csi-nodeplugin" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rbd-external-provisioner-runner" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rook-ceph-cluster-mgmt" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rook-ceph-global" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rook-ceph-mgr-cluster" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rook-ceph-mgr-system" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rook-ceph-object-bucket" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rook-ceph-osd" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rook-ceph-system" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "cephfs-csi-nodeplugin" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "cephfs-csi-provisioner-role" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rbd-csi-nodeplugin" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rbd-csi-provisioner-role" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-ceph-global" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-ceph-mgr-cluster" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-ceph-object-bucket" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-ceph-osd" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-ceph-system" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-ceph-system-psp" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-csi-cephfs-plugin-sa-psp" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-csi-cephfs-provisioner-sa-psp" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-csi-rbd-plugin-sa-psp" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-csi-rbd-provisioner-sa-psp" already exists
Error from server (AlreadyExists): error when creating "common.yaml": podsecuritypolicies.policy "00-rook-privileged" already exists

I then tried the operator yet again

$ kubectl create -f operator.yaml
configmap/rook-ceph-operator-config created
deployment.apps/rook-ceph-operator created

But the Ceph cluster never moved on to new pods; that is, the cluster never came up.

$ kubectl get pod -n rook-ceph
NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-operator-5949bdbb59-clhwc   1/1     Running   0          21s

$ kubectl get pod -n rook-ceph
NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-operator-5949bdbb59-dq6cj   1/1     Running   0          9h

Rook : Attempt Deux : Unformatted drive and YAML

I’ll summarize by saying that I tried, unsuccessfully, to remove the failed Rook-Ceph. The objects and finalizers created so many dependent locks that I finally gave up and moved on to Longhorn.
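
For the record, the cleanup that such a wedged install needs looks roughly like this; it’s my hedged summary of Rook’s documented teardown (clear the finalizers, remove the on-disk state, wipe the OSD device). In the end I simply respun k3s instead:

# delete the CephCluster, clearing its finalizer if the delete hangs
$ kubectl -n rook-ceph delete cephcluster my-cluster --wait=false
$ kubectl -n rook-ceph patch cephcluster my-cluster --type merge -p '{"metadata":{"finalizers":null}}'

# then, on each node: remove Rook's local state and wipe the disk it used
$ sudo rm -rf /var/lib/rook
$ sudo wipefs -a /dev/sdc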

Here I will use an unformatted thumb drive and, similar to this guide, use YAML files.

Respin and try again

I respun the cluster fresh to try again

$ kubectl get nodes
NAME                  STATUS   ROLES                  AGE     VERSION
anna-macbookair       Ready    control-plane,master   7h57m   v1.24.4+k3s1
builder-macbookpro2   Ready    <none>                 7h55m   v1.24.4+k3s1

I double-checked LVM2 on both the master and worker nodes

builder@anna-MacBookAir:~$ sudo apt-get install lvm2
[sudo] password for builder:
Reading package lists... Done
Building dependency tree
Reading state information... Done
lvm2 is already the newest version (2.03.07-1ubuntu1).
The following packages were automatically installed and are no longer required:
  libfprint-2-tod1 libfwupdplugin1 libllvm10 libllvm11 shim
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 5 not upgraded.

builder@builder-MacBookPro2:~$ sudo apt-get install lvm2
[sudo] password for builder:
Reading package lists... Done
Building dependency tree
Reading state information... Done
lvm2 is already the newest version (2.03.07-1ubuntu1).
The following packages were automatically installed and are no longer required:
  libfprint-2-tod1 libllvm10
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 6 not upgraded.

I popped in a USB drive; however, it already had a formatted partition

builder@anna-MacBookAir:~$ ls -ltra /dev | grep sd
brw-rw----   1 root disk      8,   0 Aug 28 22:07 sda
brw-rw----   1 root disk      8,   2 Aug 28 22:07 sda2
brw-rw----   1 root disk      8,   1 Aug 28 22:07 sda1
brw-rw----   1 root disk      8,  16 Aug 28 22:07 sdb
builder@anna-MacBookAir:~$ ls -ltra /dev | grep sd
brw-rw----   1 root disk      8,   0 Aug 28 22:07 sda
brw-rw----   1 root disk      8,   2 Aug 28 22:07 sda2
brw-rw----   1 root disk      8,   1 Aug 28 22:07 sda1
brw-rw----   1 root disk      8,  16 Aug 28 22:07 sdb
brw-rw----   1 root disk      8,  32 Aug 29 06:14 sdc
brw-rw----   1 root disk      8,  33 Aug 29 06:14 sdc1

I made sure it was unmounted, and I commented out the fstab line that would otherwise pick it up with mount -a

builder@anna-MacBookAir:~$ cat /etc/fstab | tail -n1
# UUID=f1c4e426-8e9a-4dde-88fa-412abd73b73b     /mnt/thumbdrive ext4    defaults,errors=remount-ro 0       1
builder@anna-MacBookAir:~$ sudo umount /mnt/thumbdrive
umount: /mnt/thumbdrive: not mounted.

Now I need to ditch the ext4 filesystem and make a primary partition

builder@anna-MacBookAir:~$ sudo fdisk /dev/sdc

Welcome to fdisk (util-linux 2.34).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): m

Help:

  DOS (MBR)
   a   toggle a bootable flag
   b   edit nested BSD disklabel
   c   toggle the dos compatibility flag

  Generic
   d   delete a partition
   F   list free unpartitioned space
   l   list known partition types
   n   add a new partition
   p   print the partition table
   t   change a partition type
   v   verify the partition table
   i   print information about a partition

  Misc
   m   print this menu
   u   change display/entry units
   x   extra functionality (experts only)

  Script
   I   load disk layout from sfdisk script file
   O   dump disk layout to sfdisk script file

  Save & Exit
   w   write table to disk and exit
   q   quit without saving changes

  Create a new label
   g   create a new empty GPT partition table
   G   create a new empty SGI (IRIX) partition table
   o   create a new empty DOS partition table
   s   create a new empty Sun partition table


Command (m for help): d
Selected partition 1
Partition 1 has been deleted.

Command (m for help): p
Disk /dev/sdc: 29.3 GiB, 31457280000 bytes, 61440000 sectors
Disk model: USB DISK 2.0
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0c172cd2

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-61439999, default 2048):
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-61439999, default 61439999):

Created a new partition 1 of type 'Linux' and of size 29.3 GiB.

Command (m for help): t
Selected partition 1
Hex code (type L to list all codes): L

 0  Empty           24  NEC DOS         81  Minix / old Lin bf  Solaris
 1  FAT12           27  Hidden NTFS Win 82  Linux swap / So c1  DRDOS/sec (FAT-
 2  XENIX root      39  Plan 9          83  Linux           c4  DRDOS/sec (FAT-
 3  XENIX usr       3c  PartitionMagic  84  OS/2 hidden or  c6  DRDOS/sec (FAT-
 4  FAT16 <32M      40  Venix 80286     85  Linux extended  c7  Syrinx
 5  Extended        41  PPC PReP Boot   86  NTFS volume set da  Non-FS data
 6  FAT16           42  SFS             87  NTFS volume set db  CP/M / CTOS / .
 7  HPFS/NTFS/exFAT 4d  QNX4.x          88  Linux plaintext de  Dell Utility
 8  AIX             4e  QNX4.x 2nd part 8e  Linux LVM       df  BootIt
 9  AIX bootable    4f  QNX4.x 3rd part 93  Amoeba          e1  DOS access
 a  OS/2 Boot Manag 50  OnTrack DM      94  Amoeba BBT      e3  DOS R/O
 b  W95 FAT32       51  OnTrack DM6 Aux 9f  BSD/OS          e4  SpeedStor
 c  W95 FAT32 (LBA) 52  CP/M            a0  IBM Thinkpad hi ea  Rufus alignment
 e  W95 FAT16 (LBA) 53  OnTrack DM6 Aux a5  FreeBSD         eb  BeOS fs
 f  W95 Ext'd (LBA) 54  OnTrackDM6      a6  OpenBSD         ee  GPT
10  OPUS            55  EZ-Drive        a7  NeXTSTEP        ef  EFI (FAT-12/16/
11  Hidden FAT12    56  Golden Bow      a8  Darwin UFS      f0  Linux/PA-RISC b
12  Compaq diagnost 5c  Priam Edisk     a9  NetBSD          f1  SpeedStor
14  Hidden FAT16 <3 61  SpeedStor       ab  Darwin boot     f4  SpeedStor
16  Hidden FAT16    63  GNU HURD or Sys af  HFS / HFS+      f2  DOS secondary
17  Hidden HPFS/NTF 64  Novell Netware  b7  BSDI fs         fb  VMware VMFS
18  AST SmartSleep  65  Novell Netware  b8  BSDI swap       fc  VMware VMKCORE
1b  Hidden W95 FAT3 70  DiskSecure Mult bb  Boot Wizard hid fd  Linux raid auto
1c  Hidden W95 FAT3 75  PC/IX           bc  Acronis FAT32 L fe  LANstep
1e  Hidden W95 FAT1 80  Old Minix       be  Solaris boot    ff  BBT
Hex code (type L to list all codes): 60
Changed type of partition 'Linux' to 'unknown'.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.

Note that “60” is not a valid partition type, which effectively leaves the partition raw. We can see all the known types in the listing above.
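
In hindsight, an easier way to hand Rook a raw device is to skip fdisk entirely and just clear every signature on the disk, since the Helm notes earlier already said OSD devices must have no filesystem and no partitions. Roughly:

$ sudo umount /dev/sdc1 2>/dev/null || true
$ sudo wipefs -a /dev/sdc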

Last time we tried with Helm; this time we will use YAML.

builder@DESKTOP-QADGF36:~$ git clone --single-branch --branch master https://github.com/rook/rook.git
Cloning into 'rook'...
remote: Enumerating objects: 81967, done.
remote: Counting objects: 100% (355/355), done.
remote: Compressing objects: 100% (244/244), done.
remote: Total 81967 (delta 170), reused 247 (delta 109), pack-reused 81612
Receiving objects: 100% (81967/81967), 45.44 MiB | 37.20 MiB/s, done.
Resolving deltas: 100% (57240/57240), done.
builder@DESKTOP-QADGF36:~$ cd rook/deploy/examples/
builder@DESKTOP-QADGF36:~/rook/deploy/examples$

Create PSP

builder@DESKTOP-QADGF36:~/rook/deploy/examples$ kubectl create -f psp.yaml
clusterrole.rbac.authorization.k8s.io/psp:rook created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-system-psp created
clusterrolebinding.rbac.authorization.k8s.io/rook-csi-cephfs-plugin-sa-psp created
clusterrolebinding.rbac.authorization.k8s.io/rook-csi-cephfs-provisioner-sa-psp created
clusterrolebinding.rbac.authorization.k8s.io/rook-csi-rbd-plugin-sa-psp created
clusterrolebinding.rbac.authorization.k8s.io/rook-csi-rbd-provisioner-sa-psp created
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/00-rook-privileged created
Error from server (NotFound): error when creating "psp.yaml": namespaces "rook-ceph" not found
Error from server (NotFound): error when creating "psp.yaml": namespaces "rook-ceph" not found
Error from server (NotFound): error when creating "psp.yaml": namespaces "rook-ceph" not found
Error from server (NotFound): error when creating "psp.yaml": namespaces "rook-ceph" not found
Error from server (NotFound): error when creating "psp.yaml": namespaces "rook-ceph" not found
Error from server (NotFound): error when creating "psp.yaml": namespaces "rook-ceph" not found

Create the CRDs, common resources, and Operator. (The NotFound errors above were because the rook-ceph namespace did not exist yet; common.yaml, applied here, creates it.)

builder@DESKTOP-QADGF36:~/rook/deploy/examples$ kubectl create -f crds.yaml -f common.yaml -f operator.yaml
customresourcedefinition.apiextensions.k8s.io/cephblockpoolradosnamespaces.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephblockpools.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephbucketnotifications.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephbuckettopics.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephclients.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephclusters.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephfilesystemmirrors.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephfilesystems.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephfilesystemsubvolumegroups.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephnfses.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectrealms.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectstores.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectstoreusers.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectzonegroups.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectzones.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephrbdmirrors.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/objectbucketclaims.objectbucket.io created
customresourcedefinition.apiextensions.k8s.io/objectbuckets.objectbucket.io created
namespace/rook-ceph created
clusterrole.rbac.authorization.k8s.io/cephfs-csi-nodeplugin created
clusterrole.rbac.authorization.k8s.io/cephfs-external-provisioner-runner created
clusterrole.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
clusterrole.rbac.authorization.k8s.io/rbd-external-provisioner-runner created
clusterrole.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
clusterrole.rbac.authorization.k8s.io/rook-ceph-global created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-system created
clusterrole.rbac.authorization.k8s.io/rook-ceph-object-bucket created
clusterrole.rbac.authorization.k8s.io/rook-ceph-osd created
clusterrole.rbac.authorization.k8s.io/rook-ceph-system created
clusterrolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role created
clusterrolebinding.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
clusterrolebinding.rbac.authorization.k8s.io/rbd-csi-provisioner-role created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-global created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-object-bucket created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-system created
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
role.rbac.authorization.k8s.io/cephfs-external-provisioner-cfg created
role.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
role.rbac.authorization.k8s.io/rbd-external-provisioner-cfg created
role.rbac.authorization.k8s.io/rook-ceph-cmd-reporter created
role.rbac.authorization.k8s.io/rook-ceph-mgr created
role.rbac.authorization.k8s.io/rook-ceph-osd created
role.rbac.authorization.k8s.io/rook-ceph-purge-osd created
role.rbac.authorization.k8s.io/rook-ceph-rgw created
role.rbac.authorization.k8s.io/rook-ceph-system created
rolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role-cfg created
rolebinding.rbac.authorization.k8s.io/rbd-csi-nodeplugin-role-cfg created
rolebinding.rbac.authorization.k8s.io/rbd-csi-provisioner-role-cfg created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cmd-reporter created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cmd-reporter-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-default-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-system created
rolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-osd-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-purge-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-purge-osd-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-rgw created
rolebinding.rbac.authorization.k8s.io/rook-ceph-rgw-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-system created
serviceaccount/rook-ceph-cmd-reporter created
serviceaccount/rook-ceph-mgr created
serviceaccount/rook-ceph-osd created
serviceaccount/rook-ceph-purge-osd created
serviceaccount/rook-ceph-rgw created
serviceaccount/rook-ceph-system created
serviceaccount/rook-csi-cephfs-plugin-sa created
serviceaccount/rook-csi-cephfs-provisioner-sa created
serviceaccount/rook-csi-rbd-plugin-sa created
serviceaccount/rook-csi-rbd-provisioner-sa created
configmap/rook-ceph-operator-config created
deployment.apps/rook-ceph-operator created
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "psp:rook" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-ceph-system-psp" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-csi-cephfs-plugin-sa-psp" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-csi-cephfs-provisioner-sa-psp" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-csi-rbd-plugin-sa-psp" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-csi-rbd-provisioner-sa-psp" already exists
Error from server (AlreadyExists): error when creating "common.yaml": podsecuritypolicies.policy "00-rook-privileged" already exists

We can check on the Operator to see when it comes up

builder@DESKTOP-QADGF36:~/rook/deploy/examples$ kubectl get pods -n rook-ceph
NAME                                 READY   STATUS              RESTARTS   AGE
rook-ceph-operator-b5c96c99b-fpslw   0/1     ContainerCreating   0          41s

builder@DESKTOP-QADGF36:~/rook/deploy/examples$ kubectl describe pod rook-ceph-operator-b5c96c99b-fpslw -n rook-ceph | tail -n 5
  Normal  Scheduled  84s   default-scheduler  Successfully assigned rook-ceph/rook-ceph-operator-b5c96c99b-fpslw to builder-macbookpro2
  Normal  Pulling    83s   kubelet            Pulling image "rook/ceph:master"
  Normal  Pulled     0s    kubelet            Successfully pulled image "rook/ceph:master" in 1m22.985243362s
  Normal  Created    0s    kubelet            Created container rook-ceph-operator
  Normal  Started    0s    kubelet            Started container rook-ceph-operator
builder@DESKTOP-QADGF36:~/rook/deploy/examples$ kubectl get pods -n rook-ceph
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-operator-b5c96c99b-fpslw   1/1     Running   0          105s

Since we are on a basic two-node cluster, we will use cluster-test.yaml to set up a test Ceph cluster.

builder@DESKTOP-QADGF36:~/rook/deploy/examples$ kubectl apply -f cluster-test.yaml
configmap/rook-config-override created
cephcluster.ceph.rook.io/my-cluster created
cephblockpool.ceph.rook.io/builtin-mgr created

This time I see pods coming up

builder@DESKTOP-QADGF36:~/rook/deploy/examples$ kubectl get pods -n rook-ceph
NAME                                            READY   STATUS              RESTARTS   AGE
rook-ceph-operator-b5c96c99b-fpslw              1/1     Running             0          3m25s
rook-ceph-mon-a-canary-868d956cbf-zfvnd         1/1     Terminating         0          28s
csi-rbdplugin-kp8pk                             0/2     ContainerCreating   0          25s
csi-rbdplugin-provisioner-6c99988f59-rfrs4      0/5     ContainerCreating   0          25s
csi-rbdplugin-vg7zm                             0/2     ContainerCreating   0          25s
csi-rbdplugin-provisioner-6c99988f59-2b5mv      0/5     ContainerCreating   0          25s
csi-cephfsplugin-bfvnv                          0/2     ContainerCreating   0          25s
csi-cephfsplugin-provisioner-846bf56886-zhvqf   0/5     ContainerCreating   0          25s
csi-cephfsplugin-provisioner-846bf56886-d52ww   0/5     ContainerCreating   0          25s
csi-cephfsplugin-wftx4                          0/2     ContainerCreating   0          25s
rook-ceph-mon-a-5cddf7f9fc-2r42z                0/1     Running             0          26s

Ceph Toolbox

Let’s add the Toolbox and see that it was rolled out

builder@DESKTOP-QADGF36:~/rook/deploy/examples$ kubectl create -f toolbox.yaml
deployment.apps/rook-ceph-tools created
$ kubectl -n rook-ceph rollout status deploy/rook-ceph-tools
deployment "rook-ceph-tools" successfully rolled out

I can use it to check the status

builder@DESKTOP-QADGF36:~/rook/deploy/examples$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
bash-4.4$ ceph status
  cluster:
    id:     caceb937-3244-4bff-a80a-841da091d170
    health: HEALTH_WARN
            Reduced data availability: 1 pg inactive

  services:
    mon: 1 daemons, quorum a (age 13m)
    mgr: a(active, since 12m)
    osd: 1 osds: 0 up, 1 in (since 12m)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             1 unknown
bash-4.4$ ceph osd status
ID  HOST   USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0           0      0       0        0       0        0   exists,new
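
A few more toolbox commands help explain that warning; this is what I would poke at next (commands only, no output shown):

bash-4.4$ ceph health detail
bash-4.4$ ceph osd tree
bash-4.4$ ceph osd df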

We can check the Rook Dashboard as well

Get Ceph Dashboard password

$ kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
adsfasdfasdf

We can then port-forward to the service

builder@DESKTOP-QADGF36:~/rook/deploy/examples$ kubectl port-forward -n rook-ceph svc/rook-ceph-mgr-dashboard 7000:7000
Forwarding from 127.0.0.1:7000 -> 7000
Forwarding from [::1]:7000 -> 7000

/content/images/2022/09/longhornrook-28b.png

I see similar status information

/content/images/2022/09/longhornrook-29b.png

I’ll try to create a pool

/content/images/2022/09/longhornrook-30b.png

And another

/content/images/2022/09/longhornrook-31b.png

I do not see StorageClasses made available in Kubernetes, however.
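
That makes some sense in retrospect: the Ceph dashboard only creates pools inside Ceph itself; the Kubernetes StorageClass still has to be declared separately. A sketch of what that would look like, modeled on Rook’s csi/rbd example manifests (the pool name replicapool and the secret names are the stock example values, not something this cluster already has):

$ cat <<'EOF' | kubectl apply -f -
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: osd
  replicated:
    size: 1
    requireSafeReplicaSize: false
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
EOF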

The status has stayed in HEALTH_WARN

$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
  cluster:
    id:     caceb937-3244-4bff-a80a-841da091d170
    health: HEALTH_WARN
            Reduced data availability: 3 pgs inactive

  services:
    mon: 1 daemons, quorum a (age 24m)
    mgr: a(active, since 22m)
    osd: 1 osds: 0 up, 1 in (since 23m)

  data:
    pools:   3 pools, 3 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             3 unknown

And the Operator complains of unclean PGs

$ kubectl -n rook-ceph logs -l app=rook-ceph-operator
2022-08-29 11:52:06.525404 I | clusterdisruption-controller: osd "rook-ceph-osd-0" is down but no node drain is detected
2022-08-29 11:52:07.219354 I | clusterdisruption-controller: osd is down in failure domain "anna-macbookair" and pgs are not active+clean. pg health: "cluster is not fully clean. PGs: [{StateName:unknown Count:3}]"
2022-08-29 11:52:25.261134 I | clusterdisruption-controller: osd "rook-ceph-osd-0" is down but no node drain is detected
2022-08-29 11:52:25.938538 I | clusterdisruption-controller: osd is down in failure domain "anna-macbookair" and pgs are not active+clean. pg health: "cluster is not fully clean. PGs: [{StateName:unknown Count:3}]"
2022-08-29 11:52:37.246992 I | clusterdisruption-controller: osd "rook-ceph-osd-0" is down but no node drain is detected
2022-08-29 11:52:38.056085 I | clusterdisruption-controller: osd is down in failure domain "anna-macbookair" and pgs are not active+clean. pg health: "cluster is not fully clean. PGs: [{StateName:unknown Count:3}]"
2022-08-29 11:52:55.973639 I | clusterdisruption-controller: osd "rook-ceph-osd-0" is down but no node drain is detected
2022-08-29 11:52:56.624652 I | clusterdisruption-controller: osd is down in failure domain "anna-macbookair" and pgs are not active+clean. pg health: "cluster is not fully clean. PGs: [{StateName:unknown Count:3}]"
2022-08-29 11:53:08.261183 I | clusterdisruption-controller: osd "rook-ceph-osd-0" is down but no node drain is detected
2022-08-29 11:53:08.929602 I | clusterdisruption-controller: osd is down in failure domain "anna-macbookair" and pgs are not active+clean. pg health: "cluster is not fully clean. PGs: [{StateName:unknown Count:3}]"
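
Those messages say the lone OSD never actually came up. To dig into why, the osd-prepare job logs are the usual next stop; something along these lines:

$ kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare
$ kubectl -n rook-ceph logs -l app=rook-ceph-osd-prepare --tail=50
$ kubectl -n rook-ceph logs deploy/rook-ceph-osd-0 --tail=50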

Rook : Attempt 3 : No thumb drive, fresh k3s

Add LVM, which is a pre-req for Rook, to both nodes

builder@anna-MacBookAir:~$ sudo apt-get install lvm2
[sudo] password for builder:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libfprint-2-tod1 libfwupdplugin1 libllvm10 libllvm11 shim
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  dmeventd libaio1 libdevmapper-event1.02.1 liblvm2cmd2.03 libreadline5 thin-provisioning-tools
The following NEW packages will be installed:
  dmeventd libaio1 libdevmapper-event1.02.1 liblvm2cmd2.03 libreadline5 lvm2 thin-provisioning-tools
0 upgraded, 7 newly installed, 0 to remove and 1 not upgraded.
Need to get 2,255 kB of archives.
After this operation, 8,919 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
...

Download Rook

$ git clone --single-branch --branch release-1.9 https://github.com/rook/rook.git
Cloning into 'rook'...
remote: Enumerating objects: 80478, done.
remote: Counting objects: 100% (86/86), done.
remote: Compressing objects: 100% (75/75), done.
remote: Total 80478 (delta 13), reused 58 (delta 9), pack-reused 80392
Receiving objects: 100% (80478/80478), 44.82 MiB | 5.54 MiB/s, done.
Resolving deltas: 100% (56224/56224), done.

Create the common resources

builder@DESKTOP-72D2D9T:~/Workspaces/rook/deploy/examples$ kubectl create -f common.yaml
namespace/rook-ceph created
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
role.rbac.authorization.k8s.io/cephfs-external-provisioner-cfg created
role.rbac.authorization.k8s.io/rbd-csi-nodeplugin created
role.rbac.authorization.k8s.io/rbd-external-provisioner-cfg created
role.rbac.authorization.k8s.io/rook-ceph-cmd-reporter created
role.rbac.authorization.k8s.io/rook-ceph-mgr created
role.rbac.authorization.k8s.io/rook-ceph-osd created
role.rbac.authorization.k8s.io/rook-ceph-purge-osd created
role.rbac.authorization.k8s.io/rook-ceph-rgw created
role.rbac.authorization.k8s.io/rook-ceph-system created
rolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role-cfg created
rolebinding.rbac.authorization.k8s.io/rbd-csi-nodeplugin-role-cfg created
rolebinding.rbac.authorization.k8s.io/rbd-csi-provisioner-role-cfg created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cmd-reporter created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cmd-reporter-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-default-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-system created
rolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-osd-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-purge-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-purge-osd-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-rgw created
rolebinding.rbac.authorization.k8s.io/rook-ceph-rgw-psp created
rolebinding.rbac.authorization.k8s.io/rook-ceph-system created
serviceaccount/rook-ceph-cmd-reporter created
serviceaccount/rook-ceph-mgr created
serviceaccount/rook-ceph-osd created
serviceaccount/rook-ceph-purge-osd created
serviceaccount/rook-ceph-rgw created
serviceaccount/rook-ceph-system created
serviceaccount/rook-csi-cephfs-plugin-sa created
serviceaccount/rook-csi-cephfs-provisioner-sa created
serviceaccount/rook-csi-rbd-plugin-sa created
serviceaccount/rook-csi-rbd-provisioner-sa created
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "cephfs-csi-nodeplugin" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "cephfs-external-provisioner-runner" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "psp:rook" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rbd-csi-nodeplugin" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rbd-external-provisioner-runner" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rook-ceph-cluster-mgmt" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rook-ceph-global" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rook-ceph-mgr-cluster" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rook-ceph-mgr-system" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rook-ceph-object-bucket" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rook-ceph-osd" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterroles.rbac.authorization.k8s.io "rook-ceph-system" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "cephfs-csi-nodeplugin" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "cephfs-csi-provisioner-role" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rbd-csi-nodeplugin" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rbd-csi-provisioner-role" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-ceph-global" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-ceph-mgr-cluster" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-ceph-object-bucket" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-ceph-osd" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-ceph-system" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-ceph-system-psp" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-csi-cephfs-plugin-sa-psp" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-csi-cephfs-provisioner-sa-psp" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-csi-rbd-plugin-sa-psp" already exists
Error from server (AlreadyExists): error when creating "common.yaml": clusterrolebindings.rbac.authorization.k8s.io "rook-csi-rbd-provisioner-sa-psp" already exists
Error from server (AlreadyExists): error when creating "common.yaml": podsecuritypolicies.policy "00-rook-privileged" already exists

Create the Operator

$ kubectl create -f operator.yaml
configmap/rook-ceph-operator-config created
deployment.apps/rook-ceph-operator created

I could see the Operator come up

$ kubectl get pod -n rook-ceph
NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-operator-5949bdbb59-clhwc   1/1     Running   0          21s

# waiting overnight

$ kubectl get pod -n rook-ceph
NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-operator-5949bdbb59-dq6cj   1/1     Running   0          9h
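
For completeness, the quickstart's next step after the operator is the CephCluster itself, created from the example manifest in the same directory (a sketch of the commands; the cluster resource is what actually brings up the mons, mgr, and OSDs):

$ kubectl create -f cluster.yaml
$ kubectl -n rook-ceph get cephcluster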

Really, no matter what I did, Rook-Ceph stayed in a broken state

2022-08-27 02:08:03.115018 E | ceph-cluster-controller: failed to reconcile CephCluster "rook-ceph/rook-ceph". CephCluster "rook-ceph/rook-ceph" will not be deleted until all dependents are removed: CephBlockPool: [ceph-blockpool builtin-mgr], CephFilesystem: [ceph-filesystem], CephObjectStore: [ceph-objectstore]
2022-08-27 02:08:13.536370 I | ceph-cluster-controller: CephCluster "rook-ceph/rook-ceph" will not be deleted until all dependents are removed: CephBlockPool: [ceph-blockpool builtin-mgr], CephFilesystem: [ceph-filesystem], CephObjectStore: [ceph-objectstore]
2022-08-27 02:08:13.553691 E | ceph-cluster-controller: failed to reconcile CephCluster "rook-ceph/rook-ceph". CephCluster "rook-ceph/rook-ceph" will not be deleted until all dependents are removed: CephBlockPool: [ceph-blockpool builtin-mgr], CephFilesystem: [ceph-filesystem], CephObjectStore: [ceph-objectstore]
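
The logs hint at the teardown order: the dependent CRs have to go before the CephCluster will finalize. A sketch of that cleanup, using the resource names from the log messages (the finalizer patch is a last resort if deletion still hangs):

$ kubectl -n rook-ceph delete cephblockpool ceph-blockpool
$ kubectl -n rook-ceph delete cephfilesystem ceph-filesystem
$ kubectl -n rook-ceph delete cephobjectstore ceph-objectstore
# last resort: clear the finalizer so the stuck CephCluster can be garbage collected
$ kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p '{"metadata":{"finalizers":[]}}'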

After I created the StorageClass

$ kubectl apply -f storageclass-bucket-retain.yaml

Then I could try and create a PVC request

builder@DESKTOP-72D2D9T:~/Workspaces/jekyll-blog$ cat pvc-ceph.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-ceph-pvc
spec:
  storageClassName: rook-ceph-retain-bucket
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
builder@DESKTOP-72D2D9T:~/Workspaces/jekyll-blog$ kubectl apply -f pvc-ceph.yaml
persistentvolumeclaim/mongo-ceph-pvc created
builder@DESKTOP-72D2D9T:~/Workspaces/jekyll-blog$ kubectl get pvc
NAME             STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS              AGE
mongo-ceph-pvc   Pending                                      rook-ceph-retain-bucket   6s
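
To see why the claim is stuck, describe it and read the provisioner events. It's also worth noting that the bucket StorageClass is really meant to be consumed by an ObjectBucketClaim rather than a PVC, so even a healthy cluster likely would not have bound this (a sketch):

$ kubectl describe pvc mongo-ceph-pvc
$ kubectl get events --sort-by=.lastTimestamp | tail -n 10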

Clearly it didn't work (no storage was ever provisioned)

Rook : Attempt 4 : Using AKS

At this point I just gave up trying to force Rook-Ceph onto my home cluster. I figured perhaps AKS with some attached disks might work.

I created a basic 3-node AKS cluster and logged in.

$ kubectl get nodes
NAME                                STATUS   ROLES   AGE     VERSION
aks-nodepool1-19679206-vmss000000   Ready    agent   3m42s   v1.23.8
aks-nodepool1-19679206-vmss000001   Ready    agent   3m46s   v1.23.8
aks-nodepool1-19679206-vmss000002   Ready    agent   3m46s   v1.23.8

I then manually added a 40GB disk to each node

/content/images/2022/09/longhornrook-30.png
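
The same disk-attach step can be scripted against the node pool's scale set (a sketch; the MC_ node resource group name here is an assumption - check yours with az aks show --query nodeResourceGroup):

$ az vmss list -g MC_myResourceGroup_myAKSCluster_centralus -o table
$ az vmss disk attach -g MC_myResourceGroup_myAKSCluster_centralus --vmss-name aks-nodepool1-19679206-vmss --size-gb 40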

Now I can install the Rook Operator
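
If the chart repo isn't already present, add it first (a quick sketch):

$ helm repo add rook-release https://charts.rook.io/release
$ helm repo update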

$ helm install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph
W0829 20:17:58.933428    2813 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0829 20:18:11.490180    2813 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: rook-ceph
LAST DEPLOYED: Mon Aug 29 20:17:57 2022
NAMESPACE: rook-ceph
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Rook Operator has been installed. Check its status by running:
  kubectl --namespace rook-ceph get pods -l "app=rook-ceph-operator"

Visit https://rook.io/docs/rook/latest for instructions on how to create and configure Rook clusters

Important Notes:
- You must customize the 'CephCluster' resource in the sample manifests for your cluster.
- Each CephCluster must be deployed to its own namespace, the samples use `rook-ceph` for the namespace.
- The sample manifests assume you also installed the rook-ceph operator in the `rook-ceph` namespace.
- The helm chart includes all the RBAC required to create a CephCluster CRD in the same namespace.
- Any disk devices you add to the cluster in the 'CephCluster' must be empty (no filesystem and no partitions).

We can check the status of the Operator

$ helm list -n rook-ceph
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
rook-ceph       rook-ceph       1               2022-08-29 20:17:57.6679055 -0500 CDT   deployed        rook-ceph-v1.9.10       v1.9.10

Now install the Rook Ceph cluster chart

$ helm install --create-namespace --namespace rook-ceph rook-ceph-cluster --set operatorNamespace=rook-ceph rook-release/rook-ceph-cluster
NAME: rook-ceph-cluster
LAST DEPLOYED: Mon Aug 29 20:20:19 2022
NAMESPACE: rook-ceph
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Ceph Cluster has been installed. Check its status by running:
  kubectl --namespace rook-ceph get cephcluster

Visit https://rook.io/docs/rook/latest/CRDs/ceph-cluster-crd/ for more information about the Ceph CRD.

Important Notes:
- You can only deploy a single cluster per namespace
- If you wish to delete this cluster and start fresh, you will also have to wipe the OSD disks using `sfdisk`

Unlike before, I immediately saw the new StorageClasses propagated

$ kubectl get sc
NAME                    PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
azurefile               file.csi.azure.com              Delete          Immediate              true                   22m
azurefile-csi           file.csi.azure.com              Delete          Immediate              true                   22m
azurefile-csi-premium   file.csi.azure.com              Delete          Immediate              true                   22m
azurefile-premium       file.csi.azure.com              Delete          Immediate              true                   22m
ceph-block (default)    rook-ceph.rbd.csi.ceph.com      Delete          Immediate              true                   98s
ceph-bucket             rook-ceph.ceph.rook.io/bucket   Delete          Immediate              false                  98s
ceph-filesystem         rook-ceph.cephfs.csi.ceph.com   Delete          Immediate              true                   98s
default (default)       disk.csi.azure.com              Delete          WaitForFirstConsumer   true                   22m
managed                 disk.csi.azure.com              Delete          WaitForFirstConsumer   true                   22m
managed-csi             disk.csi.azure.com              Delete          WaitForFirstConsumer   true                   22m
managed-csi-premium     disk.csi.azure.com              Delete          WaitForFirstConsumer   true                   22m
managed-premium         disk.csi.azure.com              Delete          WaitForFirstConsumer   true                   22m

Testing

Let’s now add a PVC using the default ceph-block StorageClass

$ cat pvc-ceph.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-ceph-pvc
spec:
  storageClassName: ceph-block
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
$ kubectl apply -f pvc-ceph.yaml
persistentvolumeclaim/mongo-ceph-pvc created
$ kubectl get pvc
NAME             STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mongo-ceph-pvc   Pending                                      ceph-block     60s

Looking for the Rook Ceph Dashboard

builder@DESKTOP-72D2D9T:~/Workspaces/jekyll-blog$ kubectl get svc -n rook-ceph
No resources found in rook-ceph namespace.
builder@DESKTOP-72D2D9T:~/Workspaces/jekyll-blog$ kubectl get ingress --all-namespaces
No resources found
builder@DESKTOP-72D2D9T:~/Workspaces/jekyll-blog$ kubectl get svc --all-namespaces
NAMESPACE           NAME                           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE
default             kubernetes                     ClusterIP   10.0.0.1       <none>        443/TCP         27m
gatekeeper-system   gatekeeper-webhook-service     ClusterIP   10.0.43.240    <none>        443/TCP         17m
kube-system         azure-policy-webhook-service   ClusterIP   10.0.194.119   <none>        443/TCP         17m
kube-system         kube-dns                       ClusterIP   10.0.0.10      <none>        53/UDP,53/TCP   27m
kube-system         metrics-server                 ClusterIP   10.0.234.196   <none>        443/TCP         27m
kube-system         npm-metrics-cluster-service    ClusterIP   10.0.149.126   <none>        9000/TCP        27m

I checked the Helm values and saw the dashboard was set to enabled
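
For reference, this is how the deployed and default chart values can be inspected (a sketch):

$ helm get values rook-ceph-cluster -n rook-ceph
$ helm show values rook-release/rook-ceph-cluster | grep -A 3 dashboard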

However, none was created

$ kubectl get pods -n rook-ceph
NAME                                            READY   STATUS    RESTARTS   AGE
csi-cephfsplugin-9wdwt                          2/2     Running   0          33m
csi-cephfsplugin-f4s2t                          2/2     Running   0          31m
csi-cephfsplugin-j4l6h                          2/2     Running   0          33m
csi-cephfsplugin-provisioner-5965769756-42zzh   0/5     Pending   0          33m
csi-cephfsplugin-provisioner-5965769756-mhzcv   0/5     Pending   0          33m
csi-cephfsplugin-v5hx6                          2/2     Running   0          33m
csi-rbdplugin-2qtbn                             2/2     Running   0          33m
csi-rbdplugin-82nqx                             2/2     Running   0          33m
csi-rbdplugin-k8rtj                             2/2     Running   0          33m
csi-rbdplugin-provisioner-7cb769bbb7-d98bq      5/5     Running   0          33m
csi-rbdplugin-provisioner-7cb769bbb7-r4zgb      0/5     Pending   0          33m
csi-rbdplugin-qxrqf                             2/2     Running   0          31m
rook-ceph-mon-a-canary-5cff67c4fb-4jgs6         0/1     Pending   0          93s
rook-ceph-mon-b-canary-56f8f669cd-zv2c9         0/1     Pending   0          93s
rook-ceph-mon-c-canary-77958fc564-mlw9r         0/1     Pending   0          93s
rook-ceph-operator-56c65d67c5-lm4dq             1/1     Running   0          35m

After waiting a while and seeing no progress, I deleted the AKS cluster.

Rook : Attempt fünf : Using NFS instead of Ceph

I had one last idea for Rook - try the NFS implementation instead of Ceph.

I found yet another guide here on using Rook NFS for PVCs. Perhaps the needs of Ceph just do not jibe with my laptops-and-Azure approach.

I respun the cluster, this time with k3s 1.23:

$ curl -sfL https://get.k3s.io  | INSTALL_K3S_VERSION="v1.23.10+k3s1" K3S_KUBECONFIG_MODE="644" INSTALL_K3S_EXEC="--tls-san 73.242.50.46" sh -
[INFO]  Using v1.23.10+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.23.10+k3s1/sha256sum-amd64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.23.10+k3s1/k3s
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s

And after adding the worker node, I could pull the kubeconfig and see our new cluster

$ kubectl get nodes
NAME                  STATUS   ROLES                  AGE     VERSION
anna-macbookair       Ready    control-plane,master   2m27s   v1.23.10+k3s1
builder-macbookpro2   Ready    <none>                 66s     v1.23.10+k3s1

While deprecated and unsupported, we can try the last known good Rook NFS release. We’ll largely be following this LKE guide, sprinkled with some notes from an unreleased version.

Get the Rook NFS Code

builder@DESKTOP-72D2D9T:~/Workspaces$ git clone --single-branch --branch v1.7.3 https://github.com/rook/nfs.git
Cloning into 'nfs'...
remote: Enumerating objects: 68742, done.
remote: Counting objects: 100% (17880/17880), done.
remote: Compressing objects: 100% (3047/3047), done.
remote: Total 68742 (delta 16619), reused 15044 (delta 14811), pack-reused 50862
Receiving objects: 100% (68742/68742), 34.16 MiB | 2.60 MiB/s, done.
Resolving deltas: 100% (48931/48931), done.
Note: switching to '99e2a518700c549e2d0855c00598ae560a4d002c'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

Now apply the CRDs and Operator

builder@DESKTOP-72D2D9T:~/Workspaces$ cd nfs/cluster/examples/kubernetes/nfs/
builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ kubectl create -f crds.yaml
customresourcedefinition.apiextensions.k8s.io/nfsservers.nfs.rook.io created

builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ kubectl create -f operator.yaml
namespace/rook-nfs-system created
serviceaccount/rook-nfs-operator created
clusterrolebinding.rbac.authorization.k8s.io/rook-nfs-operator created
clusterrole.rbac.authorization.k8s.io/rook-nfs-operator created
deployment.apps/rook-nfs-operator created

Check that the Operator is running

$ kubectl get pods -n rook-nfs-system
NAME                                 READY   STATUS    RESTARTS   AGE
rook-nfs-operator-556c5ddff7-dmfps   1/1     Running   0          81s

I need to add cert-manager

$ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.8.0/cert-manager.yaml
namespace/cert-manager created
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
serviceaccount/cert-manager-cainjector created
serviceaccount/cert-manager created
serviceaccount/cert-manager-webhook created
configmap/cert-manager-webhook created
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrole.rbac.authorization.k8s.io/cert-manager-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrole.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
role.rbac.authorization.k8s.io/cert-manager:leaderelection created
role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
service/cert-manager created
service/cert-manager-webhook created
deployment.apps/cert-manager-cainjector created
deployment.apps/cert-manager created
deployment.apps/cert-manager-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created

We need to update the example manifests to move off the deprecated API versions:

/content/images/2022/09/longhornrook-32.png

I can then apply the webhook

builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ kubectl apply -f webhook.yaml
serviceaccount/rook-nfs-webhook created
role.rbac.authorization.k8s.io/rook-nfs-webhook created
rolebinding.rbac.authorization.k8s.io/rook-nfs-webhook created
certificate.cert-manager.io/rook-nfs-webhook-cert created
issuer.cert-manager.io/rook-nfs-selfsigned-issuer created
validatingwebhookconfiguration.admissionregistration.k8s.io/rook-nfs-validating-webhook-configuration created
service/rook-nfs-webhook created
deployment.apps/rook-nfs-webhook created

Now let’s check the Pods in the cert-manager and rook-nfs-system namespaces

builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ kubectl get pods -n cert-manager
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-webhook-6c9dd55dc8-chsbp      1/1     Running   0          2m10s
cert-manager-64d9bc8b74-5k4dd              1/1     Running   0          2m10s
cert-manager-cainjector-6db6b64d5f-d6wd6   1/1     Running   0          2m10s

builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ kubectl get pods -n rook-nfs-system
NAME                                 READY   STATUS    RESTARTS   AGE
rook-nfs-operator-556c5ddff7-dmfps   1/1     Running   0          9m16s
rook-nfs-webhook-75bc87d7b4-9dvqz    1/1     Running   0          91s

Apply the RBAC

$ kubectl apply -f rbac.yaml
namespace/rook-nfs created
serviceaccount/rook-nfs-server created
clusterrole.rbac.authorization.k8s.io/rook-nfs-provisioner-runner created
clusterrolebinding.rbac.authorization.k8s.io/rook-nfs-provisioner-runner created

We need to add the StorageClass next

builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ kubectl apply -f sc.yaml
storageclass.storage.k8s.io/rook-nfs-share1 created
builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ kubectl get sc
NAME                   PROVISIONER                        RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path (default)   rancher.io/local-path              Delete          WaitForFirstConsumer   false                  14h
rook-nfs-share1        nfs.rook.io/rook-nfs-provisioner   Delete          Immediate              false                  3s

Create the PVC and NFSServer

builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ cat nfs.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-default-claim
  namespace: rook-nfs
spec:
  accessModes:
  - ReadWriteOnce # Edit this line to ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: nfs.rook.io/v1alpha1
kind: NFSServer
metadata:
  name: rook-nfs
  namespace: rook-nfs
spec:
  replicas: 1
  exports:
    - name: share1
      server:
        accessMode: ReadWrite
        squash: "none"
      # A Persistent Volume Claim must be created before creating NFS CRD instance.
      persistentVolumeClaim:
        claimName: nfs-default-claim
  # A key/value list of annotations
  annotations:
    rook: nfs

builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ kubectl apply -f nfs.yaml
persistentvolumeclaim/nfs-default-claim created
nfsserver.nfs.rook.io/rook-nfs created
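
A quick way to confirm the NFSServer actually came up is to check the CR and its pod (a sketch):

$ kubectl -n rook-nfs get nfsservers
$ kubectl -n rook-nfs get pods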

It used the default local-path storage for its backend

$ kubectl describe pvc nfs-default-claim -n rook-nfs
Name:          nfs-default-claim
Namespace:     rook-nfs
StorageClass:  local-path
Status:        Bound
Volume:        pvc-52eae943-32fa-4899-bf1e-6ffa8351b428
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: rancher.io/local-path
               volume.kubernetes.io/selected-node: isaac-macbookpro
               volume.kubernetes.io/storage-provisioner: rancher.io/local-path
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       rook-nfs-0
Events:
  Type    Reason                 Age                From                         Message
  ----    ------                 ----               ----                         -------
  Normal  WaitForFirstConsumer   15m                persistentvolume-controller  waiting for first consumer to be created before binding
  Normal  Provisioning           15m                rancher.io/local-path_local-path-provisioner-6c79684f77-8rkg6_fec28c6d-228a-4955-b40a-2ef8dde56e43  External provisioner is provisioning volume for claim "rook-nfs/nfs-default-claim"
  Normal  ExternalProvisioning   15m (x3 over 15m)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "rancher.io/local-path" or manually created by system administrator
  Normal  ProvisioningSucceeded  14m                rancher.io/local-path_local-path-provisioner-6c79684f77-8rkg6_fec28c6d-228a-4955-b40a-2ef8dde56e43  Successfully provisioned volume pvc-52eae943-32fa-4899-bf1e-6ffa8351b428

Now we can test it

builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ cat pvc.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rook-nfs-pv-claim
spec:
  storageClassName: "rook-nfs-share1"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ kubectl apply -f pvc.yaml
persistentvolumeclaim/rook-nfs-pv-claim created

builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ kubectl get pvc
NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
rook-nfs-pv-claim   Bound    pvc-ec2b3bbb-a5dd-4811-af90-38fef3a39a5e   1Mi        RWX            rook-nfs-share1   2s

Then a busybox Deployment that mounts the PVC at /mnt

builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ cat busybox-rc.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nfs-demo
    role: busybox
  name: nfs-busybox
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nfs-demo
      role: busybox
  template:
    metadata:
      labels:
        app: nfs-demo
        role: busybox
    spec:
      containers:
        - image: busybox
          command:
            - sh
            - -c
            - "while true; do date > /mnt/index.html; hostname >> /mnt/index.html; sleep $(($RANDOM % 5 + 5)); done"
          imagePullPolicy: IfNotPresent
          name: busybox
          volumeMounts:
            # name must match the volume name below
            - name: rook-nfs-vol
              mountPath: "/mnt"
      volumes:
        - name: rook-nfs-vol
          persistentVolumeClaim:
            claimName: rook-nfs-pv-claim
builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ kubectl apply -f busybox-rc.yaml
deployment.apps/nfs-busybox created

And lastly an nginx web Deployment (web-rc.yaml)

builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ cat web-rc.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nfs-demo
    role: web-frontend
  name: nfs-web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nfs-demo
      role: web-frontend
  template:
    metadata:
      labels:
        app: nfs-demo
        role: web-frontend
    spec:
      containers:
        - name: web
          image: nginx
          ports:
            - name: web
              containerPort: 80
          volumeMounts:
            # name must match the volume name below
            - name: rook-nfs-vol
              mountPath: "/usr/share/nginx/html"
      volumes:
        - name: rook-nfs-vol
          persistentVolumeClaim:
            claimName: rook-nfs-pv-claim
builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ kubectl apply -f web-rc.yaml
deployment.apps/nfs-web created

Then we create a Service to front the nginx pods serving that index.html

builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ cat web-service.yaml
kind: Service
apiVersion: v1
metadata:
  name: nfs-web
spec:
  ports:
    - port: 80
  selector:
    role: web-frontend
builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ kubectl apply -f web-service.yaml
service/nfs-web created

We can see the files mounted

$ kubectl exec `kubectl get pod -l app=nfs-demo,role=busybox -o jsonpath='{.items[0].metadata.name}'` -- ls -l /mnt
total 4
-rw-r--r--    1 nobody   42949672        58 Aug 31 02:31 index.html

$ kubectl exec `kubectl get pod -l app=nfs-demo,role=web-frontend -o jsonpath='{.items[0].metadata.name}'` -- ls -l /usr/share/nginx/html
total 4
-rw-r--r-- 1 nobody 4294967294 58 Aug 31 02:34 index.html

This illustrates that the busybox pods are writing to the same volume the nginx pods are serving, proving the NFS share works.
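
If you want to see it end-to-end over HTTP as well, a throwaway curl pod against the nfs-web Service works (a sketch; curlimages/curl is just a convenient public image):

$ kubectl run curltest --rm -it --restart=Never --image=curlimages/curl -- curl -s http://nfs-web.default.svc.cluster.local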

We can even see that if we write to the file ourselves, it's overwritten again shortly thereafter by the busybox writers

builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ kubectl exec `kubectl get pod -l app=nfs-demo,role=web-frontend -o jsonpath='{.items[0].metadata.name}'` -- /bin/bash -c "set -x; echo howdy >> /usr/share/nginx/html/index.html && cat /usr/share/nginx/html/index.html"
+ echo howdy
+ cat /usr/share/nginx/html/index.html
Wed Aug 31 02:38:02 UTC 2022
nfs-busybox-7678ddd9d6-np9b8
howdy
builder@DESKTOP-72D2D9T:~/Workspaces/nfs/cluster/examples/kubernetes/nfs$ kubectl exec `kubectl get pod -l app=nfs-demo,role=web-frontend -o jsonpath='{.items[0].metadata.name}'` -- cat /usr/share/nginx/html/index.html
Wed Aug 31 02:38:13 UTC 2022
nfs-busybox-7678ddd9d6-np9b8

We can confirm this PVC is backed by Rook NFS, not local-path, by describing it

$ kubectl describe pvc rook-nfs-pv-claim
Name:          rook-nfs-pv-claim
Namespace:     default
StorageClass:  rook-nfs-share1
Status:        Bound
Volume:        pvc-ec2b3bbb-a5dd-4811-af90-38fef3a39a5e
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: nfs.rook.io/rook-nfs-provisioner
               volume.kubernetes.io/storage-provisioner: nfs.rook.io/rook-nfs-provisioner
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Mi
Access Modes:  RWX
VolumeMode:    Filesystem
Used By:       nfs-busybox-7678ddd9d6-6gdg9
               nfs-busybox-7678ddd9d6-np9b8
               nfs-web-54bf6fb6b9-sfzs4
               nfs-web-54bf6fb6b9-ssjrb
Events:
  Type    Reason                 Age                From                         Message
  ----    ------                 ----               ----                         -------
  Normal  Provisioning           12m                nfs.rook.io/rook-nfs-provisioner_rook-nfs-0_f5db4056-feda-4fa7-bf6c-31b5e1abe2e8  External provisioner is provisioning volume for claim "default/rook-nfs-pv-claim"
  Normal  ExternalProvisioning   12m (x2 over 12m)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "nfs.rook.io/rook-nfs-provisioner" or manually created by system administrator
  Normal  ProvisioningSucceeded  12m                nfs.rook.io/rook-nfs-provisioner_rook-nfs-0_f5db4056-feda-4fa7-bf6c-31b5e1abe2e8  Successfully provisioned volume pvc-ec2b3bbb-a5dd-4811-af90-38fef3a39a5e

PVCs are ultimately satisfied by PersistentVolumes, which we can check as well

$ kubectl get pv --all-namespaces
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS      REASON   AGE
pvc-52eae943-32fa-4899-bf1e-6ffa8351b428   1Gi        RWO            Delete           Bound    rook-nfs/nfs-default-claim   local-path                 17m
pvc-ec2b3bbb-a5dd-4811-af90-38fef3a39a5e   1Mi        RWX            Delete           Bound    default/rook-nfs-pv-claim    rook-nfs-share1            16m

And again, we can see that the volume provisioned by rook-nfs-share1 is backed by NFS

$ kubectl describe pv pvc-ec2b3bbb-a5dd-4811-af90-38fef3a39a5e
Name:            pvc-ec2b3bbb-a5dd-4811-af90-38fef3a39a5e
Labels:          <none>
Annotations:     nfs.rook.io/project_block:
                 pv.kubernetes.io/provisioned-by: nfs.rook.io/rook-nfs-provisioner
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    rook-nfs-share1
Status:          Bound
Claim:           default/rook-nfs-pv-claim
Reclaim Policy:  Delete
Access Modes:    RWX
VolumeMode:      Filesystem
Capacity:        1Mi
Node Affinity:   <none>
Message:
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    10.43.115.208
    Path:      /nfs-default-claim/default-rook-nfs-pv-claim-pvc-ec2b3bbb-a5dd-4811-af90-38fef3a39a5e
    ReadOnly:  false
Events:        <none>

Longhorn

From our earlier steps, we see the /dev/sdc1 volume we could use on the master node

builder@anna-MacBookAir:~$ sudo fdisk -l | tail -n 10
Disk /dev/sdc: 29.3 GiB, 31457280000 bytes, 61440000 sectors
Disk model: USB DISK 2.0
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0c172cd2

Device     Boot Start      End  Sectors  Size Id Type
/dev/sdc1        2048 61439999 61437952 29.3G 83 Linux

I need to label it or at least get the UUID

builder@anna-MacBookAir:~$ sudo e2label /dev/sdc1 thumbdrive

builder@anna-MacBookAir:~$ sudo blkid | grep thumbdrive
/dev/sdc1: LABEL="thumbdrive" UUID="f1c4e426-8e9a-4dde-88fa-412abd73b73b" TYPE="ext4" PARTUUID="0c172cd2-01"

I can now create a mount point and add it to /etc/fstab

builder@anna-MacBookAir:~$ sudo mkdir /mnt/thumbdrive

builder@anna-MacBookAir:~$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/sda2 during installation
UUID=9a0f9489-0e59-44b2-ae06-81abd1bd74a8 /               ext4    errors=remount-ro 0       1
# /boot/efi was on /dev/sda1 during installation
UUID=AF85-0536  /boot/efi       vfat    umask=0077      0       1
/swapfile                                 none            swap    sw              0       0
UUID=f1c4e426-8e9a-4dde-88fa-412abd73b73b       /mnt/thumbdrive ext4    defaults,errors=remount-ro 0       1

builder@anna-MacBookAir:~$ sudo mount -a

builder@anna-MacBookAir:~$ df -h | tail -n1
/dev/sdc1        29G   24K   28G   1% /mnt/thumbdrive

With some storage mounted, we can move onto the install.

Install Longhorn with Helm

builder@DESKTOP-QADGF36:~/Workspaces$ helm repo add longhorn https://charts.longhorn.io
"longhorn" already exists with the same configuration, skipping
builder@DESKTOP-QADGF36:~/Workspaces$ helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace --set defaultSettings.defaultDataPath="/mnt/thumbdrive"
W0825 12:38:06.404367    7243 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0825 12:38:06.603931    7243 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: longhorn
LAST DEPLOYED: Thu Aug 25 12:38:05 2022
NAMESPACE: longhorn-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Longhorn is now installed on the cluster!

Please wait a few minutes for other Longhorn components such as CSI deployments, Engine Images, and Instance Managers to be initialized.

Visit our documentation at https://longhorn.io/docs/

We can get our services

$ kubectl get svc -n longhorn-system
NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
longhorn-replica-manager      ClusterIP   None            <none>        <none>     168m
longhorn-engine-manager       ClusterIP   None            <none>        <none>     168m
longhorn-frontend             ClusterIP   10.43.169.210   <none>        80/TCP     168m
longhorn-admission-webhook    ClusterIP   10.43.139.79    <none>        9443/TCP   168m
longhorn-conversion-webhook   ClusterIP   10.43.171.232   <none>        9443/TCP   168m
longhorn-backend              ClusterIP   10.43.30.83     <none>        9500/TCP   168m

Create a LoadBalancer service (longhorn-ingress-lb) for the Longhorn UI

$ cat longhorn-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: longhorn-ingress-lb
  namespace: longhorn-system
spec:
  selector:
    app: longhorn-ui
  type: LoadBalancer
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: http

$ kubectl apply -f longhorn-service.yaml
service/longhorn-ingress-lb created

However, mine is stuck in Pending since my cluster has no more external IPs to hand out
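
Since k3s ships with Traefik by default, an Ingress pointed at the longhorn-frontend service is an alternative to burning another LoadBalancer IP (a sketch; the hostname is a placeholder):

$ cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: longhorn-ui
  namespace: longhorn-system
spec:
  rules:
  - host: longhorn.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: longhorn-frontend
            port:
              number: 80
EOF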

One thing I realized when debugging was that I had neglected to check the prerequisites for Longhorn.

There is documentation on the requirements, but essentially jq, open-iscsi, and nfs-common are expected to be present on each node.
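
If any of those are missing, on Ubuntu/Debian nodes they can be installed with something like this sketch:

$ sudo apt-get update
$ sudo apt-get install -y jq open-iscsi nfs-common
$ sudo systemctl enable --now iscsid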

You can run a checker to see if you are missing any of these:

$ curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/master/scripts/environment_check.sh | bash
[INFO]  Required dependencies are installed.
[INFO]  Waiting for longhorn-environment-check pods to become ready (0/2)...
[INFO]  Waiting for longhorn-environment-check pods to become ready (0/2)...
[INFO]  Waiting for longhorn-environment-check pods to become ready (1/2)...
[INFO]  All longhorn-environment-check pods are ready (2/2).
[INFO]  Required packages are installed.
[INFO]  Cleaning up longhorn-environment-check pods...
[INFO]  Cleanup completed.

Once satisfied, I could see a new StorageClass listed

$ kubectl get sc
NAME                   PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  6h7m
longhorn (default)     driver.longhorn.io      Delete          Immediate              true                   13m
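
Note that both local-path and longhorn now show as (default). If you want Longhorn to be the only default, the local-path class can be un-defaulted with a patch (a sketch):

$ kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'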

We can now check the UI. Note that port-forwarding the Pod won't work, but port-forwarding the Service will:

$ kubectl get svc -n longhorn-system
NAME                          TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
longhorn-replica-manager      ClusterIP      None            <none>        <none>         3h24m
longhorn-engine-manager       ClusterIP      None            <none>        <none>         3h24m
longhorn-frontend             ClusterIP      10.43.169.210   <none>        80/TCP         3h24m
longhorn-admission-webhook    ClusterIP      10.43.139.79    <none>        9443/TCP       3h24m
longhorn-conversion-webhook   ClusterIP      10.43.171.232   <none>        9443/TCP       3h24m
longhorn-backend              ClusterIP      10.43.30.83     <none>        9500/TCP       3h24m
longhorn-ingress-lb           LoadBalancer   10.43.251.55    <pending>     80:30328/TCP   34m
csi-attacher                  ClusterIP      10.43.253.8     <none>        12345/TCP      11m
csi-provisioner               ClusterIP      10.43.115.91    <none>        12345/TCP      11m
csi-resizer                   ClusterIP      10.43.247.155   <none>        12345/TCP      11m
csi-snapshotter               ClusterIP      10.43.43.138    <none>        12345/TCP      11m

$ kubectl port-forward -n longhorn-system svc/longhorn-ingress-lb 8080:80
Forwarding from 127.0.0.1:8080 -> 8000
Forwarding from [::1]:8080 -> 8000
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080

/content/images/2022/09/longhornrook-01.png

While I hadn't yet solved why the mount fails for my master node (hint: I will in a moment), I could see the volume for the fake mount I did on the worker node

/content/images/2022/09/longhornrook-02.png

At first, I saw errors in the Volume column. This is because I'm only running two nodes at present. However, we can lower the required minimum on the settings page. Here I set it to 2.

/content/images/2022/09/longhornrook-03.png

Now in volumes I can create a volume manually

/content/images/2022/09/longhornrook-04.png

Here I will create a basic Volume for MongoDB

/content/images/2022/09/longhornrook-05.png

and we can see it listed in the UI

/content/images/2022/09/longhornrook-06.png

And I can see it created using kubectl as well

$ kubectl get volume --all-namespaces
NAMESPACE         NAME      STATE      ROBUSTNESS   SCHEDULED   SIZE          NODE   AGE
longhorn-system   mongodb   detached   unknown                  10737418240          66s

Testing

As before, let’s create a PVC for MongoDB

$ cat pvc-long-block.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-lh-pvc
spec:
  storageClassName: longhorn
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
$ kubectl apply -f pvc-long-block.yaml
persistentvolumeclaim/mongo-lh-pvc created

And we can see it get created

$ kubectl get pvc
NAME           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mongo-lh-pvc   Bound    pvc-8ec623a3-f92f-4a90-a9f6-5f069174a4a7   5Gi        RWO            longhorn       49s
$ kubectl get volume --all-namespaces
NAMESPACE         NAME                                       STATE      ROBUSTNESS   SCHEDULED   SIZE          NODE   AGE
longhorn-system   mongodb                                    detached   unknown                  10737418240          5m8s
longhorn-system   pvc-8ec623a3-f92f-4a90-a9f6-5f069174a4a7   detached   unknown                  5368709120           24s

We can then create a deployment to use it

$ cat mongo-use-pvc.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo
spec:
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
      - image: mongo:latest
        name: mongo
        ports:
        - containerPort: 27017
          name: mongo
        volumeMounts:
        - name: mongo-persistent-storage
          mountPath: /data/db
      volumes:
      - name: mongo-persistent-storage
        persistentVolumeClaim:
          claimName: mongo-lh-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: mongo
  labels:
    app: mongo
spec:
  selector:
    app: mongo
  type: NodePort
  ports:
    - port: 27017
      nodePort: 31017
$ kubectl apply -f mongo-use-pvc.yaml
deployment.apps/mongo created
service/mongo created

However, my Mongo pod is failing because one node isn't live with Longhorn

Events:
  Type     Reason              Age                       From                     Message
  ----     ------              ----                      ----                     -------
  Normal   Scheduled           117s                      default-scheduler        Successfully assigned default/mongo-c7546bfbf-sqcnz to anna-macbookair
  Warning  FailedMount         <invalid>                 kubelet                  Unable to attach or mount volumes: unmounted volumes=[mongo-persistent-storage], unattached volumes=[mongo-persistent-storage kube-api-access-44c4c]: timed out waiting for the condition
  Warning  FailedAttachVolume  <invalid> (x9 over 117s)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-8ec623a3-f92f-4a90-a9f6-5f069174a4a7" : rpc error: code = NotFound desc = node anna-macbookair not found
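
That NotFound error suggests Longhorn hadn't registered the node as schedulable yet. The Longhorn node objects can be checked directly, which is often quicker than clicking through the UI (a sketch):

$ kubectl -n longhorn-system get nodes.longhorn.io
$ kubectl -n longhorn-system describe nodes.longhorn.io anna-macbookair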

I suspected the thumb drive was failing (it was rather old). I yanked it from the laptop (.81) and everything immediately unblocked

$ kubectl get pods -n longhorn-system
NAME                                           READY   STATUS              RESTARTS       AGE
longhorn-conversion-webhook-78fc4df57c-2p7fb   1/1     Running             0              3h50m
longhorn-admission-webhook-8498c969d4-2tppj    1/1     Running             0              3h50m
longhorn-ui-58b8cc8d8b-r8lkk                   1/1     Running             0              3h50m
longhorn-conversion-webhook-78fc4df57c-qcn8n   1/1     Running             0              3h50m
longhorn-admission-webhook-8498c969d4-msv2l    1/1     Running             0              3h50m
longhorn-manager-vnsrd                         1/1     Running             42 (43m ago)   3h50m
longhorn-driver-deployer-855dfc74ff-t4bqs      1/1     Running             0              3h50m
csi-attacher-dcb85d774-l9rqp                   1/1     Running             0              37m
csi-attacher-dcb85d774-l4fnt                   1/1     Running             0              37m
csi-provisioner-5d8dd96b57-98b2s               1/1     Running             0              37m
csi-provisioner-5d8dd96b57-cjrph               1/1     Running             0              37m
csi-snapshotter-5586bc7c79-rg8nf               1/1     Running             0              37m
csi-snapshotter-5586bc7c79-5grmt               1/1     Running             0              37m
longhorn-csi-plugin-7j887                      2/2     Running             0              37m
csi-resizer-7c5bb5fd65-lb9qd                   1/1     Running             0              37m
csi-resizer-7c5bb5fd65-j4p29                   1/1     Running             0              37m
engine-image-ei-766a591b-vfmcs                 1/1     Running             0              37m
csi-resizer-7c5bb5fd65-h6qq5                   1/1     Running             0              37m
csi-attacher-dcb85d774-w2r2s                   1/1     Running             0              37m
longhorn-csi-plugin-z8l89                      2/2     Running             0              37m
csi-provisioner-5d8dd96b57-8lmqv               1/1     Running             0              37m
csi-snapshotter-5586bc7c79-8kmj8               1/1     Running             0              37m
engine-image-ei-766a591b-ckwqz                 1/1     Running             0              37m
instance-manager-e-664f6e39                    1/1     Running             0              37m
instance-manager-r-e6e13738                    1/1     Running             0              37m
instance-manager-e-fd761389                    0/1     ContainerCreating   0              27s
instance-manager-r-261f8b7d                    0/1     ContainerCreating   0              27s
longhorn-manager-9ts79                         1/1     Running             0              30s

And I could see files created on the thumb drive

builder@anna-MacBookAir:~$ ls /mnt/thumbdrive/
longhorn-disk.cfg  replicas

The primary node showed up under Nodes

/content/images/2022/09/longhornrook-07.png

And then the Mongo Pod showed it had moved past the failing volume and was pulling the latest Mongo image

builder@DESKTOP-QADGF36:~$ kubectl describe pod mongo-c7546bfbf-sqcnz | tail -n10
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Normal   Scheduled               14m                  default-scheduler        Successfully assigned default/mongo-c7546bfbf-sqcnz to anna-macbookair
  Warning  FailedAttachVolume      2m3s (x14 over 14m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-8ec623a3-f92f-4a90-a9f6-5f069174a4a7" : rpc error: code = NotFound desc = node anna-macbookair not found
  Warning  FailedMount             67s (x6 over 12m)    kubelet                  Unable to attach or mount volumes: unmounted volumes=[mongo-persistent-storage], unattached volumes=[mongo-persistent-storage kube-api-access-44c4c]: timed out waiting for the condition
  Normal   SuccessfulAttachVolume  1s                   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-8ec623a3-f92f-4a90-a9f6-5f069174a4a7"
  Normal   Pulling                 <invalid>            kubelet                  Pulling image "mongo:latest"

I can test mongo using the pod.

First I’ll connect to the mongo pod

$ kubectl exec -it `kubectl get pod -l app=mongo --output=jsonpath={.items..metadata.name}` -- mongosh
Current Mongosh Log ID: 6308107a93adcf191eb8eacb
Connecting to:          mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+1.5.4
Using MongoDB:          6.0.1
Using Mongosh:          1.5.4

For mongosh info see: https://docs.mongodb.com/mongodb-shell/

------
   The server generated these startup warnings when booting
   2022-08-25T21:31:42.894+00:00: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem
   2022-08-25T21:31:43.789+00:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted
   2022-08-25T21:31:43.789+00:00: vm.max_map_count is too low
------

------
   Enable MongoDB's free cloud-based monitoring service, which will then receive and display
   metrics about your deployment (disk utilization, CPU, operation statistics, etc).

   The monitoring data will be available on a MongoDB website with a unique URL accessible to you
   and anyone you share the URL with. MongoDB may use this information to make product
   improvements and to suggest MongoDB products and deployment options to you.

   To enable free monitoring, run the following command: db.enableFreeMonitoring()
   To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
------

test>

I’ll insert an object then retrieve it

test> db.test.insertOne( {name: "test", number: 10  })
{
  acknowledged: true,
  insertedId: ObjectId("6308116bc275146b098036fb")
}
test> db.getCollection("test").find()
[
  {
    _id: ObjectId("6308116bc275146b098036fb"),
    name: 'test',
    number: 10
  }
]

To prove it is using the PVC, we’ll rotate the pod

$ kubectl get pods && kubectl delete pod `kubectl get pod -l app=mongo --output=jsonpath={.items..metadata.name}` && sleep 10 && kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
mongo-c7546bfbf-sqcnz   1/1     Running   0          3h4m
pod "mongo-c7546bfbf-sqcnz" deleted
NAME                    READY   STATUS              RESTARTS   AGE
mongo-c7546bfbf-spq6l   0/1     ContainerCreating   0          11s

Then exec into the new pod and see if our item is still there

$ kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
mongo-c7546bfbf-spq6l   1/1     Running   0          20m
$ kubectl exec -it `kubectl get pod -l app=mongo --output=jsonpath={.items..metadata.name}` -- mongosh
Current Mongosh Log ID: 630816db997795658a1e0c0f
Connecting to:          mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+1.5.4
Using MongoDB:          6.0.1
Using Mongosh:          1.5.4

For mongosh info see: https://docs.mongodb.com/mongodb-shell/

------
   The server generated these startup warnings when booting
   2022-08-26T00:22:29.880+00:00: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem
   2022-08-26T00:22:36.903+00:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted
   2022-08-26T00:22:36.903+00:00: vm.max_map_count is too low
------

------
   Enable MongoDB's free cloud-based monitoring service, which will then receive and display
   metrics about your deployment (disk utilization, CPU, operation statistics, etc).

   The monitoring data will be available on a MongoDB website with a unique URL accessible to you
   and anyone you share the URL with. MongoDB may use this information to make product
   improvements and to suggest MongoDB products and deployment options to you.

   To enable free monitoring, run the following command: db.enableFreeMonitoring()
   To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
------

test> db.getCollection("test").find()
[
  {
    _id: ObjectId("6308116bc275146b098036fb"),
    name: 'test',
    number: 10
  }
]

Looking at the volume, we can see it’s in a degraded state (mostly because it wants 3 replicas)

/content/images/2022/09/longhornrook-10.png
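If I wanted the volume to report healthy on a smaller cluster, Longhorn lets the replica count be set per StorageClass. A minimal sketch, assuming the standard driver.longhorn.io provisioner and an illustrative class name:

$ cat longhorn-2rep-sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2rep          # illustrative name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"        # two replicas instead of the default three
  staleReplicaTimeout: "30"
$ kubectl apply -f longhorn-2rep-sc.yaml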

We can take a snapshot of any volume by clicking “Take Snapshot”

/content/images/2022/09/longhornrook-11.png

and from there we can delete any Snapshots

/content/images/2022/09/longhornrook-12.png

We can send backups to NFS or S3 from Longhorn as well.

NFS Backups

I have a suitable NFS share named ‘dockerBackup’ that I exported on my NAS in the past

/content/images/2022/09/longhornrook-13.png

I can then add that into settings as a “Backup Target”

/content/images/2022/09/longhornrook-14.png
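For reference, Longhorn expects the NFS backup target as an nfs:// URL pointing at the export, so with the NAS and share above it would look roughly like:

nfs://192.168.1.129:/volume1/dockerBackup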

Sadly, this did not work, even though I could mount the share locally on the master node

builder@DESKTOP-QADGF36:~$ sudo mount.nfs4 192.168.1.129:/volume1/dockerBackup /mnt/testingSassy -o nolock
builder@DESKTOP-QADGF36:~$ ls /mnt/testingSassy/
ls: cannot open directory '/mnt/testingSassy/': Permission denied
builder@DESKTOP-QADGF36:~$ sudo ls /mnt/testingSassy/
'#recycle'   myghrunner-1.1.11.tgz   myghrunner-1.1.12.tgz   myghrunner-1.1.13.tgz   myghrunner-1.1.9.tgz

Using it in Longhorn

/content/images/2022/09/longhornrook-16.png

Just gives failures

/content/images/2022/09/longhornrook-17.png

Trying a file path instead also fails

/content/images/2022/09/longhornrook-15.png

Backups to Azure Storage

Let’s create an Azure Storage account for backups

We could use the portal

/content/images/2022/09/longhornrook-18.png

Or the command line:

# Create the RG
$ az group create -n longhornstoragerg --location centralus
{
  "id": "/subscriptions/d955c0ba-13dc-44cf-a29a-8fed74cbb22d/resourceGroups/longhornstoragerg",
  "location": "centralus",
  "managedBy": null,
  "name": "longhornstoragerg",
  "properties": {
    "provisioningState": "Succeeded"
  },
  "tags": null,
  "type": "Microsoft.Resources/resourceGroups"
}

# Create a Standard LRS SA
$ az storage account create -n longhornstorage -g longhornstoragerg --sku Standard_LRS
{
  "accessTier": "Hot",
  "allowBlobPublicAccess": true,
  "allowCrossTenantReplication": null,
  "allowSharedKeyAccess": null,
  "azureFilesIdentityBasedAuthentication": null,
  "blobRestoreStatus": null,
  "creationTime": "2022-08-26T11:18:48.679940+00:00",
  "customDomain": null,
  "defaultToOAuthAuthentication": null,
  "enableHttpsTrafficOnly": true,
  "enableNfsV3": null,
  "encryption": {
    "encryptionIdentity": null,
    "keySource": "Microsoft.Storage",
    "keyVaultProperties": null,
    "requireInfrastructureEncryption": null,
    "services": {
      "blob": {
        "enabled": true,
        "keyType": "Account",
        "lastEnabledTime": "2022-08-26T11:18:48.789361+00:00"
      },
      "file": {
        "enabled": true,
        "keyType": "Account",
        "lastEnabledTime": "2022-08-26T11:18:48.789361+00:00"
      },
      "queue": null,
      "table": null
    }
  },
  "extendedLocation": null,
  "failoverInProgress": null,
  "geoReplicationStats": null,
  "id": "/subscriptions/d955c0ba-13dc-44cf-a29a-8fed74cbb22d/resourceGroups/longhornstoragerg/providers/Microsoft.Storage/storageAccounts/longhornstorage",
  "identity": null,
  "immutableStorageWithVersioning": null,
  "isHnsEnabled": null,
  "keyCreationTime": {
    "key1": "2022-08-26T11:18:48.789361+00:00",
    "key2": "2022-08-26T11:18:48.789361+00:00"
  },
  "keyPolicy": null,
  "kind": "StorageV2",
  "largeFileSharesState": null,
  "lastGeoFailoverTime": null,
  "location": "centralus",
  "minimumTlsVersion": "TLS1_0",
  "name": "longhornstorage",
  "networkRuleSet": {
    "bypass": "AzureServices",
    "defaultAction": "Allow",
    "ipRules": [],
    "resourceAccessRules": null,
    "virtualNetworkRules": []
  },
  "primaryEndpoints": {
    "blob": "https://longhornstorage.blob.core.windows.net/",
    "dfs": "https://longhornstorage.dfs.core.windows.net/",
    "file": "https://longhornstorage.file.core.windows.net/",
    "internetEndpoints": null,
    "microsoftEndpoints": null,
    "queue": "https://longhornstorage.queue.core.windows.net/",
    "table": "https://longhornstorage.table.core.windows.net/",
    "web": "https://longhornstorage.z19.web.core.windows.net/"
  },
  "primaryLocation": "centralus",
  "privateEndpointConnections": [],
  "provisioningState": "Succeeded",
  "publicNetworkAccess": null,
  "resourceGroup": "longhornstoragerg",
  "routingPreference": null,
  "sasPolicy": null,
  "secondaryEndpoints": null,
  "secondaryLocation": null,
  "sku": {
    "name": "Standard_LRS",
    "tier": "Standard"
  },
  "statusOfPrimary": "available",
  "statusOfSecondary": null,
  "tags": {},
  "type": "Microsoft.Storage/storageAccounts"
}

Now we need the access keys

$ az storage account keys list -g longhornstoragerg -n longhornstorage --output tsv
2022-08-26T11:18:48.789361+00:00        key1    FULL    KotiWIZpH+AKeyIRotatedAlready++Xvn5lp15YZ13kJ4spMepzemEaL6oGAuK+eH0CO7Y+AStL5Ding==
2022-08-26T11:18:48.789361+00:00        key2    FULL    nkiUA95pOYUQLAAJ4miG9WQQpuBbp+/AlsoRotated/StJidgNfbHJp/qPwtI3E7mOV643+AStJyohww==
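Rather than copying a key by hand, we could also pull key1 straight into a shell variable with a JMESPath query; a small sketch:

# capture the first key's value for later use
$ SA_KEY=$(az storage account keys list -g longhornstoragerg -n longhornstorage --query '[0].value' -o tsv)
$ echo $SA_KEY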

We can see the Blob endpoint in our create output, but we can also just query for it

$ az storage account show -n longhornstorage -o json | jq -r '.primaryEndpoints.blob'
https://longhornstorage.blob.core.windows.net/

We can now use this in our S3 Proxy service

$ cat azStorageS3Proxy.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: s3proxy
---
apiVersion: v1
kind: Service
metadata:
  name: s3proxy
  namespace: s3proxy
spec:
  selector:
    app: s3proxy
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: s3proxy
  namespace: s3proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: s3proxy
  template:
    metadata:
      labels:
        app: s3proxy
    spec:
      containers:
      - name: s3proxy
        image: andrewgaul/s3proxy:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 80
        env:
        - name: JCLOUDS_PROVIDER
          value: azureblob
        - name: JCLOUDS_IDENTITY
          value: longhornstorage
        - name: JCLOUDS_CREDENTIAL
          value: KotiWIZpH+AKeyIRotatedAlready++Xvn5lp15YZ13kJ4spMepzemEaL6oGAuK+eH0CO7Y+AStL5Ding==
        - name: S3PROXY_IDENTITY
          value: longhornstorage
        - name: S3PROXY_CREDENTIAL
          value: KotiWIZpH+AKeyIRotatedAlready++Xvn5lp15YZ13kJ4spMepzemEaL6oGAuK+eH0CO7Y+AStL5Ding==
        - name: JCLOUDS_ENDPOINT
          value: https://longhornstorage.blob.core.windows.net/
$ kubectl apply -f azStorageS3Proxy.yaml
namespace/s3proxy created
service/s3proxy created
deployment.apps/s3proxy created
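Before wiring it into Longhorn, it’s worth a quick sanity check that the proxy pod rolled out and the service has endpoints:

$ kubectl -n s3proxy rollout status deployment/s3proxy
$ kubectl -n s3proxy get pods,svc,endpoints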

*Quick Note: this method has an issue if your Account Key has a “/” in it. If that is the case, regenerate until you have one without a slash*
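If you do need to regenerate, something like the following should cycle key1 (then re-run the keys list command above to grab the new value):

# rotate key1 on the storage account
$ az storage account keys renew -g longhornstoragerg -n longhornstorage --key key1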

Testing

I’ll fire up an interactive ubuntu pod and install the AWS CLI

builder@DESKTOP-QADGF36:~/Workspaces/jekyll-blog$ kubectl run my-shell --rm -i --tty --image ubuntu -- bash
If you don't see a command prompt, try pressing enter.
root@my-shell:/# apt update
root@my-shell:/# apt install -y awscli

I’ll then want to create an AWS profile and config for this (the region really doesn’t matter)

root@my-shell:/# aws configure
AWS Access Key ID [None]: longhornstorage
AWS Secret Access Key [None]: KotiWIZpH+AKeyIRotatedAlready++Xvn5lp15YZ13kJ4spMepzemEaL6oGAuK+eH0CO7Y+AStL5Ding==
Default region name [None]: us-east-1
Default output format [None]: json

Now we can “make a bucket”, which really creates an Azure Blob container

root@my-shell:/# aws --endpoint-url http://s3proxy.s3proxy.svc.cluster.local s3 mb s3://mybackups/
make_bucket: mybackups

/content/images/2022/09/longhornrook-19.png

Quick Note: there are rules on Azure container (bucket) names. If you use uppercase characters, for instance, expect an error

root@my-shell:/# aws --endpoint-url http://s3proxy.s3proxy.svc.cluster.local s3 mb s3://myBackups
make_bucket failed: s3://myBackups An error occurred (400) when calling the CreateBucket operation: Bad Request
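To be sure writes actually land in Azure, we could also push a small test object through the proxy and list it back (the file name here is just illustrative):

root@my-shell:/# echo "hello from s3proxy" > /tmp/hello.txt
root@my-shell:/# aws --endpoint-url http://s3proxy.s3proxy.svc.cluster.local s3 cp /tmp/hello.txt s3://mybackups/hello.txt
root@my-shell:/# aws --endpoint-url http://s3proxy.s3proxy.svc.cluster.local s3 ls s3://mybackups/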

Let’s create the base64-encoded values for a Longhorn credential secret:

$ echo KotiWIZpH+AKeyIRotatedAlready++Xvn5lp15YZ13kJ4spMepzemEaL6oGAuK+eH0CO7Y+AStL5Ding== | tr -d '\n' | base64 -w 0 && echo
S290aVdJWnBIK0swdG1oR1hMdWdpcnAzeDhGdlBwUVFzTysrWHZuNWxwMTVZWjEza0o0c3BNZXB6ZW1FYUw2b0dBdUsrZUgwQ083WStBU3RMNURpbmc9PQ==
$ echo http://s3proxy.s3proxy.svc.cluster.local | tr -d '\n' | base64 -w 0 && echo
aHR0cDovL3MzcHJveHkuczNwcm94eS5zdmMuY2x1c3Rlci5sb2NhbA==
$ echo longhornstorage | tr -d '\n' | base64 -w 0 && echo
bG9uZ2hvcm5zdG9yYWdl

$ cat longhornS3ProxySecret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: s3proxy-secret
  namespace: longhorn-system
type: Opaque
data:
  AWS_ACCESS_KEY_ID: bG9uZ2hvcm5zdG9yYWdl # longhornstorage
  AWS_SECRET_ACCESS_KEY: S290aVdJWnBIK0swdG1oR1hMdWdpcnAzeDhGdlBwUVFzTysrWHZuNWxwMTVZWjEza0o0c3BNZXB6ZW1FYUw2b0dBdUsrZUgwQ083WStBU3RMNURpbmc9PQ== # KotiWIZpH+AKeyIRotatedAlready++Xvn5lp15YZ13kJ4spMepzemEaL6oGAuK+eH0CO7Y+AStL5Ding==
  AWS_ENDPOINTS: aHR0cDovL3MzcHJveHkuczNwcm94eS5zdmMuY2x1c3Rlci5sb2NhbA== # http://s3proxy.s3proxy.svc.cluster.local

$ kubectl apply -f longhornS3ProxySecret.yaml 
secret/s3proxy-secret created

In Longhorn Settings, let’s update the Backup Target to use this secret and endpoint

/content/images/2022/09/longhornrook-20.png
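For reference, the values I entered are roughly the Longhorn-style S3 URL for the container plus the secret we just created (the region segment is arbitrary here since the proxy ignores it):

Backup Target:                   s3://mybackups@us-east-1/
Backup Target Credential Secret: s3proxy-secret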

I can now go back and create a Backup of my PVC used for Mongo

/content/images/2022/09/longhornrook-21.png

I can add some labels

/content/images/2022/09/longhornrook-22.png

Then, when I click Create, I can see it progress as it copies out the backup

/content/images/2022/09/longhornrook-23.png

Once complete, I can mouse over to see details

/content/images/2022/09/longhornrook-24.png

Back in Azure, I can see the backup in the Blob container

/content/images/2022/09/longhornrook-25.png

We can see it really just copied the data used, not the full 5Gi PVC size

$ kubectl get pvc --all-namespaces
NAMESPACE   NAME           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
default     mongo-lh-pvc   Bound    pvc-8ec623a3-f92f-4a90-a9f6-5f069174a4a7   5Gi        RWO            longhorn       15h

/content/images/2022/09/longhornrook-26.png

A quick note: if I wanted to save costs over time, I could make the default storage tier Cool

/content/images/2022/09/longhornrook-27.png
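If I went that route, the same change can be made from the CLI by updating the account’s default access tier; a sketch:

# switch the storage account's default blob tier to Cool
$ az storage account update -n longhornstorage -g longhornstoragerg --access-tier Cool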

Summary

We looked into Rook way too much. I tried many permutations trying to get Rook Ceph to work. Clearly rook.io, as a CNCF-supported project, has a massive following; I just could not get it to work on laptops running k3s. I even tried Azure Storage with AKS as a last-ditch attempt. The one thing I did get to work was the NFS operator, which they seem to have slated for the gallows.

Longhorn, also a CNCF project, came out of Rancher (hence the name) and serves the same purpose. It just worked, and it worked great. If I needed another storage class locally, I would lean toward this one. The dashboard was great, and being able to back it up to Azure Storage, albeit through an S3 proxy, solves my DR needs.

A Note on Art

As an aside, I had a lot of fun creating AI-generated art (Midjourney) for this post. Here are the runner-up images from prompts like ‘a knight on a horse with armor surrounded by long-horn bulls’

/content/images/2022/09/midjourney-knight-horse-longhorn-bull2.png

/content/images/2022/09/midjourney-knight-horse-longhorn-bull.png

and, of course, the one I used at the top

/content/images/2022/09/midjourney-knight-horse-longhorn-bull3.png

longhorn rook ceph kubernetes storage


Isaac Johnson


Cloud Solutions Architect

Isaac is a CSA and DevOps engineer who focuses on cloud migrations and devops processes. He also is a dad to three wonderful daughters (hence the references to Princess King sprinkled throughout the blog).
