Manage Non-shared Storage with Stateful Sets

Objectives

  • Deploy applications that scale without sharing storage.

Application Clustering

Clustered applications, such as MySQL and Cassandra, typically require persistent storage to maintain the integrity of the data and files that the application uses. When many applications require persistent storage at the same time, provisioning a separate disk for each application might not be possible, because the available resources are limited.

Shared storage solves this problem by allocating the same resources from a single device to multiple services.

Storage Services

File storage solutions provide the directory structure that is found in many environments. Using file storage is ideal when applications generate or consume reasonable volumes of organized data. Applications that use file-based implementations are prevalent, easy to manage, and provide an affordable storage solution.

Because of their reliability, file-based solutions are a good fit for data backup and archiving, and also for file sharing and collaboration services. Most data centers provide file storage solutions, such as a network-attached storage (NAS) cluster, for these scenarios.

Network-attached storage (NAS) is a file-based storage architecture that makes stored data accessible to networked devices. NAS gives networks a single access point for storage with built-in security, management, and fault-tolerant capabilities. Of the many data transfer protocols that networks can run, two are fundamental to most networks: Internet Protocol (IP) and Transmission Control Protocol (TCP).

Files that are transferred over these protocols can be shared by using one of the following file-sharing protocols:

  • Network File System (NFS): This protocol enables remote hosts to mount file systems over a network and to interact with those file systems as though they are mounted locally.

  • Server Message Block (SMB): This application-layer network protocol is used to access resources on a server, such as file shares and shared printers.

NAS solutions can provide file-based storage to applications within the same data center. This approach is common to many application architectures, including the following architectures:

  • Web server content

  • File share services

  • FTP storage

  • Backup archives

These applications take advantage of data reliability and the ease of file sharing that is available by using file storage. Additionally, for file storage data, the OS and file system handle the locking and caching of the files.

Although familiar and prevalent, file storage solutions are not ideal for all application scenarios. One particular pitfall of file storage is poor handling of large data sets or unstructured data.

Block storage solutions, such as Storage Area Network (SAN) and iSCSI technologies, provide access to raw block devices for application storage. These block devices function as independent storage volumes, similar to the physical drives in servers, and typically require formatting and mounting for application access.

Using block storage is ideal when applications require faster access for optimizing computationally heavy data workloads. Applications that use block-level storage implementations gain efficiencies by communicating at the raw device level, instead of relying on operating system layer access.

Block-level approaches enable data distribution on blocks across the storage volume. Blocks also use basic metadata, including a unique identification number for each block of data, for quick retrieval and reassembly of blocks for reading.

SAN and iSCSI technologies provide applications with block-level volumes from network-based storage pools. Using block-level access to storage volumes is common for application architectures, including the following architectures:

  • SQL Databases (single node access).

  • Virtual Machines (multinode access).

  • High-performance data access.

  • Server-side processing applications.

  • Multiple block device RAID configurations.

Application storage that uses several block devices in a RAID configuration benefits from the data integrity and performance that the various arrays provide.

With Red Hat OpenShift Container Platform (RHOCP), you can create customized storage classes for your applications. With NAS and SAN storage technologies, RHOCP applications can use either the NFS protocol for file-based storage, or a block-level protocol, such as iSCSI, for block storage.

Introduction to Stateful Sets

A stateful application acts according to past states or transactions, which affect its current and future states. Because a stateful application records its state, it can recover from a failure by resuming from a known point in time.

A stateful set represents a set of pods with consistent identities. Each identity consists of a stable network identity, with a single stable DNS name and hostname, and storage from as many volume claims as the stateful set specifies. A stateful set guarantees that a given network identity always maps to the same storage identity.

A deployment represents a set of identical pods, each of which runs the containers that the pod template defines. Each deployment can have many active replicas, depending on the user specification. These replicas can be scaled up or down, as needed. A replica set is a native Kubernetes API object that ensures that the specified number of pod replicas are running. Deployments are used for stateless applications by default, and they can be used for stateful applications by attaching a persistent volume. All pods in a deployment share the same persistent volume claim (PVC) and volume.

In contrast with deployments, stateful set pods do not share a persistent volume. Instead, each stateful set pod has its own unique persistent volume. The stateful set creates its pods directly, without a replica set, and each replica records its own transactions. Each replica has its own identifier, which is maintained across any rescheduling. You must configure application-level clustering so that stateful set pods have the same data.

Stateful sets are the best option for applications, such as databases, that require consistent identities and non-shared persistent storage.
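
The mapping between pod identity and storage identity can be sketched in a few lines of shell. The following is a minimal illustration rather than a part of any CLI: the function names are hypothetical, the names dbserver and data are example values, and the patterns follow the standard Kubernetes naming scheme of <set>-<ordinal> for pods and <claim template>-<set>-<ordinal> for volume claims.

```shell
#!/bin/sh
# Hypothetical helpers that mirror the standard stateful set naming scheme.

# Pod name: <set>-<ordinal>
sts_pod_name() {
  printf '%s-%s\n' "$1" "$2"
}

# Volume claim name: <claim template>-<set>-<ordinal>
sts_pvc_name() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Print one "pod -> claim" line per replica.
sts_identity_map() {
  set_name=$1
  template=$2
  replicas=$3
  i=0
  while [ "$i" -lt "$replicas" ]; do
    printf '%s -> %s\n' "$(sts_pod_name "$set_name" "$i")" \
      "$(sts_pvc_name "$template" "$set_name" "$i")"
    i=$((i + 1))
  done
}

# Example values: a set named "dbserver" with a claim template named "data".
sts_identity_map dbserver data 3
```

Because the ordinal is part of both names, a rescheduled pod with a given ordinal is matched again with the claim that carries the same ordinal.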

Working with Stateful Sets

With Kubernetes, you can use manifest files to specify the intended configuration of a stateful set. You can define the name of the application, labels, the image source, storage, environment variables, and more.

The following snippet shows an example of a YAML manifest file for a stateful set:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dbserver # (1)
spec:
  selector:
    matchLabels:
      app: database # (2)
  replicas: 3 # (3)
  template:
    metadata:
      labels:
        app: database # (4)
    spec:
      containers:
      - env: # (5)
        - name: MYSQL_USER
          valueFrom:
            secretKeyRef:
              key: user
              name: sakila-cred
        image: registry.ocp4.example.com:8443/redhattraining/mysql-app:v1 # (6)
        name: database # (7)
        ports: # (8)
        - containerPort: 3306
          name: database
        volumeMounts: # (9)
        - mountPath: /var/lib/mysql
          name: data
      terminationGracePeriodSeconds: 10
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ] # (10)
      storageClassName: "lvms-vg1" # (11)
      resources:
        requests:
          storage: 1Gi # (12)

(1) Name of the stateful set.

(2) (4) Application labels. The selector label must match the label in the pod template.

(3) Number of replicas.

(5) Environment variables, which can be defined explicitly or by referencing a secret object.

(6) Image source.

(7) Container name.

(8) Container ports.

(9) Mount path information for the persistent volumes for each replica. Each persistent volume has the same configuration.

(10) The access mode of the persistent volume. You can choose between the ReadWriteOnce, ReadWriteMany, and ReadOnlyMany options.

(11) The storage class that the persistent volume uses.

(12) Size of the persistent volume.

Note

Stateful sets can be created only by using manifest files. The oc and kubectl CLI do not have commands to create stateful sets imperatively.

Create the stateful set by using the create command:

[user@host ~]$ oc create -f statefulset-dbserver.yml

Verify the creation of the stateful set named dbserver:

[user@host ~]$ kubectl get statefulset
NAME       READY   AGE
dbserver   3/3     6s
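
Instead of polling the READY column, you can wait until all replicas are ready. The following sketch wraps the oc rollout status command in a hypothetical helper function; the function name and the 120s default timeout are assumptions for illustration, and the command requires a logged-in oc session.

```shell
#!/bin/sh
# Block until every replica of a stateful set reports ready, or time out.
# Hypothetical wrapper: "oc rollout status" does the actual waiting.
wait_for_statefulset() {
  set_name=$1
  timeout=${2:-120s}   # assumed default; raise it for slow storage back ends
  oc rollout status "statefulset/${set_name}" --timeout="$timeout"
}

# Example call (requires a logged-in oc session):
#   wait_for_statefulset dbserver 2m
```

The function returns the exit status of the oc command, so a script can stop when the stateful set does not become ready in time.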

Verify the status of the instances:

[user@host ~]$ oc get pods
NAME         READY   STATUS    RESTARTS   AGE
dbserver-0   1/1     Running   0          85s
dbserver-1   1/1     Running   0          82s
dbserver-2   1/1     Running   0          79s

Verify the status of the persistent volumes:

[user@host ~]$ kubectl get pvc
NAME              STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS ...
data-dbserver-0   Bound    pvc-c28f61ee-...   1Gi        RWO            lvms-vg1     ...
data-dbserver-1   Bound    pvc-ddbe6af1-...   1Gi        RWO            lvms-vg1     ...
data-dbserver-2   Bound    pvc-8302924a-...   1Gi        RWO            lvms-vg1     ...

Notice that three PVCs were created. Confirm that persistent volumes are attached to each instance:

[user@host ~]$ oc describe pod dbserver-0
...output omitted...
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-dbserver-0
...output omitted...
[user@host ~]$ oc describe pod dbserver-1
...output omitted...
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-dbserver-1
...output omitted...
[user@host ~]$ oc describe pod dbserver-2
...output omitted...
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-dbserver-2
...output omitted...
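
The same check can be scripted rather than read from the describe output. The following sketch uses a hypothetical helper function that queries each pod with a JSONPath expression; it assumes that the pods follow the <set>-<ordinal> naming pattern and that you are logged in to the cluster.

```shell
#!/bin/sh
# Print "<pod> -> <claim names>" for every replica of a stateful set.
# Hypothetical helper; assumes pods are named <set>-<ordinal>.
show_pod_claims() {
  set_name=$1
  replicas=$2
  i=0
  while [ "$i" -lt "$replicas" ]; do
    pod="${set_name}-${i}"
    # Read the claim name of every PVC-backed volume in the pod spec.
    claims=$(oc get pod "$pod" \
      -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}')
    printf '%s -> %s\n' "$pod" "$claims"
    i=$((i + 1))
  done
}

# Example call (requires a logged-in oc session):
#   show_pod_claims dbserver 3
```

Each output line should pair a pod with a claim that carries the same ordinal, which confirms that no two replicas share a volume claim.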

Note

You must configure application-level clustering for stateful set pods to have the same data.

You can update the number of replicas of the stateful set by using the scale command:

[user@host ~]$ oc scale statefulset/dbserver --replicas 1
statefulset.apps/dbserver scaled
[user@host ~]$ oc get pods
NAME         READY   STATUS    RESTARTS  ...
dbserver-0   1/1     Running   0         ...
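
When a stateful set scales down, Kubernetes removes the pods with the highest ordinals first. The following sketch computes that order with a hypothetical helper function. It assumes the default ordered pod management policy, and it only generates names without contacting the cluster.

```shell
#!/bin/sh
# List the pods that a scale-down removes, in the order they are terminated.
# Hypothetical helper: it only computes names and never contacts the cluster.
sts_scale_down_order() {
  set_name=$1
  from=$2   # current number of replicas
  to=$3     # target number of replicas
  i=$((from - 1))
  while [ "$i" -ge "$to" ]; do
    printf '%s-%s\n' "$set_name" "$i"
    i=$((i - 1))
  done
}

# Scaling dbserver from 3 replicas to 1 removes dbserver-2, then dbserver-1.
sts_scale_down_order dbserver 3 1
```

The pod with ordinal 0 is always the last one to remain, which matches the output of the scale command above.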

To delete the stateful set, use the delete statefulset command:

[user@host ~]$ kubectl delete statefulset dbserver
statefulset.apps "dbserver" deleted

Notice that the PVCs are not deleted when you delete the stateful set:

[user@host ~]$ oc get pvc
NAME              STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS ...
data-dbserver-0   Bound    pvc-c28f61ee-...   1Gi        RWO            lvms-vg1     ...
data-dbserver-1   Bound    pvc-ddbe6af1-...   1Gi        RWO            lvms-vg1     ...
data-dbserver-2   Bound    pvc-8302924a-...   1Gi        RWO            lvms-vg1     ...
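
If you no longer need the data, you must remove the remaining PVCs yourself. The following sketch shows one possible way to do so with a hypothetical helper function that deletes the claims by name. It assumes the <claim template>-<set>-<ordinal> naming pattern and a logged-in oc session. Deleting a PVC can permanently delete the data on its volume, depending on the reclaim policy of the storage class.

```shell
#!/bin/sh
# Delete the volume claims that a removed stateful set left behind.
# Hypothetical helper; claim names follow <claim template>-<set>-<ordinal>.
# WARNING: depending on the reclaim policy, this action destroys the data.
delete_sts_claims() {
  template=$1
  set_name=$2
  replicas=$3
  i=0
  while [ "$i" -lt "$replicas" ]; do
    oc delete pvc "${template}-${set_name}-${i}"
    i=$((i + 1))
  done
}

# Example call (requires a logged-in oc session):
#   delete_sts_claims data dbserver 3
```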

You can create a stateful set from the web console by clicking the Workloads → StatefulSets menu. Click Create StatefulSet and customize the YAML manifest.
