Introduction

This is the sixth part of the Kubernetes blog series; if you missed the previous post, see Part 5. Stateful applications must keep their data intact across pod restarts, rescheduling, and redeployments, so they need consistent, reliable storage. This post covers the mechanisms Kubernetes provides for managing such applications, with particular attention to StatefulSets, Persistent Volumes (PVs), and Persistent Volume Claims (PVCs). It also walks through using Elastic Block Store (EBS) volumes with Amazon Elastic Kubernetes Service (EKS) for dynamically provisioned storage.

Detailed Overview of Kubernetes StatefulSets

StatefulSets are the Kubernetes workload resource for applications that require consistent storage, such as databases. A StatefulSet gives each pod a unique identity, made up of a stable name and network identity, that stays constant across rescheduling. This matters for applications that depend on ordered initialization and per-replica storage. For example, when deploying a MongoDB database, a StatefulSet ensures that each replica keeps its identity and its own state even if it must be recreated on a different node.

StatefulSets also manage the deployment and scaling of a set of pods, providing guarantees about the ordering and uniqueness of these pods. The following is a simple StatefulSet definition for MongoDB.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  selector:
    matchLabels:
      app: mongo
  serviceName: "mongo"
  replicas: 3
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - name: mongo
          image: mongo
          ports:
            - containerPort: 27017

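This creates three MongoDB pods with stable names and network identities (mongo-0, mongo-1, and mongo-2). Two caveats apply: serviceName must refer to a headless Service named mongo, and because this manifest declares no volumes, each pod gets its own stable storage only once a volumeClaimTemplates section is added.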
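The StatefulSet above can be extended so that each replica really does get its own storage. The following is a minimal sketch, not a production configuration: the volume name mongo-data and the 10Gi size are assumptions made for illustration, and the cluster is assumed to have a default StorageClass that can satisfy the claims.

```yaml
# Headless Service referenced by the StatefulSet's serviceName.
apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  clusterIP: None              # headless: each pod gets a stable DNS name
  selector:
    app: mongo
  ports:
    - port: 27017
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  selector:
    matchLabels:
      app: mongo
  serviceName: "mongo"
  replicas: 3
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - name: mongo
          image: mongo
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-data       # assumed name; must match the template below
              mountPath: /data/db    # default data directory of the official mongo image
  volumeClaimTemplates:
    - metadata:
        name: mongo-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi            # assumed size, for illustration only
```

With this in place, Kubernetes creates one PVC per pod (mongo-data-mongo-0, mongo-data-mongo-1, and so on), and a recreated pod is reattached to the claim that carries its ordinal.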

Persistent Volumes (PVs) and Persistent Volume Claims (PVCs)

Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) are the Kubernetes resources that abstract storage at a high level. Together they let storage and data be managed across a cluster independently of the lifecycle of individual pods. Here is a closer look at each of them.

Persistent Volumes (PVs)

A Persistent Volume is a piece of storage in the cluster. Generally, an administrator provisions it, but it can also be provisioned dynamically through storage classes. PVs are implemented by volume plugins, just like ordinary volumes, but their lifecycle is independent of any individual pod that uses them. This independence makes PVs especially suitable when data has to persist, for example for databases or file storage: if a pod crashes or stops, its data is not lost.

PVs are designed to provide multi-user environments with a more seamless and flexible solution for storage. They can be provisioned with specific storage capacities and access modes, which are:

  • ReadWriteOnce – The volume can be mounted as read-write by a single node.
  • ReadWriteMany – The volume can be mounted as read-write by many nodes.
  • ReadOnlyMany – The volume can be mounted as read-only by many nodes.

Additionally, PVs support various storage backends, such as network disks, solid-state drives, or cloud storage, enabling broad integration with different infrastructures.
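As a rough illustration of a statically provisioned PV with a capacity and an access mode, consider the sketch below. The name example-pv, the class name manual, and the path /mnt/data are hypothetical, and hostPath is used only because it needs no external storage; it is suitable for single-node test clusters, not production.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv                        # hypothetical name
spec:
  capacity:
    storage: 10Gi                         # size offered to claims
  accessModes:
    - ReadWriteOnce                       # read-write from a single node
  persistentVolumeReclaimPolicy: Retain   # keep the data after the claim is deleted
  storageClassName: manual                # hypothetical class name, used only for matching
  hostPath:
    path: /mnt/data                       # hypothetical directory on the node
```

A claim that asks for the same storageClassName, a compatible access mode, and no more than 10Gi could bind to this volume.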

Persistent Volume Claims (PVCs)

Another important part of the Kubernetes architecture is the Persistent Volume Claim (PVC), which is a user’s request for storage. A PVC is analogous to a pod: just as pods consume node resources, PVCs consume PV resources. PVCs allow users to request a size and, optionally, access modes and other performance characteristics. This keeps the details of the underlying storage hidden from the pods and Services that consume it, which enables portability and scaling.

When a user creates a PVC, Kubernetes checks whether a PV exists in the cluster that can satisfy the claim by comparing parameters such as access modes and storage size. If a matching PV is found, it is bound to the PVC and made available to be consumed by the pod. If no matching PV exists and dynamic provisioning is enabled, the cluster creates a new PV that matches the claim’s requirements.

How PVs and PVCs Work Together

The relationship between PVs and PVCs is similar to that between nodes and pods: just as pods consume a node’s resources, PVCs consume a PV’s resources. A typical workflow goes like this:

  1. Provisioning: An administrator provisions a cluster with some PVs of various sizes and configurations.
  2. Requesting Storage: Users needing persistent storage create a PVC, specifying the desired size and other characteristics.
  3. Binding: Kubernetes looks for a PV that matches the requirements of the PVC and binds the two. Once bound, the PV is reserved for that PVC, and any pod that references the claim can mount the volume.
  4. Using Storage: The pod utilizes the PV as needed, and the data remains persistent even if the pod is restarted or deleted.
  5. Releasing: When the storage is no longer needed, the user deletes the PVC, which releases the PV. Depending on the reclaim policy, the PV is then either deleted together with its backing storage (Delete) or kept with its data so that an administrator can reclaim it manually (Retain).

Example

Consider an application that stores large datasets or performs frequent read/write operations. PVs allow the data to outlive pod restarts and failures, which keeps the application robust and its data intact. The following example shows a PVC requesting a specific amount of storage:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

This PVC requests 10 GiB of storage with the ReadWriteOnce access mode, meaning the volume can be mounted as read-write by a single node. A MySQL database deployed in the Kubernetes cluster can then consume it.
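To show how a workload might consume the mysql-pvc claim, here is a minimal sketch of a pod that mounts it. The pod name, the mysql:8.0 image tag, the volume name mysql-storage, and the placeholder password are assumptions for illustration; a real deployment would normally use a StatefulSet or Deployment and read the password from a Secret.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mysql                        # hypothetical pod name
spec:
  containers:
    - name: mysql
      image: mysql:8.0               # assumed image tag
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: change-me           # placeholder only; use a Secret in practice
      ports:
        - containerPort: 3306
      volumeMounts:
        - name: mysql-storage
          mountPath: /var/lib/mysql  # MySQL data directory
  volumes:
    - name: mysql-storage            # assumed volume name
      persistentVolumeClaim:
        claimName: mysql-pvc         # the claim defined above
```

Because the data directory lives on the claimed volume, deleting and recreating this pod leaves the database files in place.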

Enhanced Data Management with EBS Volumes in Amazon EKS

EBS volumes provide durable, scalable block storage for Amazon EKS. Through the AWS EBS CSI Driver, they can be provisioned dynamically as PVs, which makes it much easier to scale storage as application requirements change.

To expose EBS volumes as persistent volumes, define a StorageClass like this:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
parameters:
  type: gp2
reclaimPolicy: Retain
allowVolumeExpansion: true

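When a PVC references this StorageClass, the EBS CSI driver provisions an EBS volume for it automatically. Because the reclaim policy is Retain, the volume and its data are kept after the claim is deleted, and allowVolumeExpansion lets a claim be resized later.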
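A claim that triggers dynamic provisioning through the ebs-sc class could look like the sketch below. The claim name ebs-claim and the 20Gi size are assumptions for illustration, and the EBS CSI driver is assumed to be installed in the cluster.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim            # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce          # an EBS volume is typically attached to one node at a time
  storageClassName: ebs-sc   # the StorageClass defined above
  resources:
    requests:
      storage: 20Gi          # assumed size
```

A pod then references ebs-claim in its volumes section in the same way as any other PVC; no PV has to be created by hand.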

Conclusion

Managing the lifecycle of stateful applications in Kubernetes is inherently somewhat complex, but the right tooling makes it feasible. StatefulSets give pods stable identities and ordered, consistent deployment. Persistent Volumes and Persistent Volume Claims abstract away the storage, so application data survives disruptions. EBS volumes on Amazon EKS add scalable, durable storage for running stateful applications in a Kubernetes environment. Together, these features make Kubernetes an attractive option for organisations that want to balance resilience and scalability in their application infrastructure.

To learn more about cloud technology, cloud computing, Kubernetes architecture, Kubernetes for stateful applications, storage solutions, and its related concepts, visit the CloudZenia website. 


Frequently Asked Questions (FAQs)

1. What is a StatefulSet in Kubernetes? How does it differ from a Deployment?

A StatefulSet is a Kubernetes workload API object whose purpose is to manage stateful applications. StatefulSets are similar to Deployments, but unlike Deployments they maintain the identity of their pods and their storage volumes across pod rescheduling and restarts. This makes them particularly valuable for applications like databases.

2. How do Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) actually work in Kubernetes?

PVs are the storage resources in a Kubernetes cluster, and they exist independently of the pod lifecycle. PVCs, on the other hand, are requests for storage made by a user and referenced by pods; they specify size and access modes, among other parameters. Kubernetes binds each PVC to a PV that is compatible with the request.

3. What are some common use cases for using EBS volumes with Amazon EKS?

EBS volumes are usually used for workloads that require high-performance persistent storage, such as databases, content management systems, and file storage. Their durability and scalability make them suitable for production environments on EKS.

4. How do you scale stateful applications in Kubernetes?

Scaling a stateful application means changing the number of replicas in its StatefulSet. However, since each pod of a StatefulSet has its own identity and its own storage, scaling needs to be done carefully so that both data integrity and application performance are maintained.

Troubleshooting Guide

Problem: StatefulSet pods are not starting.

Solution: Check that the associated Persistent Volume Claims are configured correctly and enough available Persistent Volumes exist. Examine the StatefulSet events for scheduling errors with the command:

kubectl describe statefulset <name>

Problem: Persistent Volume Claims are stuck in a ‘Pending’ state.

Solution: This often happens when no PV is available to satisfy the claim’s requirement or the dynamic provisioner (if configured) is unable to create a new volume. Please check the storage class details of your cluster and the available PVs.

Problem: When you scale down and subsequently scale up a StatefulSet, it leads to data inconsistency.

Solution: Before scaling down, ensure your application shuts down each pod gracefully, with its data replicated or backed up first. Use preStop hooks in your pod specification to run the shutdown steps before the container is terminated.
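As a rough sketch of where a preStop hook sits, the fragment below shows the relevant part of a StatefulSet pod template (spec.template.spec). The sleep command is only a stand-in and the 60-second grace period is an assumption; a real hook would run whatever application-specific step flushes or hands over data.

```yaml
# Fragment of a pod template; not a complete manifest.
spec:
  terminationGracePeriodSeconds: 60   # assumed; must be long enough for the hook to finish
  containers:
    - name: mongo
      image: mongo
      lifecycle:
        preStop:
          exec:
            # Placeholder: replace with the application's own graceful-shutdown step.
            command: ["/bin/sh", "-c", "sleep 15"]
```

Kubernetes runs the hook before sending the termination signal to the container, and the grace period covers both the hook and the shutdown that follows.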

Problem: EBS volumes are not attaching to pods.

Solution: This may be because the EBS volume resides in a different AWS Availability Zone from the EKS node that the pod is scheduled on; an EBS volume can only be attached to an instance in its own zone. Ensure the node groups in your EKS cluster have nodes in the same region and zone as your EBS volumes. Also, ensure that the required IAM permissions and roles are in place, so that the EBS CSI driver and the EKS nodes can create and attach the EBS volumes.
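For dynamically provisioned volumes, one common way to avoid the zone mismatch is to delay volume creation until the pod has been scheduled. The sketch below is a variant of the earlier StorageClass with volumeBindingMode set; the name ebs-sc-wait is hypothetical.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc-wait                         # hypothetical name
provisioner: ebs.csi.aws.com
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer     # provision only after a pod using the claim is scheduled
reclaimPolicy: Retain
allowVolumeExpansion: true
```

With this mode, the volume is created in the zone of the node chosen for the pod, rather than in a zone picked before scheduling.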

Aug 23, 2024