Kubernetes has become the de facto standard for container orchestration, enabling developers to deploy, scale, and manage containerized applications with ease.
While Deployments are commonly used for stateless applications, StatefulSets are designed for stateful applications that require stable network identities, persistent storage, and ordered deployment and scaling.
In this article, we’ll explore what StatefulSets are, their key components, and how they work, using a practical example.
What is a StatefulSet?
A StatefulSet is a Kubernetes workload API object used to manage stateful applications. Unlike Deployments, which are ideal for stateless applications, StatefulSets provide guarantees about the ordering and uniqueness of Pods. This is crucial for applications like databases (e.g., MySQL, PostgreSQL, Cassandra) or distributed systems (e.g., Kafka, ZooKeeper) that require:
1. Stable, unique network identifiers: Each Pod gets a persistent hostname that remains the same even after rescheduling.
2. Persistent storage: Each Pod gets its own dedicated storage that persists across restarts.
3. Ordered deployment and scaling: Pods are created, updated, and deleted in a specific order.
Key Components That Make Up a StatefulSet
Pods within a StatefulSet are assigned unique, stable network identities that are critical for stateful applications. Each Pod is given a predictable hostname and DNS entry, following the naming convention <statefulset-name>-<ordinal-index>. For instance, a StatefulSet named web will generate Pods named web-0, web-1, web-2, and so on. This consistent naming scheme ensures that each Pod retains its identity over time, even if it is rescheduled, which is essential for maintaining the integrity and connectivity of stateful applications.
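To make the naming rule concrete, here is a small plain-Python model of it. `pod_names` is a hypothetical helper written for this article, not part of Kubernetes or its client libraries, and it assumes ordinals start at 0, which is the default.

```python
def pod_names(statefulset_name: str, replicas: int) -> list[str]:
    """Pod names for a StatefulSet: <statefulset-name>-<ordinal-index>.

    Assumes ordinals start at 0 (the default), so they run 0..replicas-1.
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return [f"{statefulset_name}-{ordinal}" for ordinal in range(replicas)]


print(pod_names("web", 3))  # ['web-0', 'web-1', 'web-2']
```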
StatefulSets use Persistent Volume Claims (PVCs) to give each Pod its own dedicated persistent storage. The StatefulSet controller creates one PVC per Pod from the volumeClaimTemplates in the StatefulSet spec, and the PersistentVolumes behind those claims can be provisioned either dynamically (through a StorageClass) or statically (pre-created by an administrator). Each PVC is tied to a Pod by its ordinal index: with a claim template named data, the Pod web-0 uses a PVC called data-web-0, web-1 uses data-web-1, and so forth. Because a restarted or rescheduled Pod reattaches to the same PVC, its data persists across restarts and rescheduling, which is crucial for stateful applications.
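The PVC naming rule can be modeled the same way. In this sketch, `pvc_name` is a hypothetical helper; it assumes a claim template named `data`, as in the example above, and simply joins the template name to the Pod name.

```python
def pvc_name(claim_template: str, statefulset_name: str, ordinal: int) -> str:
    """PVC name for one StatefulSet Pod: <claim-template-name>-<pod-name>."""
    pod_name = f"{statefulset_name}-{ordinal}"
    return f"{claim_template}-{pod_name}"


# Prints data-web-0, data-web-1 and data-web-2, one per line.
for ordinal in range(3):
    print(pvc_name("data", "web", ordinal))
```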
By default (the OrderedReady Pod management policy), StatefulSets deploy and scale Pods in a strict sequence: web-0 is created first, then web-1, and so on, and each Pod must be Running and Ready before the next one is started. This ordering matters for applications that need a specific startup sequence, such as primary-replica databases, where the primary should be up before the replicas try to connect to it. When a StatefulSet is scaled down, Pods are removed in reverse order (highest ordinal first), so the lowest-ordinal Pods, such as a primary running as web-0, are the last to go. This reduces the risk of disrupting services that depend on these relationships.
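The sketch below models the order in which Pods are created or deleted under the default OrderedReady policy. `scale_plan` is a hypothetical helper that only computes the sequence; it does not talk to a cluster or simulate the wait for each Pod to become Running and Ready.

```python
def scale_plan(statefulset_name: str, current: int, desired: int) -> list[str]:
    """Sequence of Pod creations or deletions when scaling from `current` to
    `desired` replicas under the default OrderedReady policy.

    Scale-up creates the missing ordinals in ascending order; scale-down
    deletes the highest ordinals first. The real controller also waits for
    each step to finish, which this model does not simulate.
    """
    if desired >= current:
        return [f"create {statefulset_name}-{i}" for i in range(current, desired)]
    return [f"delete {statefulset_name}-{i}" for i in range(current - 1, desired - 1, -1)]


print(scale_plan("web", 0, 3))  # ['create web-0', 'create web-1', 'create web-2']
print(scale_plan("web", 3, 1))  # ['delete web-2', 'delete web-1']
```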
A StatefulSet requires a Headless Service to manage the network identities of its Pods, and you are responsible for creating that Service yourself. Unlike a regular Service, which is allocated a single cluster IP and load-balances traffic across its Pods, a Headless Service (one with clusterIP: None) has no cluster IP; instead, cluster DNS publishes a record for each individual Pod. This lets clients discover and talk to a specific instance by its stable network identity, without going through a load balancer.
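Each Pod's DNS name combines its own name with the Headless Service's name. This minimal sketch assumes the Pods live in the `default` namespace, that the cluster DNS domain is `cluster.local` (a common default that can be configured differently), and that the Headless Service for the web example is also named `web`; `pod_fqdn` is a hypothetical helper.

```python
def pod_fqdn(pod_name: str, service_name: str, namespace: str = "default",
             cluster_domain: str = "cluster.local") -> str:
    """Fully qualified DNS name of a StatefulSet Pod behind a Headless Service.

    Format: <pod-name>.<service-name>.<namespace>.svc.<cluster-domain>
    """
    return f"{pod_name}.{service_name}.{namespace}.svc.{cluster_domain}"


print(pod_fqdn("web-0", "web"))  # web-0.web.default.svc.cluster.local
```

Clients in the same namespace can usually rely on DNS search domains and use the shorter form `web-0.web`.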
Practical Example: Deploying a Stateful MySQL Cluster
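Let’s deploy a simple three-Pod MySQL setup using a StatefulSet to understand how it works in practice. Note that this example does not configure replication, so the three Pods run as independent MySQL servers rather than a true primary-replica cluster.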
Step 1: Create a Headless Service
A Headless Service is required to provide stable network identities for the Pods. Here’s an example YAML file, saved as headless-service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
  - port: 3306
    name: mysql
  clusterIP: None
  selector:
    app: mysql
– clusterIP: None makes this a Headless Service.
– Cluster DNS creates a record for each Pod the Service selects (e.g., `mysql-0.mysql`, `mysql-1.mysql`); these short names resolve from within the same namespace.
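The selector is what ties this Service to the Pods the StatefulSet will create. As a rough illustration, the sketch below mirrors part of the manifest above as a Python dict and checks two things: that it is headless, and which Pod labels its selector would match. `selects` is a hypothetical helper, and the dict includes only the fields inspected here.

```python
def selects(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """True if the Pod carries every key/value pair in the Service selector."""
    if not selector:
        # A Service without a selector does not pick up Pods automatically.
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Part of the manifest above as a Python dict (only the fields inspected here).
service = {
    "metadata": {"name": "mysql"},
    "spec": {"clusterIP": "None", "selector": {"app": "mysql"}},
}

# In YAML, `clusterIP: None` is the string "None", not a null value.
is_headless = service["spec"].get("clusterIP") == "None"
print(is_headless)                                              # True
print(selects(service["spec"]["selector"], {"app": "mysql"}))   # True
print(selects(service["spec"]["selector"], {"app": "nginx"}))   # False
```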
Step 2: Create a StatefulSet
Now, let’s define the StatefulSet for the MySQL cluster, saved as statefulset.yaml:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:5.7
        env:
        # Plain-text password for demonstration only; use a Secret in production.
        - name: MYSQL_ROOT_PASSWORD
          value: "password"
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
Things to note:
– serviceName: Matches the name of the Headless Service.
– replicas: Defines the number of Pods (3 in this case).
– volumeClaimTemplates: Creates a Persistent Volume Claim (PVC) for each Pod. Each Pod will get its own storage (e.g., `mysql-data-mysql-0`, `mysql-data-mysql-1`, etc.).
– ordered creation: Pods will be created in order (`mysql-0`, `mysql-1`, `mysql-2`).
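Before applying the manifest, it can help to check that its pieces line up. The following is a rough pre-flight sketch, assuming the manifest has already been loaded into a Python dict (for example with a YAML parser); `check_statefulset` is a hypothetical helper and is not a substitute for server-side validation such as `kubectl apply --dry-run=server -f statefulset.yaml`.

```python
def check_statefulset(sts: dict, service_name: str) -> list[str]:
    """Return human-readable problems found in a StatefulSet manifest (empty if none)."""
    problems = []
    spec = sts["spec"]
    if spec.get("serviceName") != service_name:
        problems.append("serviceName does not match the headless Service")

    # The selector must match the labels in the Pod template.
    template_labels = spec["template"]["metadata"].get("labels", {})
    for key, value in spec["selector"]["matchLabels"].items():
        if template_labels.get(key) != value:
            problems.append(f"selector label {key}={value} is missing from the Pod template")

    # Every volumeMount should refer to a claim template or a Pod-level volume.
    pod_spec = spec["template"]["spec"]
    known = {t["metadata"]["name"] for t in spec.get("volumeClaimTemplates", [])}
    known |= {v["name"] for v in pod_spec.get("volumes", [])}
    for container in pod_spec["containers"]:
        for mount in container.get("volumeMounts", []):
            if mount["name"] not in known:
                problems.append(f"volumeMount {mount['name']} has no matching volume")
    return problems


# The StatefulSet manifest above, reduced to the fields checked here.
sts = {
    "metadata": {"name": "mysql"},
    "spec": {
        "serviceName": "mysql",
        "replicas": 3,
        "selector": {"matchLabels": {"app": "mysql"}},
        "template": {
            "metadata": {"labels": {"app": "mysql"}},
            "spec": {
                "containers": [
                    {"name": "mysql",
                     "volumeMounts": [{"name": "mysql-data", "mountPath": "/var/lib/mysql"}]}
                ]
            },
        },
        "volumeClaimTemplates": [{"metadata": {"name": "mysql-data"}}],
    },
}

print(check_statefulset(sts, "mysql"))  # []
```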
Step 3: Deploy and Verify
1. Apply the YAML files:
kubectl apply -f headless-service.yaml
kubectl apply -f statefulset.yaml
2. Check the Pods:
kubectl get pods -l app=mysql
3. Check the Persistent Volume Claims:
kubectl get pvc
4. Test DNS resolution:
kubectl run -it --rm --image=busybox:1.28 --restart=Never test -- nslookup mysql-0.mysql
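Checking the Pods by eye works for three replicas, but the same check can be scripted. This sketch parses the default table printed by `kubectl get pods -l app=mysql` and reports any expected Pod that is missing, not Running, or not fully ready. `pods_not_ready` is a hypothetical helper, the sample text is illustrative rather than captured from a real cluster, and the assumed column layout (NAME, READY, STATUS, ...) is meant for humans, so machine-readable output such as `-o json` is the sturdier choice for real automation.

```python
def pods_not_ready(kubectl_output: str, expected: list[str]) -> list[str]:
    """Expected Pods that are absent, not Running, or not fully ready."""
    healthy = set()
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        columns = line.split()
        if len(columns) < 3:
            continue
        name, ready, status = columns[0], columns[1], columns[2]
        ready_count, _, total = ready.partition("/")  # e.g. "1/1" -> "1", "1"
        if status == "Running" and ready_count == total:
            healthy.add(name)
    return [name for name in expected if name not in healthy]


# Illustrative sample text, not output captured from a real cluster.
sample = """\
NAME      READY   STATUS              RESTARTS   AGE
mysql-0   1/1     Running             0          2m
mysql-1   1/1     Running             0          90s
mysql-2   0/1     ContainerCreating   0          10s
"""

expected = [f"mysql-{i}" for i in range(3)]
print(pods_not_ready(sample, expected))  # ['mysql-2']
```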
When to Use StatefulSets
StatefulSets are ideal for applications that require stable, unique network identifiers, persistent storage tied to individual Pods, and ordered deployment, scaling, and updates. By ensuring that each Pod keeps a consistent identity and storage association, StatefulSets make stateful applications behave reliably and predictably even as Pods are rescheduled. Some of the use cases for StatefulSets are:
– Databases (MySQL, PostgreSQL, Cassandra).
– Distributed systems (Kafka, ZooKeeper, Elasticsearch).
– Applications requiring stable hostnames or storage.
Conclusion
StatefulSets are a powerful tool in Kubernetes for managing stateful applications. They provide stable network identities, persistent storage, and ordered operations, making them ideal for databases and distributed systems. By following the example above, you can deploy your own stateful applications with confidence.
Whether you’re running a small MySQL setup or a large-scale distributed system, StatefulSets help your applications run reliably and predictably in a Kubernetes environment.