StatefulSets are Kubernetes objects used to consistently deploy stateful application components. Pods created as part of a StatefulSet are given persistent identifiers that they retain even when they’re rescheduled.
A StatefulSet can deploy applications that need to reliably identify specific replicas, roll out updates in a pre-defined order, or stably access storage volumes. They’re applicable to many different use cases but are most commonly used for databases and other types of persistent data stores.
In this article you’ll learn what StatefulSets are, how they work, and when you should use them. We’ll also cover their limitations and the situations where other Kubernetes objects are a better choice.
What Are StatefulSets?
Making Pods part of a StatefulSet instructs Kubernetes to schedule and scale them in a guaranteed manner. Each Pod gets allocated a unique identity which any replacement Pods retain.
The Pod name is suffixed with an ordinal index, starting at 0 by default, that defines its order during scheduling operations. A StatefulSet called mysql containing three replicas will create the following named Pods: mysql-0, mysql-1, and mysql-2.
Pods use their names as their hostnames, so other services that need to reliably access the second replica of the StatefulSet can connect to mysql-1. Even if the specific Pod that runs mysql-1 gets rescheduled later on, its identity will pass to its replacement.
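As a rough sketch of how these stable names are usually addressed: when a StatefulSet is backed by a headless Service, each Pod is also reachable at a DNS name of the form pod.service.namespace.svc.cluster-domain. The helper below only assembles that string and does not query a cluster. The function name pod_dns is hypothetical, and the default namespace and the cluster.local domain are assumptions that differ between clusters.

```shell
# Build the stable DNS name for one StatefulSet Pod.
# Usage: pod_dns <statefulset> <ordinal> [service] [namespace] [cluster-domain]
pod_dns() {
  set_name=$1
  ordinal=$2
  service=${3:-$set_name}        # assumption: the headless Service shares the set's name
  namespace=${4:-default}        # assumption: Pods run in the "default" namespace
  domain=${5:-cluster.local}     # assumption: the cluster uses the default domain
  printf '%s-%s.%s.%s.svc.%s\n' "$set_name" "$ordinal" "$service" "$namespace" "$domain"
}

pod_dns mysql 1   # prints mysql-1.mysql.default.svc.cluster.local
```

Clients in the same namespace can normally shorten this to pod.service, for example mysql-1.mysql.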
StatefulSets also enforce that Pods are removed in reverse order of their creation. If the StatefulSet is scaled down to one replica, mysql-2 is guaranteed to exit first, followed by mysql-1. This behavior doesn’t apply when the entire StatefulSet is deleted, and it can be disabled by setting the StatefulSet’s podManagementPolicy field to Parallel.
StatefulSet Use Cases
StatefulSets are normally used to run replicated applications where individual Pods have different roles. As an example, you could be deploying a MySQL database with a primary instance and two read-only replicas. A regular ReplicaSet or Deployment would not be appropriate because you couldn’t reliably identify the Pod running the primary replica.
StatefulSets address this by guaranteeing that each Pod in the StatefulSet maintains its identity. Your other services can reliably connect to mysql-0 to interact with the primary replica. StatefulSets also enforce that new Pods are only started once the previous Pod is running and ready. This ensures the read-only replicas get created after the primary is up and ready to expose its data.
The purpose of StatefulSets is to accommodate non-interchangeable replicas inside Kubernetes. Whereas Pods in a stateless application are equivalent to each other, stateful workloads require an intentional approach to rollouts, scaling, and termination.
StatefulSets integrate with local persistent volumes to support persistent storage that sticks to each replica. Each Pod gets access to its own volume that will be automatically reattached when the replica is rescheduled to another node.
Creating a StatefulSet
Here’s an example YAML manifest that defines a headless Service and a StatefulSet for running MySQL with a primary node and two replicas:
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
    - name: mysql
      port: 3306
  clusterIP: None
  selector:
    app: mysql
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  serviceName: mysql
  replicas: 3
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
        - name: mysql-init
          image: mysql:8.0
          command:
            - bash
            - "-c"
            - |
              set -ex
              # Extract the ordinal index from the Pod's hostname
              [[ $(hostname) =~ -([0-9]+)$ ]] || exit 1
              ordinal=${BASH_REMATCH[1]}
              echo [mysqld] > /mnt/conf/server-id.cnf
              # MySQL doesn't allow "0" as a server-id
              # so we have to add 1 to the Pod's index
              echo server-id=$((1 + $ordinal)) >> /mnt/conf/server-id.cnf
              if [[ $ordinal -eq 0 ]]; then
                printf "[mysqld]\nlog-bin" > /mnt/conf/primary.cnf
              else
                printf "[mysqld]\nsuper-read-only" > /mnt/conf/replica.cnf
              fi
          volumeMounts:
            - name: config
              mountPath: /mnt/conf
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ALLOW_EMPTY_PASSWORD
              value: "1"
          ports:
            - name: mysql
              containerPort: 3306
          volumeMounts:
            - name: config
              mountPath: /etc/mysql/conf.d
            - name: data
              mountPath: /var/lib/mysql
              subPath: mysql
          livenessProbe:
            exec:
              command: ["mysqladmin", "ping"]
            initialDelaySeconds: 30
            periodSeconds: 5
            timeoutSeconds: 5
          readinessProbe:
            exec:
              command: ["mysql", "-h", "127.0.0.1", "-e", "SELECT 1"]
            initialDelaySeconds: 5
            timeoutSeconds: 1
      volumes:
        - name: config
          emptyDir: {}
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
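This is quite a long manifest so let’s unpack what happens. A headless Service called mysql (clusterIP: None) is created, and the StatefulSet refers to it through its serviceName field. The StatefulSet requests three replicas. Before MySQL starts in each Pod, the mysql-init container reads the ordinal index from the Pod’s hostname and writes configuration files into a shared config volume: the Pod with ordinal 0 has binary logging enabled so it can act as the primary, while every other Pod is marked super-read-only. The volume claim template gives each Pod its own 1Gi persistent volume, mounted at /var/lib/mysql.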
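The init container's branching can be tried outside a cluster. The sketch below mirrors that logic using POSIX parameter expansion in place of the manifest's bash regex, and it prints the resulting role instead of writing config files. The name role_for_host is hypothetical and used only for illustration.

```shell
# Derive the replica's role from a StatefulSet-style hostname such as "mysql-0".
role_for_host() {
  host=$1
  case $host in
    *-*) ordinal=${host##*-} ;;  # text after the last hyphen
    *)   return 1 ;;             # no hyphen, so no ordinal
  esac
  case $ordinal in
    ''|*[!0-9]*) return 1 ;;     # suffix is empty or not purely digits
  esac
  server_id=$((ordinal + 1))     # MySQL doesn't allow 0 as a server-id
  if [ "$ordinal" -eq 0 ]; then
    echo "server-id=$server_id role=primary"
  else
    echo "server-id=$server_id role=replica"
  fi
}

role_for_host mysql-0   # prints server-id=1 role=primary
role_for_host mysql-2   # prints server-id=3 role=replica
```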
An ordinary Deployment or ReplicaSet could not implement this workflow. Once your Pods have started, you can scale the StatefulSet up or down without risking the destruction of the MySQL primary node. Kubernetes provides a guarantee that the established Pod order will be respected.
$ kubectl apply -f mysql-statefulset.yaml
# Scale up to 5 Pods - a MySQL primary and 4 MySQL replicas
$ kubectl scale statefulset mysql --replicas=5
Rolling Updates
StatefulSets implement rolling updates when you change their specification. The StatefulSet controller will replace each Pod in reverse ordinal order, using the persistently assigned indexes. mysql-2 will be deleted and replaced first, followed by mysql-1 and mysql-0. mysql-1 won’t get updated until the new mysql-2 Pod is Running and Ready.
The rolling update mechanism includes support for staged deployments too. Setting the .spec.updateStrategy.rollingUpdate.partition field in your StatefulSet’s manifest instructs Kubernetes to only update the Pods with an ordinal index greater than or equal to the given partition.
spec:
  updateStrategy:
    rollingUpdate:
      partition: 1
  volumeClaimTemplates:
    …
In this example only Pods indexed 1 or higher will be targeted by update operations. The first Pod in the StatefulSet won’t receive a new specification until the partition is lowered or removed.
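A minimal sketch of the partition rule, assuming zero-based ordinals (the Kubernetes default): the hypothetical helper rolling_update_targets lists the Pods an update would replace, highest ordinal first, without contacting a cluster.

```shell
# List the Pods a partitioned rolling update would replace, highest ordinal first.
# Usage: rolling_update_targets <statefulset> <replicas> [partition]
rolling_update_targets() {
  name=$1
  replicas=$2
  partition=${3:-0}              # 0 means every Pod is updated
  i=$((replicas - 1))            # start from the highest ordinal
  while [ "$i" -ge "$partition" ]; do
    printf '%s-%s\n' "$name" "$i"
    i=$((i - 1))
  done
}

rolling_update_targets mysql 3 1   # prints mysql-2 then mysql-1; mysql-0 is skipped
```

Calling it with a partition of 0 lists all three Pods, which corresponds to an ordinary rolling update.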
Limitations
StatefulSets have some limitations you should be aware of before you adopt them. These common gotchas can trip you up when you start deploying stateful applications.
StatefulSets also omit a mechanism for resizing the volumes linked to each Pod. You have to manually edit each persistent volume and its corresponding persistent volume claim, then delete the StatefulSet and orphan its Pods. Creating a new StatefulSet with the revised specification will allow Kubernetes to reclaim the orphaned Pods and resize the volumes.
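One way to approach this manual procedure is to generate the commands first and review them before running anything. The sketch below only prints kubectl commands and never executes them. It assumes the claims follow the template-statefulset-ordinal naming pattern (data-mysql-0 and so on for a template named data), that the StorageClass permits volume expansion so patching the claim is sufficient, and that the manifest file has already been edited to request the new size. The helper name resize_plan is hypothetical.

```shell
# Print (do not run) the commands for growing every claim of a StatefulSet.
# Usage: resize_plan <statefulset> <claim-template> <replicas> <new-size> <manifest>
resize_plan() {
  name=$1
  template=$2
  replicas=$3
  size=$4
  manifest=$5
  i=0
  while [ "$i" -lt "$replicas" ]; do
    # Each Pod's claim is named <template>-<statefulset>-<ordinal>
    printf "kubectl patch pvc %s-%s-%s -p '%s'\n" "$template" "$name" "$i" \
      "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"$size\"}}}}"
    i=$((i + 1))
  done
  # Remove the StatefulSet object but leave its Pods running
  printf 'kubectl delete statefulset %s --cascade=orphan\n' "$name"
  # Recreate it from the manifest that carries the new size
  printf 'kubectl apply -f %s\n' "$manifest"
}

resize_plan mysql data 3 2Gi mysql-statefulset.yaml
```

Printing the plan rather than executing it keeps the destructive delete step under manual control.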
When Not To Use a StatefulSet
You should only use a StatefulSet when individual replicas have their own state. A StatefulSet isn’t necessary when all the replicas share the same state, even if it’s persistent.
In these situations you can use a regular ReplicaSet or Deployment to launch your Pods. Any mounted volumes will be shared across all of the Pods, which is the expected behavior for stateless systems.
A StatefulSet doesn’t add value unless you need individual persistent storage or sticky replica identifiers. Using a StatefulSet incorrectly can cause confusion by suggesting Pods are stateful when they’re actually running a stateless workload.
Summary
StatefulSets provide persistent identities for replicated Kubernetes Pods. Each Pod is named with an ordinal index that’s allocated sequentially. When the Pod gets rescheduled, its replacement inherits its identity. The StatefulSet also ensures that Pods get terminated in the reverse order they were created in.
StatefulSets allow Kubernetes to accommodate applications that require graceful rolling deployments, stable network identifiers, and reliable access to persistent storage. They’re suitable for any situation where the replicas in a set of Pods have their own state that needs to be preserved.
A StatefulSet doesn’t need to be used if your replicas are stateless, even if they’re storing some persistent data. Deployments and ReplicaSets are more suitable when individual replicas don’t need to be identified or scaled in a consistent order.