Kubernetes has many advantages; among them is the ability to easily create and delete workloads as containers. Stateful applications, however, need care in how their data is handled. A Pod has readable and writable disk space of its own, but deleting the Pod also deletes that disk space. For Pods that run databases or collect logs, losing the disk together with the Pod is a problem. Data persistence, a mechanism that keeps data even after the Pod is deleted, is therefore required.
Kubernetes uses a highly abstract storage model for retaining data, allowing users to allocate and use volumes for containers in Pods without knowing the storage details. If you need to store data within Kubernetes, chances are you will be using persistent volumes. A PersistentVolume (PV for short) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically through a StorageClass. Just like the nodes in a cluster, a PV is a cluster resource. The PV is an API object that captures the implementation details of the storage system behind it, such as NFS, iSCSI, or a cloud provider's storage.
In this article, you will learn more about what persistent volumes are and how best to use them.
What are Persistent Volumes (PVs)?
A PV defines a piece of storage along with its details, such as its storage class and the storage implementation behind it. Unlike ordinary volumes, a PV is a resource object in a Kubernetes cluster; creating a PV is equivalent to creating a storage resource object. To use this resource, a user must request it through a persistent volume claim (PVC). A PVC is a request for storage, and it is what a Pod references to mount a PV. The cluster administrator can map different storage classes to different service levels and backend policies.
A persistent volume is defined in a YAML configuration file, which also specifies the volume plugin type to use. The following configuration file defines a PV that provides 5Gi of storage space. The volume mode is Filesystem, the access mode is ReadWriteOnce, and the volume is reclaimed through the Recycle reclaim policy. Finally, the storage class is specified as slow, and the NFS plugin type is used.
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0005
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: slow
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /tmp
    server: 172.17.0.2
```
Exploring the Purpose and Benefits of Persistent Volumes
Persistent Volumes are designed to tackle the challenges of managing data storage in a Kubernetes cluster. By providing a storage abstraction layer, PVs allow for seamless storage management while ensuring data persistence. This abstraction enables developers and operators to focus on application logic rather than worrying about the underlying storage infrastructure.
One of the key benefits of using Persistent Volumes is their ability to decouple storage from individual pods. This decoupling allows for greater flexibility in managing pods, as the data stored in a Persistent Volume remains intact even if a pod is deleted or rescheduled. Additionally, Persistent Volumes support several access modes, and with a suitable mode and storage backend, multiple pods can read and write the same data simultaneously.
When it comes to data storage in a Kubernetes cluster, Persistent Volumes play a crucial role. They act as a bridge between the applications running in pods and the underlying storage infrastructure. By abstracting away the complexities of storage management, PVs simplify the deployment and scaling of applications, making it easier for developers to focus on their core tasks.
Furthermore, Persistent Volumes provide a reliable and consistent storage solution. The data stored in a PV is not tied to a specific pod, meaning that even if a pod fails or is rescheduled, the data remains accessible. This helps keep data intact and available across pod restarts, which is critical for applications that require persistent storage.
Features of Persistent Volumes
Each PV contains a specification (spec) and status of the volume. This section describes the spec attributes of a PV configuration file, with reference to the YAML configuration file example given above.
Capacity
Generally, a PV specifies a storage capacity, which is set through the capacity property of the PV. Currently, storage size is the only resource that can be set or requested. In the future, capacity may include attributes such as IOPS and throughput.
Volume Modes
Kubernetes supports two volume modes for persistent volumes: Filesystem and Block. Filesystem is the default mode if the volume mode is not defined.
Access Modes
- ReadOnlyMany (ROX) allows the volume to be mounted by multiple nodes in read-only mode.
- ReadWriteOnce (RWO) allows the volume to be mounted by a single node in read-write mode.
- ReadWriteMany (RWX) allows the volume to be mounted by multiple nodes in read-write mode.
A volume can only be mounted using one access mode at a time, even if it supports many access modes.
Class
A PV can belong to a StorageClass, which is specified via the storageClassName property. A PV of a particular class can only be bound to PVCs that request that class. A PV without this property has no class and can only be bound to PVCs that do not request a specific class.
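The class-matching rule can be illustrated with a minimal sketch of a claim that explicitly asks for no class. The claim name and the requested size are assumptions made for the example; the relevant detail is the empty string in storageClassName.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: no-class-claim        # hypothetical name, not from the article
spec:
  storageClassName: ""        # empty string: only PVs with no class can satisfy this claim
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi            # assumed size for illustration
```

Note that leaving storageClassName out entirely is not the same as setting it to an empty string: when the cluster has a default StorageClass, a claim that omits the field is typically assigned that default.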
Reclaim Policy
When a claim on a PV is deleted and the storage is no longer needed, the PV is handled according to one of the following reclaim policies:
- Retain - the PV and its data are kept until an administrator reclaims or deletes them manually.
- Recycle - the volume is scrubbed of its data and made available again for a new claim.
- Delete - the PV and its associated storage asset (such as an AWS EBS, GCE PD, Azure Disk, or OpenStack Cinder volume) are deleted.
Currently, only NFS and hostPath support the Recycle policy, which is deprecated in favor of dynamic provisioning. AWS EBS, GCE PD, Azure Disk, and Cinder volumes support the Delete policy.
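As a sketch of how a reclaim policy is chosen, the following variant of the earlier NFS example uses Retain instead of Recycle. The PV name is hypothetical; the server address, path, and class are reused from the example above.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-retain-example     # hypothetical name
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  # Keep the volume and its data after the claim is deleted;
  # an administrator must clean up and reclaim it manually.
  persistentVolumeReclaimPolicy: Retain
  storageClassName: slow
  nfs:
    path: /tmp
    server: 172.17.0.2
```

Retain is usually the safer choice for data that is hard to recreate, at the cost of manual cleanup.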
Mount Options
Kubernetes administrators can specify mount options for mounting persistent volumes on a node. Not all PV types support mount options.
Volume types that support mount options include:
- gcePersistentDisk
- awsElasticBlockStore
- AzureDisk
- NFS
- RBD (Rados Block Device)
- CephFS
- Cinder (OpenStack volume storage)
- Glusterfs
What are Persistent Volume Claims (PVCs)?
A PVC is a declaration of a request for storage, which a Pod then mounts for use. PVCs are meant for developers, who do not necessarily care about the specific implementation of the underlying storage but rather about business-related aspects such as storage size and access mode.
Here is a configuration file for a PersistentVolumeClaim:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv0004
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
```
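To show how such a claim is consumed, here is a minimal sketch of a Pod that mounts the claim named pv0004. The Pod name, container image, and mount path are assumptions chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-demo-pod                      # hypothetical name
spec:
  containers:
    - name: app
      image: nginx                        # assumed image for illustration
      volumeMounts:
        - name: data                      # must match the volume name below
          mountPath: /usr/share/nginx/html
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pv0004                 # the PVC defined above
```

The Pod refers only to the claim, never to the PV itself, so the underlying storage can change without touching the Pod definition.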
Expanding Persistent Volume Claims
Support for expanding persistent volume claims (PVCs) is enabled by default. The following are volumes that can be expanded:
- gcePersistentDisk
- awsElasticBlockStore
- Cinder
- Glusterfs
- RBD
- Azure File
- Azure Disk
- Portworx
- FlexVolumes
- CSI
The allowVolumeExpansion field of the storage class must be set to true to expand a PVC:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pv0003
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://192.168.10.100:8080"
  restuser: ""
  secretNamespace: ""
  secretName: ""
allowVolumeExpansion: true
```
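With such a class in place, expansion is requested by editing the claim and raising its storage request. The sketch below assumes a claim that was originally created with 5Gi against the class above; the claim name and both sizes are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: expandable-claim      # hypothetical name
spec:
  storageClassName: pv0003    # the StorageClass above, which allows expansion
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi            # raised from an assumed original request of 5Gi
```

Only growing a claim is supported; lowering the request to shrink a volume is rejected.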
Lifecycle of PV and PVC
In a Kubernetes cluster, a PV exists as a storage resource in the cluster. PVCs are requests for those resources and also act as claim checks to the resource. The interaction between PVs and PVCs follows this lifecycle:
- Provisioning - the creation of the PV, either directly (static) or dynamically using a StorageClass.
- Binding - assigning the PV to the PVC.
- Using - Pods use the volume through the PVC.
- Reclaiming - the PV is reclaimed, either by keeping it for the next use or by deleting it directly from the cloud storage.
A volume will be in one of the following states:
- Available - this state shows that the PV is ready to be used by a PVC.
- Bound - this state shows that the PV has been assigned to a PVC.
- Released - the claim has been deleted, but the cluster has not yet reclaimed the resource.
- Failed - this state shows that an error has occurred in the PV.
Provisioning
There are two ways to provision persistent storage volumes in Kubernetes:
Static
With static provisioning, the cluster administrator creates PVs in advance. These PVs exist in the Kubernetes API, carry the details of the real storage behind them, and are available for consumption by cluster users. The developer then creates the PVC and the Pod, and the Pod uses the storage provided by the PV through the PVC.
Dynamic
With dynamic provisioning, when none of the static PVs created by the administrator matches a user's PVC, the cluster tries to provision a storage volume for the PVC automatically, based on a StorageClass. The PVC must request a storage class, and the administrator must have created and configured that class beforehand. The cluster administrator also needs to enable the DefaultStorageClass admission controller on the API server.
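A minimal sketch of dynamic provisioning pairs a StorageClass with a claim that requests it. The class name, the provisioner csi.example.com, and the claim name are hypothetical placeholders; a real cluster would use the provisioner name of its installed storage driver.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast                                # hypothetical class name
provisioner: csi.example.com                # hypothetical driver; replace with a real one
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer     # provision only once a Pod uses the claim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-claim                       # hypothetical name
spec:
  storageClassName: fast                    # requesting the class triggers provisioning
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

No PV is written by hand here: the provisioner creates one that fits the claim and the cluster binds the two.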
Binding
The user creates a PVC (or, in the case of dynamic provisioning, has already created one), specifying the requested storage size and access mode. A control loop in the control plane watches for new PVCs, finds a matching PV (if any), and binds the PVC and PV together. If a PV was dynamically provisioned for a new PVC, the loop always binds that PV to the PVC. Otherwise, users always get at least the storage they requested, but the volume may be larger than the request. Once bound, PVC bindings are exclusive, regardless of how they were bound.
If no matching PV is found, the PVC remains unbound indefinitely and is bound as soon as a matching PV becomes available. For example, a cluster provisioned with many 50Gi PVs will not match a PVC requesting 100Gi, and that PVC will stay unbound until a 100Gi PV is added to the cluster.
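The matching rules can be made concrete with a sketch of a claim aimed at the pv0005 volume from the first example. The claim name is hypothetical, and the volumeName field is an optional way to reserve one specific PV rather than letting the control loop pick any match.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-for-pv0005      # hypothetical name
spec:
  storageClassName: slow      # same class as pv0005
  accessModes:
    - ReadWriteOnce           # an access mode that pv0005 offers
  resources:
    requests:
      storage: 5Gi            # must not exceed the 5Gi capacity of pv0005
  volumeName: pv0005          # optional: ask for this exact PV
```

Without volumeName, any Available PV of class slow with at least 5Gi and ReadWriteOnce support could be bound to this claim.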
Using
A Pod uses a PVC as a volume: the cluster looks up the PV bound to the PVC and mounts it for the Pod. For volumes that support multiple access modes, the user specifies which mode is desired when using the claim as a volume in a Pod. Once a user has a bound PVC, the bound PV belongs to that user for as long as they need it. The user accesses the PV by including the PVC in the volumes section of the Pod.
Reclaiming
When users are done with their volume, they can delete the PVC object from the API, which allows the resource to be reclaimed. The reclaim policy of a PersistentVolume tells the cluster what to do with the volume after it has been released from its claim. Currently, volumes can be retained, recycled (deprecated), or deleted.
Retain
The Retain reclaim policy allows for manual reclamation of the resource. When the PVC is deleted, the PV still exists, and the volume is considered "released." However, it is not yet available for another claim because the previous claimant's data remains on the volume.
Delete
For storage volume plugins that support the Delete reclaim policy, deletion removes the PV from Kubernetes. It also removes the storage asset from the associated external infrastructure, such as AWS EBS, GCE PD, Azure Disk, or Cinder storage volumes.
Common Use Cases
One common use case for Persistent Volumes is database management. Databases often require persistent storage to maintain data integrity and ensure high availability. By using a Persistent Volume, administrators can ensure that the data stored in the database remains persistent even if the corresponding pod goes offline or gets rescheduled.
Another use case for Persistent Volumes is file sharing and collaboration applications. Applications like content management systems or file servers often require shared storage where multiple pods can read and write data simultaneously. Persistent Volumes provide the necessary functionality to meet these requirements.
In addition to databases and file sharing applications, Persistent Volumes can also be beneficial in other scenarios. For example, in machine learning applications, where large datasets need to be stored and accessed by multiple pods simultaneously, PVs offer a scalable and efficient solution. Similarly, in IoT deployments, where sensor data needs to be collected and stored persistently, Persistent Volumes can ensure data reliability and availability.
Overall, Persistent Volumes are a powerful tool in the Kubernetes ecosystem. They provide a flexible and reliable way to manage data storage, decoupling it from individual pods and enabling seamless scaling and management of applications. By understanding the purpose and benefits of Persistent Volumes, developers and operators can make informed decisions about when and how to leverage this technology in their applications.
Now, take a look at a few examples to learn about common use cases.
Example 1:
The following config file describes a single-instance MySQL Deployment. The MySQL container mounts the PV at /var/lib/mysql. The MYSQL_ROOT_PASSWORD environment variable sets the database password from the Secret.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: wordpress-mysql
  labels:
    app: wordpress
spec:
  ports:
    - port: 3306
  selector:
    app: wordpress
    tier: mysql
  clusterIP: None
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
  labels:
    app: wordpress
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress-mysql
  labels:
    app: wordpress
spec:
  selector:
    matchLabels:
      app: wordpress
      tier: mysql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: wordpress
        tier: mysql
    spec:
      containers:
        - image: mysql:5.6
          name: mysql
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-pass
                  key: password
          ports:
            - containerPort: 3306
              name: mysql
          volumeMounts:
            - name: mysql-persistent-storage
              mountPath: /var/lib/mysql
      volumes:
        - name: mysql-persistent-storage
          persistentVolumeClaim:
            claimName: mysql-pv-claim
```
Example 2:
The following config file describes a single-instance WordPress Deployment. The WordPress container mounts the PV at /var/www/html for website data files. The WORDPRESS_DB_HOST environment variable is set to the name of the MySQL Service defined above, so WordPress reaches the database through that Service. The WORDPRESS_DB_PASSWORD environment variable sets the database password from the Secret generated by kustomize.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  ports:
    - port: 80
  selector:
    app: wordpress
    tier: frontend
  type: LoadBalancer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wp-pv-claim
  labels:
    app: wordpress
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  selector:
    matchLabels:
      app: wordpress
      tier: frontend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: wordpress
        tier: frontend
    spec:
      containers:
        - image: wordpress:4.8-apache
          name: wordpress
          env:
            - name: WORDPRESS_DB_HOST
              value: wordpress-mysql
            - name: WORDPRESS_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-pass
                  key: password
          ports:
            - containerPort: 80
              name: wordpress
          volumeMounts:
            - name: wordpress-persistent-storage
              mountPath: /var/www/html
      volumes:
        - name: wordpress-persistent-storage
          persistentVolumeClaim:
            claimName: wp-pv-claim
```
Example 3:
The following config file describes a PVC requesting a raw block volume.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  resources:
    requests:
      storage: 10Gi
```
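A raw block claim is consumed differently from a filesystem claim: the container lists it under volumeDevices with a device path instead of under volumeMounts with a mount path. The sketch below assumes a hypothetical Pod name, image, and device path.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: block-demo-pod            # hypothetical name
spec:
  containers:
    - name: app
      image: busybox              # assumed image for illustration
      command: ["sleep", "3600"]
      volumeDevices:              # used instead of volumeMounts for Block mode
        - name: data
          devicePath: /dev/xvda   # assumed path where the raw device appears
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: block-pvc      # the raw block PVC defined above
```

The application inside the container is then responsible for reading and writing the device directly, since no filesystem is created on it.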
Example 4:
The following config file describes creating a PVC from a volume snapshot.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restore-pvc
spec:
  storageClassName: csi-hostpath-sc
  dataSource:
    name: new-snapshot-test
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```
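The restore above presupposes that a snapshot named new-snapshot-test already exists. One way it might have been created is sketched below; the snapshot class name and the source claim name are assumptions, and the cluster needs a CSI driver with snapshot support plus the snapshot CRDs installed.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test                           # name referenced by restore-pvc
spec:
  volumeSnapshotClassName: csi-hostpath-snapclass   # assumed snapshot class name
  source:
    persistentVolumeClaimName: source-pvc           # hypothetical PVC to snapshot
```

The snapshot and the claim restored from it must live in the same namespace.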
Example 5:
The following config file describes creating a PersistentVolumeClaim from an existing PVC.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc
spec:
  storageClassName: my-csi-plugin
  dataSource:
    name: existing-src-pvc-name
    kind: PersistentVolumeClaim
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```
Best Practices
When configuring a PV, Kubernetes documentation recommends the following set of best practices to keep in mind:
- To reduce management overhead and enable scaling, avoid statically creating and assigning persistent volumes. Instead, use dynamic provisioning. In your storage class, define the appropriate reclaim policy to minimize storage costs once Pods are deleted.
- Each node size supports a maximum number of attached disks, and different node sizes provide different amounts of local storage and capacity. Plan for your application's demands so that you deploy nodes of the right size.
- The persistent volume (PV) lifecycle is independent of any particular container in the cluster. A persistent volume claim (PVC) is a request made by a container user or application for a specific type of storage. When creating a PV, Kubernetes documentation recommends the following:
- Always include PVCs in the container configuration.
- Never include PVs in container configuration, as this will tightly couple a container to a specific volume.
- Always have a default StorageClass; otherwise, PVCs that don't specify a class cannot be dynamically provisioned and stay pending until a matching PV exists.
- Give StorageClasses meaningful names.
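Several of these recommendations can be combined in a single StorageClass, sketched below. The annotation is the standard way to mark a class as the cluster default; the class name and the provisioner csi.example.com are hypothetical and would be replaced by the values for a real storage driver.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-retain                  # hypothetical, descriptive name
  annotations:
    # Marks this class as the default for claims that name no class.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.example.com             # hypothetical driver; replace with a real one
reclaimPolicy: Retain                    # chosen explicitly; Delete would reduce storage cost
allowVolumeExpansion: true
```

Only one class should carry the default annotation at a time, so that claims without a class resolve predictably.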
Exploring the Difference Between Kubernetes Volumes and Persistent Volumes
In Kubernetes, ordinary ephemeral volumes such as emptyDir are essentially temporary directories that can be mounted by containers within a pod. Once the pod terminates, the data stored within these volumes is lost. Persistent Volumes, by contrast, decouple storage from pods, enabling data persistence even when pods come and go.
Moreover, Persistent Volumes in Kubernetes offer support for various storage technologies, including local storage, network-attached storage (NAS), and cloud storage platforms. This flexibility ensures that developers can meet their specific storage requirements without being bound to a single solution.
When it comes to Kubernetes volumes, they are tightly coupled with the lifecycle of a pod. This means that when a pod is terminated or rescheduled, the data stored within its volumes is lost. This limitation can be problematic for applications that require data persistence, such as databases or file storage systems.
On the other hand, Persistent Volumes provide a solution to this problem by decoupling storage from pods. This means that even if a pod is terminated or rescheduled, the data stored in the Persistent Volume remains intact. This decoupling allows for data persistence and ensures that applications can rely on their data even in the face of pod failures or changes.
Furthermore, Persistent Volumes offer support for various storage technologies, giving developers the flexibility to choose the most suitable option for their specific needs. Whether it's local storage for high-performance applications, network-attached storage (NAS) for shared access, or cloud storage platforms for scalability, Kubernetes' Persistent Volumes can accommodate a wide range of storage requirements.
Overall, Persistent Volumes empower developers to build scalable and resilient applications in Kubernetes, with the assurance that their data will persist across pod lifecycles.
Conclusion
Kubernetes persistent storage offers Kubernetes applications a convenient way to request and consume storage resources. PVCs and PVs are comparable to interfaces and implementations in object-oriented programming. The Pod created by the user declares a PVC, and Kubernetes finds a PV to pair it with. If no PV matches, Kubernetes turns to the corresponding StorageClass to create a PV and then binds it to the PVC. For a newly created PV backed by a remote disk, the control plane first attaches the disk to the host, and the kubelet on that node then mounts the attached disk into a host directory.
PVs are ideal if you have data that has to be shared between Pods or that must survive restarts. Because a PV can be defined once and claimed by a Pod through a PVC, you can run data-driven applications on Kubernetes as well.