Kubernetes Persistent Volume: Examples & Best Practices

Shingai Zivuku

October 2, 2023

17 Minute Read


Are you struggling to manage data reliably in a fast-paced digital environment? Kubernetes offers advantages like easy workload management with containers, but handling stateful applications requires care. By default, Kubernetes Pods have writable disk space that gets deleted along with the Pod, which can be problematic for applications storing data like databases or logs.

To solve this, Kubernetes provides Persistent Volumes (PVs), an abstraction for managing data storage. PVs allow users to allocate storage for containers without worrying about the underlying storage details. A PV is an API object, managed by the cluster administrator, that can be backed by systems such as NFS, iSCSI, or cloud storage.

In this article, we'll explore Kubernetes Persistent Volumes and their best use cases.

Main Points

A Kubernetes persistent volume (PV) retains data regardless of a Pod's lifecycle, so the data remains even if Pods are deleted or rescheduled.
Persistent volume claims (PVCs) request and mount specific storage to pods. This simplifies storage management for developers.
Kubernetes supports multiple storage types, including local storage, network-attached storage (NAS), and cloud storage platforms, allowing flexible storage options.
PVs can be dynamically provisioned through StorageClasses when no existing PV matches the user’s request, automating the creation and binding process.

What are Persistent Volumes (PVs)?

A Persistent Volume (PV) describes a piece of storage in the cluster, including attributes such as its capacity, storage class, and backing implementation. Unlike ordinary volumes, a PV is a resource in a Kubernetes cluster in its own right. Creating a PV is the same as creating a storage resource.

Users must request this resource via persistent volume claims (PVC). These are storage requests. A PV mounts into a Pod using a PVC. The cluster administrator can map different classes to service levels and backend policies.

You can configure persistent volumes through a YAML file, where you specify which volume plugin type to use. The following is a YAML configuration file for a persistent volume. It declares 5Gi of storage capacity.

The volume mode is Filesystem, and the access mode is ReadWriteOnce. The reclaim policy is Recycle. Finally, we specify the storage class as slow and use the NFS volume plugin type.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0005
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: slow
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /tmp
    server: 172.17.0.2

Exploring the Purpose and Benefits of Persistent Volumes

Persistent volumes solve the challenges of managing data storage in a Kubernetes cluster. Kubernetes persistent volumes provide a storage abstraction layer. They enable seamless storage management while ensuring data persists. This lets developers and operators focus on app logic, not storage.

A key benefit of persistent volumes is that they decouple storage from pods. This decoupling allows for greater flexibility in managing pods. A Kubernetes persistent volume keeps its data if a pod is deleted or rescheduled. Also, persistent volumes support several access modes, and some of them, such as ReadWriteMany, allow many pods to read and write data at the same time.

Persistent volumes play a crucial role in data storage in a Kubernetes cluster. They bridge the apps in pods and the storage system, simplifying storage management. They also help deploy and scale containerized apps. This lets developers focus on their core tasks.

Furthermore, Kubernetes persistent volumes provide a reliable and consistent storage solution. A PV's data is not tied to a specific pod. So, if a pod fails or someone reschedules it, the data remains accessible. This ensures data integrity and high availability. Both are critical for apps that need Kubernetes persistent volumes for storage.

Features of Persistent Volumes

Each PV contains a specification (spec) and the volume's status. This section describes the spec attributes of a PV configuration file, with reference to the YAML configuration file example given above.

Capacity

Generally, a PV specifies its storage capacity through the capacity property. Currently, storage size is the only capacity attribute that can be set or requested. In the future, it may include attributes such as IOPS or throughput.

Volume Modes

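Kubernetes supports two volume modes for persistent volumes: Filesystem and Block. With Filesystem, the volume is mounted into the Pod as a directory. With Block, the volume is presented to the Pod as a raw block device without a filesystem on it. Filesystem is the default if the volume mode is not defined.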

Access Modes

ReadOnlyMany (ROX) allows the volume to be mounted by multiple nodes in read-only mode.
ReadWriteOnce (RWO) allows the volume to be mounted by a single node in read-write mode.
ReadWriteMany (RWX) allows the volume to be mounted by multiple nodes in read-write mode.

A volume can only be mounted using one access mode at a time, even if it supports many access modes.
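As a rough illustration of how an access mode is requested, the claim below asks for ReadWriteMany. The claim name shared-files and the class name nfs-shared are hypothetical; the sketch assumes the cluster has a storage class backed by storage that actually supports shared read-write access, such as NFS.

```yaml
# Sketch only: a claim asking for shared read-write access across nodes.
# "shared-files" and "nfs-shared" are made-up names for illustration.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-files
spec:
  storageClassName: nfs-shared   # assumed class backed by RWX-capable storage
  accessModes:
    - ReadWriteMany              # many nodes may mount read-write
  resources:
    requests:
      storage: 10Gi
```

If the backing storage only supports ReadWriteOnce, a claim like this would be expected to stay unbound rather than silently fall back to a weaker mode.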

Class

A PV can specify a StorageClass through the storageClassName property. A PV with a given class can only be bound to PVCs that request that class. A PV that does not set this property can only bind to a PVC that does not require a specific class.

Reclaim Policy

When the claim on a PV is released, the reclaim policy decides what happens to the volume. The available policies are:

Retain - the PV and its data are kept until an administrator reclaims the volume manually.
Recycle - the volume is scrubbed of its data and made available again for a new claim. This policy is deprecated in favor of dynamic provisioning.
Delete - the PV is removed along with the associated storage asset (such as an AWS EBS, GCE PD, Azure Disk, or OpenStack Cinder volume).

Currently, only NFS and hostPath support the Recycle policy. AWS EBS, GCE PD, Azure Disk, and Cinder volumes support the Delete policy.

Mount Options

Kubernetes administrators can specify mount options for mounting persistent volumes on a node. Not all PV types support mount options.

Volume types that support mount options include:

gcePersistentDisk
awsElasticBlockStore
AzureDisk
NFS
RBD (Rados Block Device)
CephFS
Cinder (OpenStack volume storage)
Glusterfs

What are Persistent Volume Claims (PVCs)?

A PVC is a declaration that requests storage, which is then mounted into a Pod. PVCs are written by developers, who do not necessarily care about the specific implementation of the underlying data storage. They care about business-related properties such as storage size and access mode.

Here is the configuration file for the PersistentVolumeClaim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv0004
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi

Expanding Persistent Volume Claims

Support for expanding persistent volume claims (PVCs) is enabled by default. The following are volumes that can be expanded:

gcePersistentDisk
awsElasticBlockStore
Cinder
Glusterfs
RBD
Azure File
Azure Disk
Portworx
FlexVolumes
CSI

To expand a PVC, the allowVolumeExpansion field of its storage class must be set to true:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pv0003
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://192.168.10.100:8080"
  restuser: ""
  secretNamespace: ""
  secretName: ""
allowVolumeExpansion: true
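With such a class in place, a claim is typically expanded by editing it and raising spec.resources.requests.storage, then applying the change. The sketch below assumes a claim named expandable-claim (a hypothetical name) that was originally created with 8Gi against the pv0003 class shown above.

```yaml
# Sketch only: the same claim re-applied with a larger request.
# "expandable-claim" and the original 8Gi size are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: expandable-claim
spec:
  storageClassName: pv0003       # class with allowVolumeExpansion: true
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 16Gi              # raised from the assumed original 8Gi
```

Only growth is supported this way; requesting a smaller size than the current one is rejected.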

Lifecycle of PV and PVC

In a Kubernetes cluster, a PV exists as a storage resource in the cluster. PVCs are requests for those resources and also act as claim checks to the resource. The interaction between PVs and PVCs follows this lifecycle:

Provisioning - the creation of the PV, either directly (static) or dynamically using StorageClass.
Binding - assigning the PV to the PVC.
Using - Pods use the volume through the PVC.
Reclaiming - the PV is reclaimed, either by keeping it for the next use or by deleting it directly from the cloud storage.

A volume will be in one of the following states:

Available - this state shows that the PV is ready to be used by the PVC.
Bound - this state shows that the PV has been assigned to a PVC.
Released - the claim has been deleted, but the cluster has not yet reclaimed the resource.
Failed - this state shows that an error has occurred in the PV.

Provisioning

There are two ways to provision persistent storage volumes in Kubernetes:

Static

With static provisioning, the cluster administrator creates PVs in advance. These PVs exist in the Kubernetes API, represent real storage, and are available to users of the cluster. The developer then creates the PVC and the Pod, and the Pod uses the storage provided by the PV through the PVC.
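To make the static flow concrete, here is a sketch of a claim that could bind to the pv0005 volume defined earlier. The claim name nfs-claim is hypothetical. The class, volume mode, and access mode are chosen to match that PV, and the requested size does not exceed its 5Gi capacity.

```yaml
# Sketch only: a claim written to match the statically created PV "pv0005".
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-claim                # hypothetical name
spec:
  storageClassName: slow         # same class as pv0005
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce              # an access mode pv0005 offers
  resources:
    requests:
      storage: 5Gi               # must not exceed the PV's capacity
```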

Dynamic

With dynamic provisioning, when none of the static PVs created by the administrator matches a user's PVC, the cluster tries to provision a storage volume for the PVC automatically, based on a StorageClass. The PVC needs to request a storage class, and that class must have been created and configured by the administrator in advance. To let claims that name no class use dynamic provisioning as well, the cluster administrator needs to enable the DefaultStorageClass admission controller in the API server.
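A minimal sketch of dynamic provisioning pairs a StorageClass with a claim that names it. The names fast-ssd and app-data are hypothetical, and the provisioner shown assumes the AWS EBS CSI driver is installed in the cluster; substitute the provisioner and parameters of whatever driver your cluster actually runs.

```yaml
# Sketch only: a class the administrator creates once...
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                 # hypothetical name
provisioner: ebs.csi.aws.com     # assumption: AWS EBS CSI driver is installed
parameters:
  type: gp3                      # driver-specific parameter
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# ...and a claim that triggers provisioning by naming that class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                 # hypothetical name
spec:
  storageClassName: fast-ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```

With WaitForFirstConsumer, the volume is only created once a Pod using the claim is scheduled, which helps place the volume in the same zone as the Pod.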

Binding

The user creates a PVC (or has previously created one for dynamic provisioning), specifying the requested storage size and access mode. A control loop in the control plane watches for new PVCs, finds a matching PV (if any), and binds the PVC and PV together. If a PV was dynamically provisioned for a new PVC, the loop will always bind that PV to the PVC. In addition, users will always get at least the storage they request, but the volume may exceed their request. Once bound, PVC bindings are exclusive, regardless of how they were bound.

If no matching PV is found, the PVC remains unbound indefinitely and becomes bound once a matching PV is available. For example, if a cluster is provisioned with many 50Gi PVs, none of them will match a PVC requesting 100Gi, and that PVC will not be bound until a 100Gi PV is added to the cluster.

Using

The Pod uses PVC as a volume, and the Kubernetes cluster looks up the bound PV by the PVC and mounts it to the Pod. The user can specify the access method when using PVC as a volume. For volumes that support multiple access methods, the user specifies which mode is desired when using their claim as a volume in a Pod. Once a user has a bound PVC, the bound PV belongs to that user. The user can access the possessed PV through the PVC contained in the Pod’s storage volume.
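As an illustration, a Pod refers to a claim by name in its volumes section and mounts it into a container. The sketch below reuses the pv0004 claim shown earlier; the Pod name, container image, and mount path are assumptions chosen for the example.

```yaml
# Sketch only: a Pod that mounts the claim "pv0004" as a directory.
apiVersion: v1
kind: Pod
metadata:
  name: pvc-demo                 # hypothetical name
spec:
  containers:
    - name: app
      image: nginx               # any image would do; nginx is an assumption
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pv0004        # claim must be in the same namespace as the Pod
```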

Reclaiming

When a user is done with their volume, they can delete the PVC object from the API, which allows the resource to be reclaimed. The reclaim policy of a PersistentVolume tells the cluster what to do with the volume after it has been released from its claim. Volumes can be retained, deleted, or (with the deprecated Recycle policy) recycled. The two policies in common use are described below.

Retain

The Retain reclaim policy allows for manual reclamation of the resource. When the PVC is deleted, the PV still exists, and the volume is considered "released." However, it is not yet available for another claim because the previous claimant's data remains on the volume.

Delete

For storage volume plugins that support a Delete reclaim policy, deletion removes the PV from Kubernetes. Also, it removes the storage asset from the associated external infrastructure, such as AWS EBS, GCE PD, Azure Disk, or Cinder storage volumes.


Common Use Cases

One common use case for Persistent Volumes is database management. Databases often require persistent storage to maintain data integrity and ensure high availability. By using a Persistent Volume, administrators can ensure that the data stored in the database remains persistent even if the corresponding pod goes offline or gets rescheduled.

Another use case for Persistent Volumes is file sharing and collaboration applications. Applications like content management systems or file servers often require shared storage where multiple pods can read and write data simultaneously. Persistent Volumes provide the necessary functionality to meet these requirements.

In addition to databases and file sharing applications, Persistent Volumes can also be beneficial in other scenarios. For example, in machine learning applications, where large datasets need to be stored and accessed by multiple pods simultaneously, PVs offer a scalable and efficient solution. Similarly, in IoT deployments, where sensor data needs to be collected and stored persistently, Persistent Volumes can ensure data reliability and availability.

Overall, Persistent Volumes are a powerful tool in the Kubernetes ecosystem. They provide a flexible and reliable way to manage data storage, decoupling it from individual pods and enabling seamless scaling and management of applications. By understanding the purpose and benefits of Persistent Volumes, developers and operators can make informed decisions about when and how to leverage this technology in their applications.

Now, take a look at a few examples to learn about common use cases.

Example 1:

The following config file describes a single-instance MySQL Deployment. The MySQL container mounts the PV at /var/lib/mysql. The MYSQL_ROOT_PASSWORD environment variable sets the database password from the Secret.

apiVersion: v1
kind: Service
metadata:
  name: wordpress-mysql
  labels:
    app: wordpress
spec:
  ports:
    - port: 3306
  selector:
    app: wordpress
    tier: mysql
  clusterIP: None
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
  labels:
    app: wordpress
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress-mysql
  labels:
    app: wordpress
spec:
  selector:
    matchLabels:
      app: wordpress
      tier: mysql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: wordpress
        tier: mysql
    spec:
      containers:
        - image: mysql:5.6
          name: mysql
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-pass
                  key: password
          ports:
            - containerPort: 3306
              name: mysql
          volumeMounts:
            - name: mysql-persistent-storage
              mountPath: /var/lib/mysql
      volumes:
        - name: mysql-persistent-storage
          persistentVolumeClaim:
            claimName: mysql-pv-claim
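The Deployment above reads the password from a Secret named mysql-pass with a key called password. If you are not generating that Secret with kustomize, one possible way to supply it is a plain manifest like the sketch below. The password value is a placeholder, not a recommendation.

```yaml
# Sketch only: a hand-written Secret matching the secretKeyRef above.
apiVersion: v1
kind: Secret
metadata:
  name: mysql-pass
type: Opaque
stringData:
  password: change-me            # placeholder value; replace before use
```

Using stringData lets you write the value as plain text and have the API server store it base64-encoded under data.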

Example 2:

The following config file describes a single-instance WordPress Deployment. The WordPress container mounts the PV at /var/www/html for website data files. The WORDPRESS_DB_HOST environment variable is set to the name of the MySQL Service defined above, so WordPress reaches the database through that Service. The WORDPRESS_DB_PASSWORD environment variable sets the database password from the Secret generated by kustomize.

apiVersion: v1
kind: Service
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  ports:
    - port: 80
  selector:
    app: wordpress
    tier: frontend
  type: LoadBalancer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wp-pv-claim
  labels:
    app: wordpress
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  selector:
    matchLabels:
      app: wordpress
      tier: frontend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: wordpress
        tier: frontend
    spec:
      containers:
        - image: wordpress:4.8-apache
          name: wordpress
          env:
            - name: WORDPRESS_DB_HOST
              value: wordpress-mysql
            - name: WORDPRESS_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-pass
                  key: password
          ports:
            - containerPort: 80
              name: wordpress
          volumeMounts:
            - name: wordpress-persistent-storage
              mountPath: /var/www/html
      volumes:
        - name: wordpress-persistent-storage
          persistentVolumeClaim:
            claimName: wp-pv-claim

Example 3:

The following config file describes a PVC requesting a raw block volume.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  resources:
    requests:
      storage: 10Gi
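A raw block claim is consumed differently from a filesystem claim: the container lists it under volumeDevices with a devicePath instead of under volumeMounts with a mountPath. The Pod below is a sketch of that pattern; the Pod name, image, command, and device path are assumptions for illustration.

```yaml
# Sketch only: a Pod that receives "block-pvc" as a raw device.
apiVersion: v1
kind: Pod
metadata:
  name: block-consumer           # hypothetical name
spec:
  containers:
    - name: app
      image: busybox             # assumed image
      command: ["sleep", "3600"]
      volumeDevices:             # used instead of volumeMounts for Block mode
        - name: data
          devicePath: /dev/xvda  # path where the device appears in the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: block-pvc
```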

Example 4:

The following config file describes creating a PVC from a volume snapshot.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restore-pvc
spec:
  storageClassName: csi-hostpath-sc
  dataSource:
    name: new-snapshot-test
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
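The restore above presumes that a VolumeSnapshot named new-snapshot-test already exists. A sketch of how such a snapshot might be declared follows. It assumes the snapshot CRDs and a CSI driver with snapshot support are installed; the snapshot class csi-hostpath-snapclass and the source claim pvc-test are hypothetical names.

```yaml
# Sketch only: the snapshot that "restore-pvc" would restore from.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test
spec:
  volumeSnapshotClassName: csi-hostpath-snapclass   # assumed snapshot class
  source:
    persistentVolumeClaimName: pvc-test             # assumed existing claim to snapshot
```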

Example 5:

The following config file describes creating a PersistentVolumeClaim from an existing PVC.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc
spec:
  storageClassName: my-csi-plugin
  dataSource:
    name: existing-src-pvc-name
    kind: PersistentVolumeClaim
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Best Practices

When configuring a PV, Kubernetes documentation recommends the following set of best practices to keep in mind:

To reduce management overhead and enable scaling, avoid statically creating and assigning persistent volumes. Instead, use dynamic provisioning. In your storage class, define the appropriate reclaim policy to minimize storage costs once Pods are deleted.
Each node size supports a maximum number of attached volumes, and different node sizes provide different amounts of local storage and capacity. Plan for your application's demands so that you deploy nodes of the right size.
The persistent volume (PV) lifecycle is independent of any particular container in the cluster. Persistent volume claims (PVC) are a request made by a container user or application for a specific type of storage. When creating a PV, Kubernetes documentation recommends the following:
- Always include PVCs in the container configuration.
- Never include PVs in container configuration, as this will tightly couple a container to a specific volume.
- Always have a default StorageClass; otherwise, PVCs that don't specify a class cannot be dynamically provisioned and may stay unbound.
- Give StorageClasses meaningful names.
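Several of these recommendations can be combined in a single StorageClass, sketched below: a descriptive name, an explicit reclaim policy, and the annotation that marks the class as the cluster default. The class name and the provisioner string are placeholders, not real values; use the provisioner of the storage driver installed in your cluster.

```yaml
# Sketch only: a default class with an explicit reclaim policy.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-retain          # hypothetical, descriptive name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: example.com/hypothetical-provisioner   # placeholder, not a real driver
reclaimPolicy: Retain            # keep the volume after the claim is deleted
allowVolumeExpansion: true
```

Choosing Retain protects data from accidental claim deletion at the cost of manual cleanup, while Delete keeps storage costs down. Which one fits depends on how valuable the data is.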

Exploring the Difference Between Kubernetes Volumes and Persistent Volumes

In Kubernetes, ephemeral volumes such as emptyDir are essentially temporary directories that containers within a pod can mount. Once the pod is removed, the data stored in those volumes is lost. Persistent Volumes, by contrast, decouple storage from pods, so data persists even when pods come and go.

Moreover, Persistent Volumes in Kubernetes offer support for various storage technologies, including local storage, network-attached storage (NAS), and cloud storage platforms. This flexibility ensures that developers can meet their specific storage requirements without being bound to a single solution.

When it comes to Kubernetes volumes, they are tightly coupled with the lifecycle of a pod. This means that when a pod is terminated or rescheduled, the data stored within its volumes is lost. This limitation can be problematic for applications that require data persistence, such as databases or file storage systems.

On the other hand, Persistent Volumes provide a solution to this problem by decoupling storage from pods. This means that even if a pod is terminated or rescheduled, the data stored in the Persistent Volume remains intact. This decoupling allows for data persistence and ensures that applications can rely on their data even in the face of pod failures or changes.

Furthermore, Persistent Volumes offer support for various storage technologies, giving developers the flexibility to choose the most suitable option for their specific needs. Whether it's local storage for high-performance applications, network-attached storage (NAS) for shared access, or cloud storage platforms for scalability, Kubernetes' Persistent Volumes can accommodate a wide range of storage requirements.

Overall, Persistent Volumes empower developers to build scalable and resilient applications in Kubernetes, with the assurance that their data will persist across pod lifecycles.

Take Control of Your Kubernetes Storage with Loft

Ready to streamline your Kubernetes persistent volume management? Look no further than Loft, the expert in Kubernetes storage solutions. With persistent volume claims (PVC) and persistent volumes (PV), managing your data has never been easier. Whether you're running containerized applications needing reliable storage or handling data that must survive restarts, Loft has the tools to simplify it.

Loft automates the pairing of your PVC with the right PV, ensuring seamless integration with your Kubernetes deployments. If no PV is available, Loft helps create one effortlessly through StorageClass, taking the hassle out of storage management. No more worrying about lost data or complex storage setups. Let Loft handle it all for you.

Get started today with Loft and elevate your Kubernetes storage experience.

Photo by Annie Spratt on Unsplash


Frequently Asked Questions

What is a persistent volume in Kubernetes?

A persistent volume (PV) in Kubernetes is a storage resource that exists independently of the lifecycle of Pods. It allows storage to persist beyond the life of a Pod, ensuring data is not lost when Pods are deleted or rescheduled. Administrators provision these volumes, which can be backed by local or cloud storage.

What is a PersistentVolumeClaim in Kubernetes?

A persistent volume claim (PVC) in Kubernetes is a user's request for storage. It allows Pods to request specific storage resources, such as size or access modes, from the cluster's available persistent volumes (PVs). Kubernetes then binds the PVC to an appropriate PV, providing the necessary storage for the Pod.

What types of volumes does Kubernetes support?

Kubernetes supports many types of volumes, including hostPath, emptyDir, configMap, secret, NFS, iSCSI, CephFS, gcePersistentDisk, awsElasticBlockStore, azureDisk, and persistentVolumeClaim. Depending on the underlying infrastructure and application needs, these volumes offer different storage options.

What is a hostPath volume in Kubernetes?

A hostPath volume in Kubernetes allows a Pod to access a file or directory on the host node's filesystem. This volume type is helpful for scenarios where the Pod needs to read or write data directly to the host. However, it can pose security risks if not used carefully, as it exposes the host's filesystem to the Pod.