# How to Use NFS Storage with Kubernetes Clusters

DigitalOcean Kubernetes (DOKS) is a Kubernetes service with a fully managed control plane, high availability, and autoscaling. DOKS integrates with standard Kubernetes toolchains and DigitalOcean’s load balancers, volumes, CPU and GPU Droplets, API, and CLI.

You can connect your DOKS clusters to a [DigitalOcean NFS Share](https://docs.digitalocean.com/products/nfs/index.html.md) and use the share for tasks such as AI/ML Kubernetes workloads. For other persistent storage options, see [Add Volumes to Kubernetes Clusters](https://docs.digitalocean.com/products/kubernetes/how-to/add-volumes/index.html.md).

To use an NFS share with your DOKS cluster, you statically provision a PersistentVolume (PV), bind the PV to a PersistentVolumeClaim (PVC), and then mount the PVC to your workload.

**Note**: You can create and use NFS shares with DOKS clusters only in regions where [DigitalOcean NFS shares are available](https://docs.digitalocean.com/products/nfs/details/availability/index.html.md) and only when the cluster and NFS share are on the same VPC network.

## Prerequisites

To connect an existing DOKS cluster to a DigitalOcean NFS share, you need to:

- Create an NFS share. You can provision one using either the [DigitalOcean Control Panel](https://docs.digitalocean.com/products/nfs/how-to/create/index.html.md) or the [API](https://docs.digitalocean.com/reference/api/digitalocean/index.html.md#tag/nfs).

- Get the connection details once the share is active.

  In the left menu of the control panel, click **Network File Storage** to open the **Network File Storage** page, which lists all of your NFS shares. Note the server IP address and mount path values in the **Mount Path** column. The server IP address is the value before the `:` and the mount path is the value after the `:`. For example, if the value is `10.128.0.69:/123456/6160d138-60cb-4e61-9ff3-076eebed5c0f`, then the server IP address is `10.128.0.69` and the mount path is `/123456/6160d138-60cb-4e61-9ff3-076eebed5c0f`.

  To get the values using the API, send a `GET` request to the `/v2/nfs` endpoint. From the API response, note the host IP address and the mount path. For example:

  ```json
  ...
  "host": "10.128.0.69",
  "mount_path": "/123456/38bc6f86-9927-491a-a7b5-c5627219a0d3",
  ...
  ```

  The `host` value is the server IP address. The `mount_path` value provides the path to use when configuring your Kubernetes cluster.
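  You can send the request with `curl`. The following is a minimal sketch; it assumes your API token is stored in the `DIGITALOCEAN_TOKEN` environment variable:

  ```shell
  # Hypothetical example: list NFS shares; requires a valid API token
  curl -X GET \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
    "https://api.digitalocean.com/v2/nfs"
  ```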
## Create PersistentVolume

A [PersistentVolume](https://kubernetes.io/docs/concepts/storage/persistent-volumes) (PV) is a cluster-level resource that registers your DigitalOcean NFS Share with Kubernetes, making it available for use across the entire cluster.

To provision a PV for your NFS share, create the following config file named `nfs-pv.yaml`, replacing the values for `server` and `path` with the `host` and `mount_path` values of your NFS share. The size of the PV should ideally match your share’s size, and the `accessModes` must be `ReadWriteMany` to allow multiple pods to read and write to the volume simultaneously. The `mountOptions` section sets `nconnect=8`, which opens 8 parallel TCP connections to the NFS server to improve throughput.

`nfs-pv.yaml`

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: do-nfs-pv
  labels:
    type: nfs-model-storage
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - nconnect=8
  nfs:
    server: "10.128.0.69"
    path: "/123456/38bc6f86-9927-491a-a7b5-c5627219a0d3"
```

Use `kubectl apply` to create the PV:

```
kubectl apply -f nfs-pv.yaml
```

## Create PersistentVolumeClaim

A [PersistentVolumeClaim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) (PVC) is how your applications request access to the storage made available by the PV.

To provision a PVC for your NFS share, create the following config file named `nfs-pvc.yaml`. The label in the PVC's `selector` must match the label on your PV to ensure that the PVC binds to that specific NFS PV. The `accessModes` must be `ReadWriteMany` to allow multiple pods to read and write to the PVC simultaneously.

`nfs-pvc.yaml`

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: do-nfs-pvc
  namespace: sammy-doks
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  selector:
    matchLabels:
      type: nfs-model-storage
```

In the config file, the `storageClassName` field is set to `""`. This instructs DOKS to find a pre-existing, statically provisioned PV matching the specified PV label, and links your PVC directly to your manually configured NFS share. DOKS has built-in [StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/) options, such as `do-block-storage`, that dynamically provision new storage volumes when a PVC requests them. However, in this case, you have already provisioned the storage when creating the PV and therefore do not need DOKS to dynamically provision one.

Use `kubectl apply` to create the PVC:

```
kubectl apply -f nfs-pvc.yaml
```

## Mount PVC in Your Workload

After your PVC is bound to the PV, you can mount it to a workload such as a Deployment, Pod, Job, or DaemonSet. The following config file demonstrates how to mount the storage to a pod and write the current date to a log file on the NFS share every 5 seconds.

To mount the volume to the pod and reference your PVC, add the `volumes` section to the specification. The `claimName` field must match the [name you specified for your PVC](#create-persistentvolumeclaim). Next, add the `volumeMounts` section, where the `name` field must match the volume name you specified earlier and the `mountPath` field specifies the path where the volume is mounted in the container’s filesystem.

The `securityContext` section configures the pod to [run as a non-root user](#run-workloads-as-a-non-root-user). This is required because DigitalOcean NFS shares enforce root squashing, which prevents root users from writing to the share.

`pod-with-nfs.yaml`

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-test-pod
  namespace: sammy-doks
spec:
  volumes:
    - name: my-nfs-share
      persistentVolumeClaim:
        claimName: do-nfs-pvc
  containers:
    - name: my-app-container
      image: busybox
      command: ["/bin/sh", "-c", "while true; do date >> /data/test.log; sleep 5; done"]
      volumeMounts:
        - name: my-nfs-share
          mountPath: "/data"
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
```

After you apply this manifest using `kubectl apply -f pod-with-nfs.yaml`, the pod reads from and writes to its `/data` directory, with all files persisting directly on your DigitalOcean NFS Share.
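To confirm that the PVC bound to the PV and that the pod is writing to the share, you can check the claim's status and tail the log file from inside the pod. This assumes the resource names and namespace from the examples above:

```shell
# The PVC's STATUS column should read Bound
kubectl get pvc do-nfs-pvc -n sammy-doks

# The log file should gain a new timestamp roughly every 5 seconds
kubectl exec -n sammy-doks nfs-test-pod -- tail -n 5 /data/test.log
```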
## Run Workloads as a Non-Root User

DigitalOcean NFS shares enforce root squashing, a security feature that maps root user operations from NFS clients to an unprivileged user. As a result, workloads running as the root user (User ID 0) can read from the NFS share but receive permission denied errors when attempting to write to it. To enable write access, the workload must run as a non-root user.

Containers run as root unless their Dockerfile specifies otherwise. If the container in your workload specification runs as root by default, you can configure the workload to use a non-root user in the `securityContext` section of the config file. The `runAsUser` field specifies which User ID (UID) the workload runs as, and `runAsGroup` specifies the Group ID (GID). Set these fields to non-zero values that have write access to the files on the NFS share.

The following example shows the config file for a Job that processes data on an NFS share. The `securityContext` section specifies that the Job runs with UID 1000 and GID 1000:

`job-with-nfs.yaml`

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nfs-data-job
  namespace: sammy-doks
spec:
  template:
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
      volumes:
        - name: my-nfs-share
          persistentVolumeClaim:
            claimName: do-nfs-pvc
      containers:
        - name: data-processor
          image: your-image:tag
          volumeMounts:
            - name: my-nfs-share
              mountPath: "/data"
      restartPolicy: OnFailure
```

## Optimize NFS Performance on GPU Nodes

GPU Droplets support jumbo frames (9000 MTU) on their VPC interface, which improves NFS throughput for large AI/ML data transfers. Because NFS mounts negotiate TCP connection parameters at mount time based on the interface’s active MTU, you must apply network tuning before mounting any NFS shares to achieve full jumbo frame throughput.

**Note**: Only GPU Droplets support jumbo frames. Setting the MTU higher than 1500 on non-GPU Droplets is not supported.

### The Race Condition

When the cluster autoscaler provisions a new GPU node, both DaemonSet pods and workload pods become schedulable on the node simultaneously. If a workload pod mounts NFS before a network-tuning DaemonSet has set the MTU to 9000, the TCP connection’s MSS is negotiated at the default 1500 MTU and is never renegotiated. This means throughput stays degraded for the lifetime of that mount, even after the MTU is later increased.

The solution is a taint-based strategy: new GPU nodes join with a taint that blocks workload scheduling. A DaemonSet tolerates the taint, applies network tuning, and then removes the taint so workloads can schedule with the correct MTU already in place.

### Step 1: Configure the Startup Taint

Add the taint `node.digitalocean.com/network-not-tuned:NoSchedule` to your GPU node pool. You can do this in the DigitalOcean Control Panel under your cluster’s node pool settings, or via the API by including the taint in the node pool configuration.

Every new node in the pool, including nodes provisioned by the autoscaler, joins the cluster with this taint. Because the taint uses the `NoSchedule` effect, workload pods that do not tolerate the taint cannot be scheduled on the node. The DaemonSet deployed in the next steps tolerates this taint, applies network tuning, and then removes the taint to unblock workloads.
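Once the taint is configured, you can confirm that newly provisioned nodes join with it by listing each node's taints:

```shell
# Nodes that have not yet been tuned list the
# node.digitalocean.com/network-not-tuned taint in the TAINTS column
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
```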
### Step 2: Deploy RBAC Resources

The network tuning DaemonSet needs permission to remove taints from nodes. Create a ServiceAccount, ClusterRole, and ClusterRoleBinding to grant these permissions.

Create the following config file named `gpu-network-tuner-rbac.yaml`:

`gpu-network-tuner-rbac.yaml`

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gpu-network-tuner
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gpu-network-tuner
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gpu-network-tuner
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gpu-network-tuner
subjects:
  - kind: ServiceAccount
    name: gpu-network-tuner
    namespace: kube-system
```

Use `kubectl apply` to create the RBAC resources:

```shell
kubectl apply -f gpu-network-tuner-rbac.yaml
```
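Before deploying the DaemonSet, you can verify that the binding grants the ServiceAccount the expected permission with `kubectl auth can-i`:

```shell
# Prints "yes" if the RBAC resources were created correctly
kubectl auth can-i patch nodes \
  --as=system:serviceaccount:kube-system:gpu-network-tuner
```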
### Step 3: Deploy the Network Tuning DaemonSet

The network tuning DaemonSet runs two init containers on each GPU node before the main pause container:

- **network-tuner**: Runs as a privileged container with host networking. It sets the VPC interface MTU to 9000 via netplan, applies TCP buffer sysctl parameters (`rmem_max`, `wmem_max`, `tcp_rmem`, `tcp_wmem`), and persists both settings so they survive reboots.

- **remove-taint**: Uses the host’s `kubectl` binary to remove the `network-not-tuned` taint from the node, allowing workload pods to schedule.

Create the following config file named `gpu-network-tuner.yaml`:

`gpu-network-tuner.yaml`

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-network-tuner
  namespace: kube-system
  labels:
    app: gpu-network-tuner
spec:
  selector:
    matchLabels:
      app: gpu-network-tuner
  template:
    metadata:
      labels:
        app: gpu-network-tuner
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: doks.digitalocean.com/gpu-brand
                    operator: In
                    values:
                      - amd
                      - nvidia
      tolerations:
        - key: amd.com/gpu
          operator: Exists
          effect: NoSchedule
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
        # Tolerate the custom network-not-tuned taint
        - key: node.digitalocean.com/network-not-tuned
          operator: Exists
          effect: NoSchedule
      hostNetwork: true
      hostPID: true
      serviceAccountName: gpu-network-tuner
      volumes:
        - name: host-kubectl
          hostPath:
            path: /usr/bin/kubectl
            type: File
      initContainers:
        - name: network-tuner
          image: busybox:stable
          command:
            - /bin/sh
            - -c
            - |
              set -e
              echo "=== Applying sysctl tuning ==="
              sysctl -w net.core.rmem_max=16777216
              sysctl -w net.core.wmem_max=16777216
              sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
              sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
              echo "=== Persisting sysctl settings ==="
              cat <<'SYSCTL' | nsenter -t 1 -m -- tee /etc/sysctl.d/99-gpu-network-tuning.conf > /dev/null
              net.core.rmem_max=16777216
              net.core.wmem_max=16777216
              net.ipv4.tcp_rmem=4096 87380 16777216
              net.ipv4.tcp_wmem=4096 65536 16777216
              SYSCTL
              echo "=== Persisting MTU 9000 for VPC interface via netplan ==="
              nsenter -t 1 -m -- sed -i '/set-name.*eth1/{n;s/mtu: 1500/mtu: 9000/}' /etc/netplan/50-cloud-init.yaml
              echo "=== Applying netplan ==="
              nsenter -t 1 -m -- netplan apply
              echo "=== Fallback: setting MTU directly ==="
              ip link set eth1 mtu 9000 || true
              echo "=== Verifying settings ==="
              sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem net.ipv4.tcp_wmem
              ip link show eth1 | grep mtu
              echo "=== Network tuning complete ==="
          securityContext:
            privileged: true
        - name: remove-taint
          image: busybox:stable
          command: ["/bin/sh", "-c"]
          args: ["/host-bin/kubectl taint nodes $(NODE_NAME) node.digitalocean.com/network-not-tuned:NoSchedule- || true"]
          volumeMounts:
            - name: host-kubectl
              mountPath: /host-bin/kubectl
              readOnly: true
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
      containers:
        - name: pause
          image: busybox:stable
          command: ["sleep", "infinity"]
          resources:
            requests:
              cpu: 1m
              memory: 1Mi
            limits:
              cpu: 10m
              memory: 10Mi
```

Use `kubectl apply` to deploy the DaemonSet:

```shell
kubectl apply -f gpu-network-tuner.yaml
```

Once deployed, every new GPU node (including autoscaler-provisioned nodes) goes through this sequence: the node joins with the `network-not-tuned` taint, the DaemonSet’s init containers apply MTU and sysctl tuning, the taint is removed, and then workload pods can schedule and mount NFS with the optimized network settings already in place.
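After a new GPU node joins, you can check that the tuning ran and that the taint was removed. As a sketch, review the `network-tuner` init container’s logs for the verification output and inspect the node’s taints (replace `<node-name>` with the name of a GPU node in your cluster):

```shell
# The log output should end with "=== Network tuning complete ==="
# and show mtu 9000 for eth1
kubectl logs -n kube-system -l app=gpu-network-tuner -c network-tuner --tail=20

# The node's taints should no longer include
# node.digitalocean.com/network-not-tuned
kubectl describe node <node-name> | grep -A3 Taints
```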