# How to Use NFS Storage with Kubernetes Clusters

DigitalOcean Kubernetes (DOKS) is a Kubernetes service with a fully managed control plane, high availability, and autoscaling. DOKS integrates with standard Kubernetes toolchains and DigitalOcean’s load balancers, volumes, CPU and GPU Droplets, API, and CLI.

You can connect your DOKS clusters to a [DigitalOcean NFS Share](https://docs.digitalocean.com/products/nfs/index.html.md) and use the share for tasks such as AI/ML Kubernetes workloads. For other persistent storage options, see [Add Volumes to Kubernetes Clusters](https://docs.digitalocean.com/products/kubernetes/how-to/add-volumes/index.html.md).

To use an NFS share with your DOKS cluster, you statically provision a PersistentVolume (PV), bind the PV to a PersistentVolumeClaim (PVC), and then mount the PVC to your workload.

**Note**: You can create and use NFS shares with DOKS clusters only in regions where [DigitalOcean NFS shares are available](https://docs.digitalocean.com/products/nfs/details/availability/index.html.md) and only when the cluster and NFS share are on the same VPC network.

## Prerequisites

To connect an existing DOKS cluster to a DigitalOcean NFS share, you need to:

- Create an NFS share. You can provision one using either the [DigitalOcean Control Panel](https://docs.digitalocean.com/products/nfs/how-to/create/index.html.md) or the [API](https://docs.digitalocean.com/reference/api/digitalocean/index.html.md#tag/nfs).

- Get the connection details once the share is active.

  In the left menu of the control panel, click **Network File Storage** to open the **Network File Storage** page, which lists all of your NFS shares. Note the server IP address and mount path values in the **Mount Path** column. The server IP address is the value before the `:` and the mount path is the value after the `:`. For example, if the value is `10.128.0.69:/123456/6160d138-60cb-4e61-9ff3-076eebed5c0f`, then the server IP address is `10.128.0.69` and the mount path is `/123456/6160d138-60cb-4e61-9ff3-076eebed5c0f`.

  To get the values using the API, send a `GET` request to the `/v2/nfs` endpoint. From the API response, note the host IP address and the mount path. For example:

  ```json
  ...
  "host": "10.128.0.69",
  "mount_path": "/123456/38bc6f86-9927-491a-a7b5-c5627219a0d3",
  ...
  ```

  The `host` value is the server IP address. The `mount_path` value provides the path to use when configuring your Kubernetes cluster.
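  You can send the request with `curl`. The following is a minimal sketch; it assumes your API token is stored in the `DIGITALOCEAN_TOKEN` environment variable:

  ```shell
  # Hypothetical example: list NFS shares; requires a valid API token
  curl -X GET \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
    "https://api.digitalocean.com/v2/nfs"
  ```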
## Create PersistentVolume

A [PersistentVolume](https://kubernetes.io/docs/concepts/storage/persistent-volumes) (PV) is a cluster-level resource that registers your DigitalOcean NFS Share with Kubernetes, making it available for use across the entire cluster.

To provision a PV for your NFS share, create the following config file named `nfs-pv.yaml`, replacing the values for `server` and `path` with the `host` and `mount_path` values of your NFS share. The size of the PV should ideally match your share’s size, and the `accessModes` must be `ReadWriteMany` to allow multiple pods to read and write to the volume simultaneously. The `mountOptions` section sets `nconnect=8`, which opens 8 parallel TCP connections to the NFS server to improve throughput.

`nfs-pv.yaml`

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: do-nfs-pv
  labels:
    type: nfs-model-storage
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - nconnect=8
  nfs:
    server: "10.128.0.69"
    path: "/123456/38bc6f86-9927-491a-a7b5-c5627219a0d3"
```

Use `kubectl apply` to create the PV:

```
kubectl apply -f nfs-pv.yaml
```

## Create PersistentVolumeClaim

A [PersistentVolumeClaim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) (PVC) is how your applications request access to the storage made available by the PV.

To provision a PVC for your NFS share, create the following config file named `nfs-pvc.yaml`. The label in the PVC's `selector` must match the label on your PV to ensure that the PVC binds to that specific NFS PV. The `accessModes` must be `ReadWriteMany` to allow multiple pods to read and write to the PVC simultaneously.

`nfs-pvc.yaml`

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: do-nfs-pvc
  namespace: sammy-doks
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  selector:
    matchLabels:
      type: nfs-model-storage
```

In the config file, the `storageClassName` field is set to `""`. This instructs DOKS to find a pre-existing, statically provisioned PV matching the specified PV label, and links your PVC directly to your manually configured NFS share. DOKS has built-in [StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/) options, such as `do-block-storage`, that dynamically provision new storage volumes when a PVC requests them. However, in this case, you have already provisioned the storage when creating the PV and therefore do not need DOKS to dynamically provision one.

Use `kubectl apply` to create the PVC:

```
kubectl apply -f nfs-pvc.yaml
```

## Mount PVC in Your Workload

After your PVC is bound to the PV, you can mount it to a workload such as a Deployment, Pod, Job, or DaemonSet. The following config file demonstrates how to mount the storage to a pod and write the current date to a log file on the NFS share every 5 seconds.

To mount the volume to the pod and reference your PVC, add the `volumes` section to the specification. The `claimName` field must match the [name you specified for your PVC](#create-persistentvolumeclaim). Next, add the `volumeMounts` section, where the `name` field must match the volume name you specified earlier and the `mountPath` field specifies the path where the volume is mounted in the container’s filesystem.

The `securityContext` section configures the pod to [run as a non-root user](#run-workloads-as-a-non-root-user). This is required because DigitalOcean NFS shares enforce root squashing, which prevents root users from writing to the share.

`pod-with-nfs.yaml`

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-test-pod
  namespace: sammy-doks
spec:
  volumes:
    - name: my-nfs-share
      persistentVolumeClaim:
        claimName: do-nfs-pvc
  containers:
    - name: my-app-container
      image: busybox
      command: ["/bin/sh", "-c", "while true; do date >> /data/test.log; sleep 5; done"]
      volumeMounts:
        - name: my-nfs-share
          mountPath: "/data"
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
```

After you apply this manifest using `kubectl apply -f pod-with-nfs.yaml`, the pod reads from and writes to its `/data` directory, with all files persisting directly on your DigitalOcean NFS Share.
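To confirm that the PVC bound to the PV and that the pod is writing to the share, you can check the claim's status and tail the log file from inside the pod. This assumes the resource names and namespace from the examples above:

```shell
# The PVC's STATUS column should read Bound
kubectl get pvc do-nfs-pvc -n sammy-doks

# The log file should gain a new timestamp roughly every 5 seconds
kubectl exec -n sammy-doks nfs-test-pod -- tail -n 5 /data/test.log
```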
## Run Workloads as a Non-Root User

DigitalOcean NFS shares enforce root squashing, a security feature that maps root user operations from NFS clients to an unprivileged user. As a result, workloads running as the root user (User ID 0) can read from the NFS share but receive permission denied errors when attempting to write to it. To enable write access, the workload must run as a non-root user.

Containers run as root unless their Dockerfile specifies otherwise. If the container in your workload specification runs as root by default, you can configure the workload to use a non-root user in the `securityContext` section of the config file. The `runAsUser` field specifies which User ID (UID) the workload runs as, and `runAsGroup` specifies the Group ID (GID). Set these fields to non-zero values that have write access to the files on the NFS share.

The following example shows the config file for a Job that processes data on an NFS share. The `securityContext` section specifies that the Job runs with UID 1000 and GID 1000:

`job-with-nfs.yaml`

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nfs-data-job
  namespace: sammy-doks
spec:
  template:
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
      volumes:
        - name: my-nfs-share
          persistentVolumeClaim:
            claimName: do-nfs-pvc
      containers:
        - name: data-processor
          image: your-image:tag
          volumeMounts:
            - name: my-nfs-share
              mountPath: "/data"
      restartPolicy: OnFailure
```

## Optimize NFS Performance on GPU Nodes

GPU Droplets support jumbo frames (9000 MTU) on their VPC interface, which improves NFS throughput for large AI/ML data transfers. Because NFS mounts negotiate TCP connection parameters at mount time based on the interface’s active MTU, you must apply network tuning before mounting any NFS shares to achieve full jumbo frame throughput.

**Note**: Only GPU Droplets support jumbo frames. Setting the MTU higher than 1500 on non-GPU Droplets is not supported.

### The Race Condition

When the cluster autoscaler provisions a new GPU node, both DaemonSet pods and workload pods become schedulable on the node simultaneously. If a workload pod mounts NFS before a network-tuning DaemonSet has set the MTU to 9000, the TCP connection’s MSS is negotiated at the default 1500 MTU and is never renegotiated. This means throughput stays degraded for the lifetime of that mount, even after the MTU is later increased.

The solution is a taint-based strategy: new GPU nodes join with a taint that blocks workload scheduling. A DaemonSet tolerates the taint, applies network tuning, and then removes the taint so workloads can schedule with the correct MTU already in place.

### Step 1: Configure the Startup Taint

Add the taint `node.digitalocean.com/network-not-tuned:NoSchedule` to your GPU node pool. You can do this in the DigitalOcean Control Panel under your cluster’s node pool settings, or via the API by including the taint in the node pool configuration.

Every new node in the pool, including nodes provisioned by the autoscaler, joins the cluster with this taint. Because the taint uses the `NoSchedule` effect, workload pods that do not tolerate the taint cannot be scheduled on the node. The DaemonSet deployed in the next steps tolerates this taint, applies network tuning, and then removes the taint to unblock workloads.
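Once the taint is configured, you can confirm that newly provisioned nodes join with it by listing each node's taints:

```shell
# Nodes that have not yet been tuned list the
# node.digitalocean.com/network-not-tuned taint in the TAINTS column
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
```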
### Step 2: Deploy RBAC Resources

The network tuning DaemonSet needs permission to remove taints from nodes. Create a ServiceAccount, ClusterRole, and ClusterRoleBinding to grant these permissions.

Create the following config file named `gpu-network-tuner-rbac.yaml`:

`gpu-network-tuner-rbac.yaml`

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gpu-network-tuner
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gpu-network-tuner
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gpu-network-tuner
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gpu-network-tuner
subjects:
  - kind: ServiceAccount
    name: gpu-network-tuner
    namespace: kube-system
```

Use `kubectl apply` to create the RBAC resources:

```shell
kubectl apply -f gpu-network-tuner-rbac.yaml
```
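Before deploying the DaemonSet, you can verify that the binding grants the ServiceAccount the expected permission with `kubectl auth can-i`:

```shell
# Prints "yes" if the RBAC resources were created correctly
kubectl auth can-i patch nodes \
  --as=system:serviceaccount:kube-system:gpu-network-tuner
```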
### Step 3: Deploy the Network Tuning DaemonSet

The network tuning DaemonSet runs two init containers on each GPU node before the main pause container:

- **network-tuner**: Runs as a privileged container with host networking. It sets the VPC interface MTU to 9000 via netplan, applies TCP buffer sysctl parameters (`rmem_max`, `wmem_max`, `tcp_rmem`, `tcp_wmem`), and persists both settings so they survive reboots.

- **remove-taint**: Uses the host’s `kubectl` binary to remove the `network-not-tuned` taint from the node, allowing workload pods to schedule.

Create the following config file named `gpu-network-tuner.yaml`:

`gpu-network-tuner.yaml`

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-network-tuner
  namespace: kube-system
  labels:
    app: gpu-network-tuner
spec:
  selector:
    matchLabels:
      app: gpu-network-tuner
  template:
    metadata:
      labels:
        app: gpu-network-tuner
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: doks.digitalocean.com/gpu-brand
                    operator: In
                    values:
                      - amd
                      - nvidia
      tolerations:
        - key: amd.com/gpu
          operator: Exists
          effect: NoSchedule
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
        # Tolerate the custom network-not-tuned taint
        - key: node.digitalocean.com/network-not-tuned
          operator: Exists
          effect: NoSchedule
      hostNetwork: true
      hostPID: true
      serviceAccountName: gpu-network-tuner
      volumes:
        - name: host-kubectl
          hostPath:
            path: /usr/bin/kubectl
            type: File
      initContainers:
        - name: network-tuner
          image: busybox:stable
          command:
            - /bin/sh
            - -c
            - |
              set -e
              echo "=== Applying sysctl tuning ==="
              sysctl -w net.core.rmem_max=16777216
              sysctl -w net.core.wmem_max=16777216
              sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
              sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
              echo "=== Persisting sysctl settings ==="
              cat <<'SYSCTL' | nsenter -t 1 -m -- tee /etc/sysctl.d/99-gpu-network-tuning.conf > /dev/null
              net.core.rmem_max=16777216
              net.core.wmem_max=16777216
              net.ipv4.tcp_rmem=4096 87380 16777216
              net.ipv4.tcp_wmem=4096 65536 16777216
              SYSCTL
              echo "=== Persisting MTU 9000 for VPC interface via netplan ==="
              nsenter -t 1 -m -- sed -i '/set-name.*eth1/{n;s/mtu: 1500/mtu: 9000/}' /etc/netplan/50-cloud-init.yaml
              echo "=== Applying netplan ==="
              nsenter -t 1 -m -- netplan apply
              echo "=== Fallback: setting MTU directly ==="
              ip link set eth1 mtu 9000 || true
              echo "=== Verifying settings ==="
              sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem net.ipv4.tcp_wmem
              ip link show eth1 | grep mtu
              echo "=== Network tuning complete ==="
          securityContext:
            privileged: true
        - name: remove-taint
          image: busybox:stable
          command: ["/bin/sh", "-c"]
          args: ["/host-bin/kubectl taint nodes $(NODE_NAME) node.digitalocean.com/network-not-tuned:NoSchedule- || true"]
          volumeMounts:
            - name: host-kubectl
              mountPath: /host-bin/kubectl
              readOnly: true
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
      containers:
        - name: pause
          image: busybox:stable
          command: ["sleep", "infinity"]
          resources:
            requests:
              cpu: 1m
              memory: 1Mi
            limits:
              cpu: 10m
              memory: 10Mi
```

Use `kubectl apply` to deploy the DaemonSet:

```shell
kubectl apply -f gpu-network-tuner.yaml
```

Once deployed, every new GPU node (including autoscaler-provisioned nodes) goes through this sequence: the node joins with the `network-not-tuned` taint, the DaemonSet’s init containers apply MTU and sysctl tuning, the taint is removed, and then workload pods can schedule and mount NFS with the optimized network settings already in place.
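After a new GPU node joins, you can check that the tuning ran and that the taint was removed. As a sketch, review the `network-tuner` init container’s logs for the verification output and inspect the node’s taints (replace `<node-name>` with the name of a GPU node in your cluster):

```shell
# The log output should end with "=== Network tuning complete ==="
# and show mtu 9000 for eth1
kubectl logs -n kube-system -l app=gpu-network-tuner -c network-tuner --tail=20

# The node's taints should no longer include
# node.digitalocean.com/network-not-tuned
kubectl describe node <node-name> | grep -A3 Taints
```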