GPU Worker Nodes Early Availability

DigitalOcean Kubernetes (DOKS) is a managed Kubernetes service. Deploy Kubernetes clusters with a fully managed control plane, high availability, autoscaling, and native integration with DigitalOcean Load Balancers and volumes. DOKS clusters are compatible with standard Kubernetes toolchains and the DigitalOcean API and CLI.

GPU worker nodes are now in early availability for select DOKS users. You can create a new cluster or add a GPU node pool to an existing cluster on version 1.30.4-do.0, 1.29.8-do.0, 1.28.13-do.0, or later.

GPU worker nodes are built on GPU Droplets, which are powered by NVIDIA’s H100 GPUs.

Using GPU worker nodes in your cluster, you can:

  • Experiment with and develop AI/ML applications in containerized environments
  • Run distributed AI workloads on Kubernetes
  • Scale AI inference services

You do not need to specify a RuntimeClass to run GPU workloads, which simplifies setup.

GPU Droplet Details

GPU Droplets have NVIDIA H100 GPUs in either a single-GPU or an 8-GPU configuration, and come with two kinds of storage:

  • Boot disk: A local, persistent disk that stores the operating system and software such as ML frameworks.

  • Scratch disk: A local, non-persistent disk for staging data, such as inference and training data.

The following table summarizes additional specifications:

GPU                            NVIDIA H100           NVIDIA H100x8
GPUs per Droplet               1                     8
GPU Memory                     80 GB                 640 GB
Droplet Memory                 240 GB                1920 GB
Droplet vCPUs                  20                    160
Local Storage: Boot Disk       720 GiB NVMe          2 TiB NVMe
Local Storage: Scratch Disk    5 TiB NVMe            40 TiB NVMe
Network Bandwidth              10 Gbps public,       10 Gbps public,
(Maximum Speeds)               25 Gbps private       25 Gbps private
Slug                           gpu-h100x1-80gb       gpu-h100x8-640gb
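
Automation selects a machine type by its slug. If you want to confirm the slugs available to your account, a minimal check with the CLI (assuming doctl is installed and authenticated):

# List valid DOKS node sizes and filter for the GPU slugs
doctl kubernetes options sizes | grep gpu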

Pricing

GPU worker nodes are priced per second at the same rate as GPU Droplets. See DOKS node pools pricing for details.

For reservation and contract pricing, contact your sales representative or Customer Success Manager, or send a request using the H100 GPU Worker Nodes form.

Availability

GPU worker nodes for DOKS are currently available in DigitalOcean’s TOR1 datacenter. We plan to support additional datacenters in the near future.
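
If you script cluster creation, you can list the current region slugs rather than hardcoding them; a minimal check with the CLI:

# List region slugs where DOKS clusters can be created (for example, tor1)
doctl kubernetes options regions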

Limits

  • You cannot currently scale the GPU node pools down to zero.

  • You need to monitor GPU usage and manage scaling manually; a resize sketch follows this list.

  • We do not currently support creating or adding GPU worker nodes using the DigitalOcean Control Panel.
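
Because GPU pools do not autoscale to zero, resizing is a manual step. A minimal sketch of resizing a pool with the CLI (the cluster and pool names are placeholders):

# Manually resize the GPU node pool to 2 nodes
doctl kubernetes cluster node-pool update gpu-cluster gpu-worker-pool --count 2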

Getting Started

Add a GPU Worker Node Using Automation

You can use the DigitalOcean CLI or API to create a new cluster with GPU worker nodes or to add a GPU node pool to an existing cluster.

Note
In rare cases, a GPU Droplet can take several hours to provision. If creation takes unusually long, open a support ticket.

To create a cluster with GPU worker nodes, run doctl kubernetes cluster create and specify the GPU machine type. The following example creates a cluster with one node pool of 3 worker nodes, each in the single-GPU configuration with 80 GB of GPU memory:

doctl kubernetes cluster create gpu-cluster --region tor1 --version 1.30.4-do.0 --node-pool "name=gpu-worker-pool;size=gpu-h100x1-80gb;count=3"
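
Once the cluster provisions, you can fetch its credentials and confirm the GPU nodes registered; a minimal sketch using the cluster name from the example above:

# Download the cluster's kubeconfig and set it as the current kubectl context
doctl kubernetes cluster kubeconfig save gpu-cluster

# List the nodes; the GPU workers appear once provisioning finishes
kubectl get nodes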

To add GPU worker nodes to an existing cluster, run doctl kubernetes cluster node-pool create and specify the GPU machine type. The following example adds a node pool of 4 GPU worker nodes in the single-GPU configuration with 80 GB of GPU memory to a cluster named gpu-cluster:

doctl kubernetes cluster node-pool create gpu-cluster --name gpu-worker-pool-1 --size gpu-h100x1-80gb --count 4
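
You can then confirm that the pool was added and check its node count; a minimal check:

# List all node pools on the cluster, including the new GPU pool
doctl kubernetes cluster node-pool list gpu-cluster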

To create a cluster with a GPU worker node, send a POST request to https://api.digitalocean.com/v2/kubernetes/clusters with the following request body:

curl --location 'https://api.digitalocean.com/v2/kubernetes/clusters' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
--data '{
    "name": "gpu-cluster",
    "region": "tor1",
    "version": "1.30.4-do.0",
    "node_pools": [
        {
            "size": "gpu-h100x1-80gb",
            "count": 3,
            "name": "gpu-worker-pool"
        }
    ]
}'

This creates a cluster with one node pool of 3 worker nodes in the single-GPU configuration with 80 GB of GPU memory.
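
Cluster creation is asynchronous, so the API responds before the nodes are ready. A minimal status check (the cluster ID comes from the creation response; {cluster_id} is a placeholder):

# Poll the cluster until its state reports running
curl --header "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
    'https://api.digitalocean.com/v2/kubernetes/clusters/{cluster_id}'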

To add GPU worker nodes to an existing cluster, send a POST request to https://api.digitalocean.com/v2/kubernetes/clusters/{cluster_id}/node_pools with the following request body:

curl --location --request POST 'https://api.digitalocean.com/v2/kubernetes/clusters/{cluster_id}/node_pools' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
--data '{
    "size": "gpu-h100x1-80gb",
    "count": 4,
    "name": "new-gpu-worker-pool"
}'

This adds a node pool of 4 GPU worker nodes in the single-GPU configuration with 80 GB of GPU memory to the existing cluster specified by its cluster ID, cluster_id.
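
To confirm that the pool was added, you can list the cluster's node pools; a minimal check:

# List node pools on the cluster, including new-gpu-worker-pool
curl --header "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
    'https://api.digitalocean.com/v2/kubernetes/clusters/{cluster_id}/node_pools'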

DigitalOcean applies the following labels and taint to the GPU worker nodes:

  • Labels: doks.digitalocean.com/gpu-brand: nvidia and doks.digitalocean.com/gpu-model: h100

  • Taint: nvidia.com/gpu:NoSchedule

You can use these labels and this taint in your workload specification to schedule pods onto GPU worker nodes, as shown in the next section.
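
Before scheduling workloads, you can verify the labels and taint on a provisioned node; a minimal check with kubectl (the node name is a placeholder):

# Show the GPU brand and model labels as columns in the node list
kubectl get nodes -L doks.digitalocean.com/gpu-brand,doks.digitalocean.com/gpu-model

# Inspect the taints on a specific GPU node
kubectl describe node <gpu-node-name> | grep -i taints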

Run a GPU Workload

Once you have a GPU worker node in your cluster, you can run GPU workloads, and you can base your own workload configuration on a spec similar to the following sample. The sample creates a pod that runs NVIDIA's CUDA sample image and uses the labels and taint applied to GPU worker nodes:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  restartPolicy: Never
  nodeSelector:
    # Schedule the pod only onto nodes labeled as NVIDIA GPU workers
    doks.digitalocean.com/gpu-brand: nvidia
  containers:
    - name: cuda-container
      # NVIDIA's CUDA vector-add sample image; runs a short test and exits
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
  tolerations:
    # Tolerate the taint that DigitalOcean applies to GPU worker nodes
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule

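To try the sample, save the spec to a file and apply it; once the pod completes, its logs contain the sample's output. The filename is a placeholder:

# Create the pod from the spec above
kubectl apply -f gpu-workload.yaml

# Check the pod; wait until its status shows Completed
kubectl get pod gpu-workload

# Read the CUDA sample's output
kubectl logs gpu-workload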

Optional Additional Configuration

Feedback

For support or troubleshooting, open a support ticket.

For feedback or questions about the GPU worker nodes offering, contact your account representative.