GPU Worker Nodes
Validated on 12 Jun 2025 • Last edited on 2 Aug 2025
DigitalOcean Kubernetes (DOKS) is a Kubernetes service with a fully managed control plane, high availability, and autoscaling. DOKS integrates with standard Kubernetes toolchains and DigitalOcean’s load balancers, volumes, CPU and GPU Droplets, API, and CLI.
GPU worker nodes are built on GPU Droplets, which are powered by NVIDIA and AMD GPUs. Using GPU worker nodes in your cluster, you can:
- Experiment with and develop AI/ML applications in containerized environments
- Run distributed AI workloads on Kubernetes
- Scale AI inference services
Available GPU Node Pools
We offer the following GPU options for creating node pools:
| NVIDIA GPU | Slug |
| --- | --- |
| H100 | `gpu-h100x1-80gb` |
| H100 (8x) | `gpu-h100x8-640gb` |
| L40S | `gpu-l40sx1-48gb` |
| RTX 4000 Ada | `gpu-4000adax1-20gb` |
| RTX 6000 Ada | `gpu-6000adax1-48gb` |
| AMD GPU | Slug |
| --- | --- |
| Instinct MI300X | `gpu-mi300x1-192gb` |
| Instinct MI300X (8x) | `gpu-mi300x8-1536gb` |
| Instinct MI325X | `gpu-mi325x1-256gb` |
| Instinct MI325X (8x) | `gpu-mi325x8-2048gb` |
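As a sketch, you can add a GPU node pool to an existing cluster with `doctl` by passing one of the slugs above as the node size. The cluster ID and pool name here are placeholders:

```shell
# Add a single-GPU H100 node pool to an existing DOKS cluster.
# <cluster-id> is a placeholder; list clusters with `doctl kubernetes cluster list`.
doctl kubernetes cluster node-pool create <cluster-id> \
  --name gpu-pool \
  --size gpu-h100x1-80gb \
  --count 1
```

The same slug can be used when creating a node pool through the control panel or the API.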
Runtime, Drivers, and Plugins
To run GPU workloads, you do not need to specify a Runtime Class.
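Instead of a Runtime Class, pods request GPUs through the extended resource that the device plugin advertises. Below is a minimal sketch assuming an NVIDIA node pool and its `nvidia.com/gpu` resource (AMD pools expose `amd.com/gpu` instead); the image tag is an example, and depending on the taints described under Additional Features below, the pod may also need a matching toleration:

```shell
# Schedule a pod on a GPU node by requesting the extended GPU resource;
# no runtimeClassName is needed. Image and pod name are illustrative.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
```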
DigitalOcean also installs and manages the drivers required to enable the GPU worker nodes, as described below. For the latest installed versions, see the DOKS changelog.
NVIDIA GPUs
For NVIDIA GPUs, we install and manage the required NVIDIA drivers.
We also recommend the following additional software:
- NVIDIA device plugin for Kubernetes for GPU discovery, health checks, configuration of GPU-enabled containers, and time slicing. You can configure the plugin and deploy it using `helm` as described in the README file of the GitHub repository.
- NVIDIA DCGM Exporter for monitoring your cluster using Prometheus.
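As a sketch of the `helm`-based install referenced above, using the chart repository published in the device plugin's GitHub project (the release name and namespace here are choices, not requirements):

```shell
# Add the NVIDIA device plugin chart repository and install the plugin.
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin \
  --create-namespace
```

See the repository's README for chart values such as time-slicing configuration.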
AMD GPUs
For AMD GPUs, we install and manage the required AMD drivers.
We also recommend the following additional software:
- ROCm Device Plugin for Kubernetes for GPU discovery, health checks, configuration of GPU-enabled containers, and time slicing. We automatically deploy this component when you create or update a cluster. You can turn this option off by setting `amd_gpu_device_plugin` to `false` in the request body when creating or updating a cluster using the API.
- AMD Device Metrics Exporter for ingesting GPU metrics into your monitoring system. You can install this plugin by setting `amd_gpu_device_metrics_exporter_plugin` to `true` in the request body when creating or updating a cluster using the API. The plugin is installed in the `kube-system` namespace of the Kubernetes cluster.
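As an illustrative sketch of the API options above; the field placement at the top level of the request body is an assumption here, so check the DigitalOcean API reference for the exact cluster update schema. `$DIGITALOCEAN_TOKEN` and the cluster ID are placeholders:

```shell
# Enable the AMD metrics exporter (and keep the device plugin enabled)
# when updating an existing cluster via the DigitalOcean API.
curl -X PUT "https://api.digitalocean.com/v2/kubernetes/clusters/<cluster-id>" \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "amd_gpu_device_plugin": true,
    "amd_gpu_device_metrics_exporter_plugin": true
  }'
```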
Additional Features
DigitalOcean applies additional labels and taints to the GPU worker nodes. For more information, see Automatic Application of Labels and Taints to Nodes.
You can also use the cluster autoscaler to automatically scale a GPU node pool down to zero, or use the DigitalOcean CLI or API to manually scale the node pool down to zero. Scaling to zero is useful for on-demand workloads and batch jobs such as training and fine-tuning.
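As a sketch with `doctl`, an existing GPU node pool can be configured so the autoscaler removes all of its nodes when they are idle (cluster and pool IDs are placeholders, and the maximum of 3 is an example):

```shell
# Let the autoscaler scale the GPU pool between 0 and 3 nodes.
doctl kubernetes cluster node-pool update <cluster-id> <pool-id> \
  --auto-scale \
  --min-nodes 0 \
  --max-nodes 3
```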