GPU Worker Nodes
Validated on 12 Jun 2025 • Last edited on 12 Jun 2025
DigitalOcean Kubernetes (DOKS) is a Kubernetes service with a fully managed control plane, high availability, and autoscaling. DOKS integrates with standard Kubernetes toolchains and DigitalOcean’s load balancers, volumes, CPU and GPU Droplets, API, and CLI.
GPU worker nodes are built on GPU Droplets, which are powered by NVIDIA and AMD GPUs. Using GPU worker nodes in your cluster, you can:
- Experiment and develop AI/ML applications in containerized environments
- Run distributed AI workloads on Kubernetes
- Scale AI inference services
We offer the following GPU options:
GPU | Configuration | Slug | Additional Information |
---|---|---|---|
NVIDIA H100 | Single GPU or 8 GPU | | For multiple 8 GPU H100 worker nodes, we support high-speed networking between GPUs on different nodes over 8x Mellanox 400GbE interfaces. To enable this, submit the H100 multi-node setup form. |
NVIDIA L40S | Single GPU | gpu-l40sx1-48gb | |
NVIDIA RTX 4000 Ada Generation | Single GPU | gpu-4000adax1-20gb | |
NVIDIA RTX 6000 Ada Generation | Single GPU | gpu-6000adax1-48gb | |
AMD Instinct MI300X | Single GPU or 8 GPU | | |
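Once you have chosen a size, you can add a GPU node pool to an existing cluster with `doctl`. The sketch below uses the L40S slug from the table above; `<your-cluster>` and the pool name are placeholders for your own setup.

```bash
# Add a single-node GPU pool to an existing DOKS cluster.
# <your-cluster> is a placeholder for your cluster's name or ID.
doctl kubernetes cluster node-pool create <your-cluster> \
  --name gpu-pool \
  --size gpu-l40sx1-48gb \
  --count 1
```

The same command works with any slug from the table; only `--size` changes.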
You do not need to specify a RuntimeClass to run GPU workloads, which simplifies setup. DigitalOcean installs and manages the drivers required to support GPU worker nodes:
GPU | Drivers | Additional Recommended Device Plugins |
---|---|---|
NVIDIA | NVIDIA CUDA drivers, NVIDIA CUDA Toolkit, and NVIDIA Container Toolkit. For the latest installed versions, see the DOKS changelog. | For GPU discovery, health checks, configuration of GPU-enabled containers, and time slicing, you need the NVIDIA device plugin for Kubernetes. You can configure and deploy the plugin as described in the README file of its GitHub repository. For monitoring your cluster with Prometheus, you also need the NVIDIA DCGM Exporter. |
AMD | AMDGPU driver and AMD ROCm. For the latest installed versions, see the DOKS changelog. | For GPU discovery, health checks, configuration of GPU-enabled containers, and time slicing, you need the ROCm Device Plugin for Kubernetes. You can configure and deploy the plugin as described in the README file of its GitHub repository. |
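Once the device plugin is running, workloads request GPUs through its extended resource. The manifest below is a minimal sketch for an NVIDIA node; the image tag is an assumption (any current CUDA base image works), and on AMD nodes the resource name is `amd.com/gpu` instead.

```yaml
# Minimal pod that requests one NVIDIA GPU via the device plugin's
# extended resource and runs nvidia-smi as a smoke test.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04  # assumed tag; substitute a current CUDA base image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1  # schedules the pod onto a GPU worker node
```

If the pod completes and its logs show the GPU, the drivers and device plugin are working.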
DigitalOcean applies additional labels to the GPU worker nodes. For more information, see Automatic Application of Labels and Taints to Nodes.
You can also use the cluster autoscaler to automatically scale the GPU node pool down to zero, or use the DigitalOcean CLI or API to manually scale it to zero. Scaling to zero is useful for on-demand usage and for batch jobs like training and fine-tuning, where GPU nodes only need to run while a job is active.
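Both approaches can be sketched with `doctl`; `<your-cluster>` and `gpu-pool` are placeholders for your own cluster and node pool names.

```bash
# Manually scale the GPU pool to zero while idle, then back up for a job.
doctl kubernetes cluster node-pool update <your-cluster> gpu-pool --count 0
doctl kubernetes cluster node-pool update <your-cluster> gpu-pool --count 1

# Or let the cluster autoscaler manage the pool, allowing it to reach zero.
doctl kubernetes cluster node-pool update <your-cluster> gpu-pool \
  --auto-scale --min-nodes 0 --max-nodes 2
```

With autoscaling enabled, GPU nodes are created only while pending pods request GPU resources, so you stop paying for idle GPU Droplets.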