GPU Worker Nodes
Validated on 12 Jun 2025 • Last edited on 2 Aug 2025
DigitalOcean Kubernetes (DOKS) is a Kubernetes service with a fully managed control plane, high availability, and autoscaling. DOKS integrates with standard Kubernetes toolchains and DigitalOcean’s load balancers, volumes, CPU and GPU Droplets, API, and CLI.
GPU worker nodes are built on GPU Droplets, which are powered by NVIDIA and AMD GPUs. Using GPU worker nodes in your cluster, you can:
- Experiment with and develop AI/ML applications in containerized environments
- Run distributed AI workloads on Kubernetes
- Scale AI inference services
Available GPU Node Pools
We offer the following GPU options for creating node pools:
| NVIDIA GPU | Slug |
| --- | --- |
| H100 | `gpu-h100x1-80gb` |
| H100 (8x) | `gpu-h100x8-640gb` |
| L40S | `gpu-l40sx1-48gb` |
| RTX 4000 Ada | `gpu-4000adax1-20gb` |
| RTX 6000 Ada | `gpu-6000adax1-48gb` |
| AMD GPU | Slug |
| --- | --- |
| Instinct MI300X | `gpu-mi300x1-192gb` |
| Instinct MI300X (8x) | `gpu-mi300x8-1536gb` |
| Instinct MI325X | `gpu-mi325x1-256gb` |
| Instinct MI325X (8x) | `gpu-mi325x8-2048gb` |
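As a sketch, you can add a GPU node pool to an existing cluster with `doctl` by passing one of the slugs above as the node size. The cluster ID and pool name here are placeholders:

```shell
# Add a single-GPU H100 node pool to an existing DOKS cluster.
# <cluster-id> is a placeholder; list clusters with `doctl kubernetes cluster list`.
doctl kubernetes cluster node-pool create <cluster-id> \
  --name gpu-pool \
  --size gpu-h100x1-80gb \
  --count 1
```

The same slug can be used when creating a node pool through the control panel or the API.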
Runtime, Drivers, and Plugins
To run GPU workloads, you do not need to specify a Runtime Class.
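Instead of a Runtime Class, pods request GPUs through the extended resource that the device plugin advertises. Below is a minimal sketch assuming an NVIDIA node pool and its `nvidia.com/gpu` resource (AMD pools expose `amd.com/gpu` instead); the image tag is an example, and depending on the taints described under Additional Features below, the pod may also need a matching toleration:

```shell
# Schedule a pod on a GPU node by requesting the extended GPU resource;
# no runtimeClassName is needed. Image and pod name are illustrative.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
```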
DigitalOcean also installs and manages the drivers required to enable the GPU worker nodes, as described below. For the latest installed versions, see the DOKS changelog.
NVIDIA GPUs
For NVIDIA GPUs, we install and manage the required NVIDIA drivers.
We also recommend the following additional software:
- NVIDIA device plugin for Kubernetes for GPU discovery, health checks, configuration of GPU-enabled containers, and time slicing. You can configure the plugin and deploy it using `helm` as described in the README file of the GitHub repository.
- NVIDIA DCGM Exporter for monitoring your cluster using Prometheus.
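As a sketch of the `helm`-based install referenced above, using the chart repository published in the device plugin's GitHub project (the release name and namespace here are choices, not requirements):

```shell
# Add the NVIDIA device plugin chart repository and install the plugin.
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin \
  --create-namespace
```

See the repository's README for chart values such as time-slicing configuration.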
AMD GPUs
For AMD GPUs, we install and manage the required AMD drivers.
We also recommend the following additional software:
- ROCm Device Plugin for Kubernetes for GPU discovery, health checks, configuration of GPU-enabled containers, and time slicing. We automatically deploy this component when you create or update a cluster. You can turn this option off by setting `amd_gpu_device_plugin` to `false` in the request body when creating or updating a cluster using the API.
- AMD Device Metrics Exporter for ingesting GPU metrics into your monitoring system. You can install this plugin by setting `amd_gpu_device_metrics_exporter_plugin` to `true` in the request body when creating or updating a cluster using the API. The plugin is installed in the `kube-system` namespace of the Kubernetes cluster.
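As an illustrative sketch of the API options above; the field placement at the top level of the request body is an assumption here, so check the DigitalOcean API reference for the exact cluster update schema. `$DIGITALOCEAN_TOKEN` and the cluster ID are placeholders:

```shell
# Enable the AMD metrics exporter (and keep the device plugin enabled)
# when updating an existing cluster via the DigitalOcean API.
curl -X PUT "https://api.digitalocean.com/v2/kubernetes/clusters/<cluster-id>" \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "amd_gpu_device_plugin": true,
    "amd_gpu_device_metrics_exporter_plugin": true
  }'
```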
Additional Features
DigitalOcean applies additional labels and taints to the GPU worker nodes. For more information, see Automatic Application of Labels and Taints to Nodes.
You can also use the cluster autoscaler to automatically scale a GPU node pool down to zero, or use the DigitalOcean CLI or API to manually scale the node pool down to zero. Scaling to zero is useful for on-demand workloads and batch jobs such as training and fine-tuning.
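As a sketch with `doctl`, an existing GPU node pool can be configured so the autoscaler removes all of its nodes when they are idle (cluster and pool IDs are placeholders, and the maximum of 3 is an example):

```shell
# Let the autoscaler scale the GPU pool between 0 and 3 nodes.
doctl kubernetes cluster node-pool update <cluster-id> <pool-id> \
  --auto-scale \
  --min-nodes 0 \
  --max-nodes 3
```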