GPU Worker Nodes

Validated on 12 Jun 2025 • Last edited on 12 Jun 2025

DigitalOcean Kubernetes (DOKS) is a Kubernetes service with a fully managed control plane, high availability, and autoscaling. DOKS integrates with standard Kubernetes toolchains and DigitalOcean’s load balancers, volumes, CPU and GPU Droplets, API, and CLI.

GPU worker nodes are built on GPU Droplets, which are powered by NVIDIA and AMD GPUs. Using GPU worker nodes in your cluster, you can:

  • Experiment and develop AI/ML applications in containerized environments
  • Run distributed AI workloads on Kubernetes
  • Scale AI inference services

We offer the following GPU options:

| GPU | Configuration | Slug | Additional Information |
| --- | --- | --- | --- |
| NVIDIA H100 | Single GPU or 8 GPU | gpu-h100x1-80gb (single GPU), gpu-h100x8-640gb (8 GPU) | For multiple 8-GPU H100 worker nodes, we support high-speed networking between GPUs on different nodes, using 8x Mellanox 400GbE interfaces. To enable this, submit the H100 multi-node setup form. |
| NVIDIA L40S | Single GPU | gpu-l40sx1-48gb | |
| NVIDIA RTX 4000 Ada Generation | Single GPU | gpu-4000adax1-20gb | |
| NVIDIA RTX 6000 Ada Generation | Single GPU | gpu-6000adax1-48gb | |
| AMD Instinct MI300X | Single GPU or 8 GPU | gpu-mi300x1-192gb (single GPU), gpu-mi300x8-1536gb (8 GPU) | |
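
For example, to add a GPU node pool to an existing cluster with the DigitalOcean CLI, pass one of the slugs above as the node size. This is a minimal sketch; the cluster ID, pool name, and node count are placeholders:

```bash
# Add a single-node GPU pool to an existing DOKS cluster.
# The --size flag takes one of the GPU slugs from the table above.
doctl kubernetes cluster node-pool create <cluster-id> \
  --name gpu-pool \
  --size gpu-h100x1-80gb \
  --count 1
```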

You do not need to specify a RuntimeClass to run GPU workloads, which simplifies setup. DigitalOcean installs and manages the drivers required to support GPU worker nodes:

| GPU | Drivers | Additional Recommended Device Plugins |
| --- | --- | --- |
| NVIDIA | NVIDIA CUDA drivers, NVIDIA CUDA Toolkit, and NVIDIA Container Toolkit. For the latest installed versions, see the DOKS changelog. | For GPU discovery, health checks, configuration of GPU-enabled containers, and time slicing, install the NVIDIA device plugin for Kubernetes. You can configure the plugin and deploy it using Helm, as described in the README file of its GitHub repository. To monitor your cluster with Prometheus, also install the NVIDIA DCGM Exporter. |
| AMD | AMDGPU driver and AMD ROCm. For the latest installed versions, see the DOKS changelog. | For GPU discovery, health checks, configuration of GPU-enabled containers, and time slicing, install the ROCm Device Plugin for Kubernetes. You can configure and deploy the plugin as described in the README file of its GitHub repository. |
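
As a sketch, deploying the NVIDIA device plugin with Helm follows the pattern in its README; the release name and namespace below are illustrative choices, not requirements:

```bash
# Register the NVIDIA device plugin chart repository and deploy the plugin.
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin \
  --create-namespace
```

Once the plugin is running, each GPU node advertises an allocatable nvidia.com/gpu resource that pods can request.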

DigitalOcean applies additional labels to the GPU worker nodes. For more information, see Automatic Application of Labels and Taints to Nodes.
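With the device plugin installed, a workload requests GPUs through standard Kubernetes resource limits. The following is a minimal sketch: the container image is an illustrative public CUDA image, and the nodeSelector key is assumed to be one of the labels DigitalOcean applies to GPU nodes (confirm the exact keys on the page linked above):

```yaml
# gpu-smoke-test.yaml: runs nvidia-smi once on a GPU node, then exits.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: OnFailure
  nodeSelector:
    doks.digitalocean.com/gpu-brand: nvidia  # assumed DOKS GPU node label
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04  # illustrative CUDA base image
      command: ["nvidia-smi"]                     # prints the visible GPUs
      resources:
        limits:
          nvidia.com/gpu: 1  # resource advertised by the NVIDIA device plugin
```

If your GPU nodes carry a taint (see the page linked above), add a matching toleration to the pod spec.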

You can also use the cluster autoscaler to automatically scale the GPU node pool down to zero, or use the DigitalOcean CLI or API to manually scale the pool down to zero. Scaling to zero is useful for on-demand usage and for intermittent jobs such as training and fine-tuning, since GPU nodes only run while there is work for them.
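
For instance, using the DigitalOcean CLI (the cluster and pool identifiers below are placeholders):

```bash
# Let the cluster autoscaler scale the GPU pool between zero and two nodes.
doctl kubernetes cluster node-pool update <cluster-id> <pool-id> \
  --auto-scale --min-nodes 0 --max-nodes 2

# Or manually scale the pool down to zero nodes.
doctl kubernetes cluster node-pool update <cluster-id> <pool-id> --count 0
```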
