GPU Worker Nodes

Validated on 12 Jun 2025 • Last edited on 2 Aug 2025

DigitalOcean Kubernetes (DOKS) is a Kubernetes service with a fully managed control plane, high availability, and autoscaling. DOKS integrates with standard Kubernetes toolchains and DigitalOcean’s load balancers, volumes, CPU and GPU Droplets, API, and CLI.

GPU worker nodes are built on GPU Droplets, which are powered by NVIDIA and AMD GPUs. Using GPU worker nodes in your cluster, you can:

  • Experiment and develop AI/ML applications in containerized environments
  • Run distributed AI workloads on Kubernetes
  • Scale AI inference services

Available GPU Node Pools

We offer the following GPU options for creating node pools:

NVIDIA GPU            Slug
H100                  gpu-h100x1-80gb
H100 (8x)             gpu-h100x8-640gb
L40S                  gpu-l40sx1-48gb
RTX 4000 Ada          gpu-4000adax1-20gb
RTX 6000 Ada          gpu-6000adax1-48gb
Note
For multiple 8-GPU H100 worker nodes, we support high-speed networking between GPUs on different nodes. High-speed communication uses 8x Mellanox 400GbE interfaces. To enable this, submit the H100 multi-node setup form.
AMD GPU                  Slug
Instinct MI300X          gpu-mi300x1-192gb
Instinct MI300X (8x)     gpu-mi300x8-1536gb
Instinct MI325X          gpu-mi325x1-256gb
Instinct MI325X (8x)     gpu-mi325x8-2048gb
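For example, you can add a GPU node pool to an existing cluster with doctl, using one of the slugs above as the node size. The cluster ID and pool name below are placeholders; substitute your own values.

```shell
# Add a single-GPU H100 node pool to an existing DOKS cluster.
# Find your cluster ID with `doctl kubernetes cluster list`.
doctl kubernetes cluster node-pool create <cluster-id> \
  --name gpu-pool \
  --size gpu-h100x1-80gb \
  --count 1
```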

Runtime, Drivers, and Plugins

To run GPU workloads, you do not need to specify a Runtime Class.
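As a sketch, a pod can consume a GPU through a standard Kubernetes resource request, with no runtimeClassName set. The toleration shown here assumes the GPU taint that DOKS applies to GPU worker nodes (see Additional Features below); check your nodes and adjust it if it differs.

```shell
# Minimal GPU pod: requests one NVIDIA GPU via the device plugin's
# extended resource. No RuntimeClass is specified.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
  tolerations:
    - key: nvidia.com/gpu   # assumed taint key; verify with `kubectl describe node`
      operator: Exists
      effect: NoSchedule
EOF
```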

DigitalOcean also installs and manages the drivers required to enable GPU worker nodes, as described below. For the latest installed versions, see the DOKS changelog.

NVIDIA GPUs

For NVIDIA GPUs, we install the following drivers:

We also recommend the following additional software:

AMD GPUs

For AMD GPUs, we install the following drivers:

We also recommend the following additional software:

  • ROCm Device Plugin for Kubernetes for GPU discovery, health checks, configuration of GPU-enabled containers, and time slicing.

    We automatically deploy this component when you create or update a cluster. You can turn this option off by setting amd_gpu_device_plugin to false in the request body when creating or updating a cluster using the API.

  • AMD Device Metrics Exporter for ingesting GPU metrics into your monitoring system.

    You can install this plugin by setting amd_gpu_device_metrics_exporter_plugin to true in the request body when creating or updating a cluster using the API. The plugin is installed in the kube-system namespace of the Kubernetes cluster.
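As a sketch, both plugin flags can be set in a single cluster-update request to the DigitalOcean API. The exact endpoint shape and the top-level placement of these fields in the request body are assumptions here; confirm them against the API reference for updating a Kubernetes cluster.

```shell
# Enable the AMD Device Metrics Exporter (and keep the ROCm device plugin on)
# for an existing cluster. $DIGITALOCEAN_TOKEN holds an API token with write
# scope; <cluster-id> is a placeholder. Field placement is assumed top-level.
curl -X PUT "https://api.digitalocean.com/v2/kubernetes/clusters/<cluster-id>" \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "amd_gpu_device_plugin": true,
        "amd_gpu_device_metrics_exporter_plugin": true
      }'
```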

Additional Features

DigitalOcean applies additional labels and taints to the GPU worker nodes. For more information, see Automatic Application of Labels and Taints to Nodes.

You can also use the cluster autoscaler to automatically scale a GPU node pool down to zero, or use the DigitalOcean CLI or API to manually scale it down to zero. Scaling to zero is useful for on-demand usage and for batch jobs such as training and fine-tuning.
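For example, you can manually scale a GPU node pool to zero with doctl. The cluster and pool IDs below are placeholders.

```shell
# Scale a GPU node pool down to zero nodes.
# List pools with `doctl kubernetes cluster node-pool list <cluster-id>`.
doctl kubernetes cluster node-pool update <cluster-id> <pool-id> \
  --count 0
```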
