DigitalOcean Kubernetes (DOKS) is a managed Kubernetes service. Deploy Kubernetes clusters with a fully managed control plane, high availability, autoscaling, and native integration with DigitalOcean Load Balancers and volumes. You can add node pools using shared and dedicated CPUs, and NVIDIA H100 GPUs in a single GPU or 8 GPU configuration. DOKS clusters are compatible with standard Kubernetes toolchains and the DigitalOcean API and CLI.
A node pool is a group of nodes in a DOKS cluster with the same configuration.
All the worker nodes within a node pool have identical resources, but each node pool can have a different worker configuration. This lets you have different services on different node pools, where each pool has the RAM, CPU, and attached storage resources the service requires.
You can create and modify node pools at any time. Worker nodes are automatically deleted and recreated when needed, and you can also recycle worker nodes manually. When you first create a node pool, its nodes inherit the node pool's naming scheme. Renaming a node pool afterwards does not rename the existing nodes; nodes only adopt the new naming scheme when they are recycled or when resizing the node pool creates new nodes.
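For example, assuming placeholder IDs, pool names, and the s-4vcpu-8gb size slug, creating a node pool and later renaming it with doctl might look like the following sketch:

# Create a new node pool in an existing cluster (size slug and count are illustrative)
doctl kubernetes cluster node-pool create <your-cluster-id> \
  --name backend-pool --size s-4vcpu-8gb --count 3

# Rename the node pool; existing nodes keep their old names until they are
# recycled or replaced during a resize
doctl kubernetes cluster node-pool update <your-cluster-id> <your-nodepool-id> \
  --name backend-pool-v2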
You can add custom tags to a cluster and its node pools. DOKS deletes any custom tags you add directly to individual worker nodes (for example, from the Droplets page) to maintain consistency between the node pool and its worker nodes.
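If your version of doctl supports the --tag flag on node pool updates (an assumption worth checking with doctl kubernetes cluster node-pool update --help), you can manage tags at the node pool level so they propagate consistently to the worker nodes, for example:

# Apply a custom tag at the node pool level rather than on individual worker nodes
doctl kubernetes cluster node-pool update <your-cluster-id> <your-nodepool-id> \
  --tag my-team-tag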
You can also add a GPU node pool to an existing cluster on versions 1.30.4-do.0, 1.29.8-do.0, 1.28.13-do.0, and later.
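As a sketch, assuming a single-GPU H100 size slug of gpu-h100x1-80gb (verify the exact slug with doctl kubernetes options sizes), adding a GPU node pool to an existing cluster could look like:

# Add a one-node GPU pool; the size slug below is an assumption, check available sizes first
doctl kubernetes cluster node-pool create <your-cluster-id> \
  --name gpu-pool --size gpu-h100x1-80gb --count 1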
To run GPU workloads after you create a cluster, use the GPU node-specific labels and taint in your workload specifications so that your pods are scheduled onto GPU nodes. You can use a configuration similar to the pod spec shown below for your actual workloads:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  restartPolicy: Never
  nodeSelector:
    doks.digitalocean.com/gpu-brand: nvidia
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
The above spec creates a pod that runs NVIDIA's CUDA sample image, selects GPU worker nodes with the GPU label, and tolerates the GPU taint.
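To try the spec, a minimal usage sketch (the file name is illustrative):

# Apply the pod spec, confirm which node it was scheduled onto, and check its output
kubectl apply -f gpu-workload.yaml
kubectl get pod gpu-workload -o wide
kubectl logs gpu-workload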
You can use the cluster autoscaler to scale the GPU node pool down to a minimum of one node, or use the DigitalOcean CLI or API to manually scale the node pool down to zero. For example:
doctl kubernetes cluster node-pool update <your-cluster-id> <your-nodepool-id> --count 0
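Alternatively, assuming the --auto-scale, --min-nodes, and --max-nodes flags are available in your version of doctl, a sketch of enabling the cluster autoscaler with a one-node minimum:

# Let the cluster autoscaler scale the pool between 1 and 3 nodes based on demand
doctl kubernetes cluster node-pool update <your-cluster-id> <your-nodepool-id> \
  --auto-scale --min-nodes 1 --max-nodes 3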
To add additional node pools to an existing cluster: