How to Configure Multi-Node GPU Droplets

DigitalOcean Droplets are Linux-based virtual machines (VMs) that run on top of virtualized hardware. Each Droplet you create is a new server you can use, either standalone or as part of a larger, cloud-based infrastructure.

To create multi-node GPU Droplets, you must first contact support. Only 8-GPU Droplets are multi-node capable, and support needs to enable the specific plan slug that you then use when creating your GPU Droplets.

Configuration of the network that connects the GPUs using an NCCL topology is not yet fully automated, so after creating the Droplets, you need to take some additional steps to assign IP addresses to the GPU network interface cards.

Configure the GPU Network Interface Controllers

Each GPU Droplet's eight GPU fabric network interface controllers (NICs) are eth2 to eth9.

Warning
The eth0 interface is for public connectivity to the internet and eth1 is for private connectivity to other Droplets in the same VPC network. Multi-node applications must use the eth2 to eth9 network interfaces, which are for GPU-to-GPU communication.

Each NIC must have its own subnet that is disjoint from the others. For example, eth2 could use 192.168.50.0/24, eth3 could use 192.168.51.0/24, and so on.

Each Droplet additionally needs a unique IP address on each subnet. We recommend using the same final octet in each subnet for a given Droplet. For example, one Droplet would have the addresses 192.168.50.2, 192.168.51.2, and so on. Another Droplet would have 192.168.50.3, 192.168.51.3, and so on.
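
For instance, this purely illustrative loop prints the addresses that a Droplet with final octet 2 receives under the example plan above:

# Print the fabric address plan for a single Droplet with final octet 2.
octet=2
subnet=50
for nic in eth2 eth3 eth4 eth5 eth6 eth7 eth8 eth9; do
    echo "${nic}: 192.168.${subnet}.${octet}/24"
    subnet=$((subnet + 1))
done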

We haven’t finished automating NIC addressing, so until then, you can assign addresses to the NICs in one of the following ways:

  • With user data, which is useful if you intend to use a base image that doesn’t support Netplan, but requires a specific naming convention for your Droplets.

  • Manually with Netplan, which is useful if the Droplet naming convention for the user data script is not suitable for your needs.

  • Using Ansible, which is useful if you want to apply changes to an existing set of GPU Droplets.

To use our user data script, you must adopt a specific naming convention for your Droplets:

  • The name must end with a hyphen, -, followed by an integer between 1 and 254. For example, examplename-1.
  • The name must have no other hyphens.
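
For illustration only, the following bash check verifies that a name follows this convention:

# Accept names with exactly one hyphen followed by an integer from 1 to 254.
name="examplename-1"
if [[ "${name}" =~ ^[^-]+-([0-9]+)$ ]] && (( BASH_REMATCH[1] >= 1 && BASH_REMATCH[1] <= 254 )); then
    echo "${name} is a valid multi-node GPU Droplet name"
else
    echo "${name} is not a valid name"
fi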

Then, use the following cloud-config file when you create the Droplet:

#cloud-config
write_files:
- path: /usr/sbin/gpu-fabric.sh
  content: |
    #!/bin/bash
    # Select the GPU fabric NICs (eth2 through eth9).
    IFACES=$(ip -br addr | grep eth | grep -E 'eth2|eth3|eth4|eth5|eth6|eth7|eth8|eth9' | awk '{print $1}')
    subnet=50
    # Use the integer suffix of the Droplet name as the final octet of each address.
    octet=$(hostname | cut -d '-' -f 2)
    for i in ${IFACES}; do
        /usr/sbin/ip link set dev ${i} up
        /usr/sbin/ip link set dev ${i} mtu 4200
        ADDR="192.168.${subnet}.${octet}/24"
        /usr/sbin/ip addr add dev ${i} ${ADDR}
        subnet=$((subnet + 1))
    done
    /usr/sbin/ip -br addr
  permissions: '0755'
bootcmd:
- /usr/sbin/gpu-fabric.sh
runcmd:
- /usr/sbin/gpu-fabric.sh

You can pass this script when creating a GPU Droplet with doctl by using the --user-data-file flag.
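
For example, a create command might look similar to the following sketch; the size slug, image, and region are placeholders, and the cloud-config above is assumed to be saved as gpu-fabric.yaml:

doctl compute droplet create examplename-1 \
    --size <multi-node-gpu-plan-slug> \
    --image <ai-ml-ready-image> \
    --region <region> \
    --user-data-file gpu-fabric.yaml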

Alternatively, you can use Netplan to configure the NICs manually. The AI/ML-ready image we provide for GPU Droplets includes Netplan support.

On each Droplet, open /etc/netplan/50-cloud-init.yaml and add the following block after eth1:

eth2:
  dhcp4: false
  dhcp6: false
  link-local: []          
  addresses:
    - 192.168.50.2/24
  mtu: 4200
eth3:
  dhcp4: false
  dhcp6: false
  link-local: []
  addresses:
    - 192.168.51.2/24
  mtu: 4200
eth4:
  dhcp4: false
  dhcp6: false
  link-local: []
  addresses:
    - 192.168.52.2/24
  mtu: 4200
eth5:
  dhcp4: false
  dhcp6: false
  link-local: []
  addresses:
    - 192.168.53.2/24
  mtu: 4200
eth6:
  dhcp4: false
  dhcp6: false
  link-local: []
  addresses:
    - 192.168.54.2/24
  mtu: 4200
eth7:
  dhcp4: false
  dhcp6: false
  link-local: []
  addresses:
    - 192.168.55.2/24
  mtu: 4200 
eth8:
  dhcp4: false
  dhcp6: false
  link-local: []
  addresses:
    - 192.168.56.2/24
  mtu: 4200
eth9:
  dhcp4: false
  dhcp6: false
  link-local: []
  addresses:
    - 192.168.57.2/24
  mtu: 4200

Optionally, you can also change the eth1 MTU to 9002.

Save the file and apply the changes:

sudo netplan apply

Repeat this process on every other Droplet, replacing the fourth octet each time. For example, change 192.168.50.2 to 192.168.50.3 on the next Droplet, then to 192.168.50.4 on the next, and so on.
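
If you prefer not to edit each file by hand, a sketch like the following rewrites the final octet across all eight fabric addresses, here from .2 to .3 (hypothetical; review the file before applying):

sudo sed -i 's/\(192\.168\.[0-9]*\)\.2\/24/\1.3\/24/' /etc/netplan/50-cloud-init.yaml
sudo netplan apply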

You can use our gpu-fabric Ansible playbook to configure multi-node GPU Droplets. The playbook lives in DigitalOcean's gpu-fabric repository on GitHub (https://github.com/digitalocean/gpu-fabric), and its README contains the installation and usage instructions replicated here.

To use this playbook:

  1. On the machine that you will use to run this playbook, first install Ansible and then clone this repository.

  2. In the inventory/droplets file in your cloned version of this repository, in the [multinode_gpu_droplets] section, specify the public IP addresses of your GPU Droplets (see the example inventory after this list).

  3. Ansible uses SSH under the hood to configure Droplets. If you have never connected to your Droplets with SSH and the .ssh/config file on your machine does not include StrictHostKeyChecking no, add the following line to the inventory/droplets file:

ansible_ssh_common_args='-o StrictHostKeyChecking=no'

  4. Save the file, then run the playbook from the root of the repository:

ansible-playbook -i inventory/droplets customer-play.yaml
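
For reference, the resulting inventory/droplets file might look similar to this minimal sketch, which assumes Ansible's INI inventory format and placeholder IP addresses; the actual file in the repository may organize the variables differently:

[multinode_gpu_droplets]
203.0.113.10
203.0.113.11

[multinode_gpu_droplets:vars]
ansible_ssh_common_args='-o StrictHostKeyChecking=no'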

The output of a successful run looks similar to the following:

PLAY [multinode_gpu_droplets] ***********************************************************************************

TASK [Gathering Facts] ******************************************************************************************
ok: [10.10.10.10]

TASK [read /etc/netplan/50-cloud-init.yaml] *********************************************************************
ok: [10.10.10.10]

TASK [extract /etc/netplan/50-cloud-init.yaml] ******************************************************************
ok: [10.10.10.10]

TASK [set a unique index for each droplet] **********************************************************************
ok: [10.10.10.10] => (item=10.10.10.10)

TASK [adjust /etc/netplan/50-cloud-init.yaml] *******************************************************************
ok: [10.10.10.10]

TASK [write /etc/netplan/50-cloud-init.yaml] ********************************************************************
ok: [10.10.10.10]

TASK [install lldp] *********************************************************************************************
ok: [10.10.10.10]

PLAY RECAP ******************************************************************************************************
10.10.10.10             : ok=7    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Verify Connectivity

First, verify that the NICs are on the same VLAN across all Droplets.

Install lldpd, a utility to locate network neighbors, if you don’t already have it installed. For example, on Debian-based distributions, including the Ubuntu AI/ML-ready image, use APT:

apt install lldpd --yes

Use lldpctl to display information about the neighbors on all interfaces, and filter by VLAN:

lldpctl | grep -i vlan

It should display one VLAN per NIC:

  VLAN:         4, pvid: yes vlan-4
  VLAN:         4, pvid: yes vlan-4
  VLAN:         4, pvid: yes vlan-4
  VLAN:         4, pvid: yes vlan-4
  VLAN:         4, pvid: yes vlan-4
  VLAN:         4, pvid: yes vlan-4
  VLAN:         4, pvid: yes vlan-4
  VLAN:         4, pvid: yes vlan-4

You can also check the IP addresses assigned to the fabric NICs:

ip -br a

This lists the network interfaces and their IP addresses:

lo               UNKNOWN        127.0.0.1/8 ::1/128 
eth0             UP             162.243.220.179/24 10.13.0.5/16 fe80::4006:aff:fe4d:d7cb/64 
eth1             UP             10.128.0.2/16 
eth2             UP             192.168.50.1/24 
eth3             UP             192.168.51.1/24 
eth4             UP             192.168.52.1/24 
eth5             UP             192.168.53.1/24 
eth6             UP             192.168.54.1/24
eth7             UP             192.168.55.1/24 
eth8             UP             192.168.56.1/24 
eth9             UP             192.168.57.1/24

Make sure these match the addresses you assigned.
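
To confirm GPU-to-GPU reachability over the fabric, you can also ping another Droplet's fabric addresses. The following sketch assumes a peer Droplet with final octet 3 on the example subnets above:

# Ping the peer Droplet (final octet 3) once on each fabric subnet.
for subnet in 50 51 52 53 54 55 56 57; do
    ping -c 1 -W 2 192.168.${subnet}.3
done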

Download the Topology File

For the best performance with multi-node training using NCCL, you must provide a topology file.

First, download the topology file and save it as /etc/nccl/topo.xml.

Then, edit /etc/nccl.conf and include the following line:

NCCL_TOPO_FILE=/etc/nccl/topo.xml

Repeat this process on all of the Droplets in your configuration.
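
Taken together, the commands on each Droplet might look similar to the following sketch, assuming you have already downloaded the topology file to the current directory as topo.xml:

# Install the topology file where NCCL expects it.
sudo mkdir -p /etc/nccl
sudo cp topo.xml /etc/nccl/topo.xml

# Point NCCL at the topology file (or edit /etc/nccl.conf directly).
echo 'NCCL_TOPO_FILE=/etc/nccl/topo.xml' | sudo tee -a /etc/nccl.conf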
