# How to Enable or Disable NVLink

Machines are Linux and Windows virtual machines with persistent storage, GPU options, and free unlimited bandwidth. They’re designed for high-performance computing (HPC) workloads.

[NVLink](https://www.nvidia.com/en-us/data-center/nvlink/) is a high-speed GPU interconnect developed by NVIDIA. NVLink improves data transfer speeds and scalability for high-performance computing tasks across multiple GPUs.

## When to Enable or Disable NVLink

How you handle NVLink on your machine depends on the type of machine you’re using:

- Machines using H100x8 GPUs come with NVLink enabled. You don’t need to enable NVLink manually.
- Machines using H100x1 or A100-80Gx1 GPUs come with NVLink enabled. You need to [disable NVLink](#disable-nvlink) in order to run CUDA. You also need [NVIDIA drivers](#cuda-drivers) and [the NVIDIA CUDA toolkit](#toolkit) installed as described in this article, but do not need Fabric Manager.
- SSH-only machines do not come with NVLink enabled. You can choose to manually enable NVLink.

**Tip**: As an alternative to manually enabling NVLink, which can be complex and error-prone, we recommend creating machines using [the ML-in-a-Box template](https://docs.digitalocean.com/products/paperspace/machines/getting-started/run-ml-in-a-box/index.html.md) instead. ML-in-a-Box provides the data science stack needed for HPC tasks without the need to manually configure NVLink.

For improved performance on ML-in-a-Box machines, we recommend disabling the desktop environment. To do so, change the default startup target from a graphical interface to a non-graphical interface, and then reboot the machine:

```bash
sudo systemctl set-default multi-user.target
sudo reboot
```

If your use case requires NVLink, you can manually enable it using the instructions below.

## Enable NVLink

To enable NVLink, you must:

1. Update your machine’s packages and verify the compatibility of its GPUs.
2. Install the NVIDIA CUDA Toolkit.
3. Install CUDA Drivers and NVSMI.
4. Install NVIDIA Fabric Manager.

### Update Packages and Verify GPU Compatibility

Before installing the software necessary to enable NVLink, update the machine’s package index and packages to the latest versions and verify that the machine’s GPUs are compatible with NVLink.

First, [connect to your machine](https://docs.digitalocean.com/products/paperspace/machines/how-to/connect/index.html.md) and open a terminal. Then, update your machine’s packages.

```bash
sudo apt-get update && sudo apt-get upgrade -y
```

Next, identify your machine’s GPUs by listing the PCI devices on your machine and filtering by NVIDIA.

```bash
lspci | grep NVIDIA
```

The output displays the GPU model names:

```text
00:05.0 3D controller: NVIDIA Corporation GA100 [A100 SXM4 80GB] (rev a1)
00:06.0 3D controller: NVIDIA Corporation GA100 [A100 SXM4 80GB] (rev a1)
...
```

Search NVIDIA’s website for the GPU model and confirm that NVLink is listed as a supported interconnect. For example, [NVIDIA’s page on the A100 GPU](https://www.nvidia.com/en-us/data-center/a100/) includes a specifications table at the bottom of the page with a row for interconnects.

### Install the NVIDIA CUDA Toolkit

NVCC, the CUDA compiler driver, compiles CUDA code into executable programs. Installing the NVIDIA CUDA Toolkit lets you use NVCC and other CUDA tools to develop and run CUDA applications.

Use the [CUDA Toolkit and driver compatibility table](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#id6) to find the right version for your machine.

[Download the repository pinning file](https://developer.nvidia.com/cuda-downloads) for the appropriate version and move it into the APT preferences directory, which handles package priorities.
For example, on Ubuntu 22.04:

```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
```

Similarly, download and install NVIDIA’s CUDA repository Debian package, which contains the NVIDIA CUDA Toolkit. For example, with version 12.4 on Ubuntu 22.04:

```bash
wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
```

Copy the repository’s GPG key to the machine’s keyring directory, which securely authenticates packages from the repository. For example, for version 12.4 on Ubuntu 22.04:

```bash
sudo cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
```

Update the machine’s package lists to incorporate the new changes and then install the NVIDIA CUDA Toolkit from the repository you added. For example, for version 12.4:

```bash
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4
```

Once you have the NVIDIA CUDA Toolkit installed on your machine, you can verify the version of NVCC with `nvcc --version`.

If you installed the toolkit but the `nvcc` command isn’t found, the toolkit may not be on the PATH of your machine. To add the toolkit to your PATH, add the following lines to the `~/.profile` file:

```bash
export PATH=/etc/alternatives/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/etc/alternatives/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

Save the file, and then apply the changes:

```bash
source ~/.profile
```

Run the `nvcc --version` command again to confirm the fix.
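
The `export` lines for `~/.profile` prepend the toolkit directories every time the file is sourced, so repeated `source ~/.profile` calls can leave duplicate entries on `PATH`. As a minimal sketch, assuming a POSIX-compatible shell, a small helper could add each directory only when it is missing. The name `prepend_once` is hypothetical and not part of the CUDA Toolkit; the directories are the same ones used in this article.

```bash
# prepend_once DIR CURRENT
# Prints CURRENT with DIR prepended, or CURRENT unchanged if DIR is already
# one of its colon-separated entries. The helper name is illustrative only.
prepend_once() {
  _dir=$1
  _cur=$2
  case ":${_cur}:" in
    *":${_dir}:"*) printf '%s\n' "$_cur" ;;
    *) printf '%s\n' "${_dir}${_cur:+:${_cur}}" ;;
  esac
}

# Same directories as the ~/.profile lines in this article.
PATH=$(prepend_once /etc/alternatives/cuda/bin "$PATH")
LD_LIBRARY_PATH=$(prepend_once /etc/alternatives/cuda/lib64 "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

With this in `~/.profile` in place of the two plain `export` lines, sourcing the file more than once should leave a single `/etc/alternatives/cuda/bin` entry on `PATH`.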
### Install CUDA Drivers and NVSMI

NVSMI, the NVIDIA System Management Interface, monitors and manages NVIDIA GPU devices by providing access to GPU settings, configuration details, performance, and real-time statuses. It also shows how GPUs are interconnected, either through PCIe or NVLink.

The NVIDIA CUDA drivers, which are necessary to use NVLink, include NVSMI. [Install the CUDA drivers](https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/index.html#nvidia-driver-installation-for-ubuntu) on your machine:

```bash
sudo apt-get install -y cuda-drivers
```

You can confirm that NVSMI is also installed by running `nvidia-smi`, which outputs information about your machine’s GPUs.

### Install NVIDIA Fabric Manager

NVIDIA Fabric Manager manages fabric resources, such as NVLink, and handles tasks like configuring and allocating NVLink connections. It is essential for machines with complex GPU interconnects.

[Install Fabric Manager](https://docs.nvidia.com/datacenter/tesla/fabric-manager-user-guide/index.html) on your machine and start it:

```bash
sudo apt-get install cuda-drivers-fabricmanager-550 -y
sudo systemctl start nvidia-fabricmanager
```

## Verify NVLink Configuration

Once NVLink is enabled, you can test the connections between your machine’s GPUs and confirm that NVLink is functioning, then test the CUDA environment.

### Check the Connectivity Matrix

Use NVSMI to output information about your GPUs:

```bash
nvidia-smi
```

In the output, look for the NVLink GPU Peer-to-Peer Connectivity Matrix:

```text
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.56.06    Driver Version: 520.56.06    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA H100         Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   32C    P0    40W / 300W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
...
+-----------------------------------------------------------------------------+
| NVLink GPU Peer-to-Peer Connectivity Matrix                                 |
|                                                                             |
|        GPU0 GPU1                                                            |
|        0    1                                                               |
|   0    X    NV1                                                             |
|   1    NV1  X                                                               |
+-----------------------------------------------------------------------------+
| NV1 Enabled                                                                 |
+-----------------------------------------------------------------------------+
```

NVLink is enabled if you see “NV1 Enabled” or “NV2 Enabled”.

### Check the GPU Topology

Next, use NVSMI to display the GPU topology, which shows how the GPUs are connected to each other.

```bash
nvidia-smi topo -m
```

The output displays a table with connectivity details between each GPU:

```text
        GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity
GPU0     X      NV1     NV1     NV2     0-15            N/A
GPU1    NV1      X      NV1     NV2     0-15            N/A
...
```

The GPUs are interconnected with NVLink if you see “NV1” or “NV2” in the output.

### Check the NVLink Connections

Finally, check the status of each NVLink connection for each GPU.

```bash
nvidia-smi nvlink --status
```

This command outputs information about each NVLink connection, such as its utilization and its active or inactive status.

```text
GPU 0: NVIDIA H100
    Link 0: 250 GB/s - Active
    Link 1: 250 GB/s - Active
    Link 2: 250 GB/s - Inactive
    Link 3: 250 GB/s - Active
    Link 4: 250 GB/s - Active
    Link 5: 250 GB/s - Active
GPU 1: NVIDIA H100
    Link 0: 250 GB/s - Active
    Link 1: 250 GB/s - Active
    Link 2: 250 GB/s - Active
    Link 3: 250 GB/s - Active
    Link 4: 250 GB/s - Inactive
    Link 5: 250 GB/s - Active
...
```

If NVLink has an inactive status on all links, then there’s an issue with the configuration. To troubleshoot, repeat the above steps for installing the necessary software and testing the GPU and NVLink connections.
If going through the steps again doesn’t fix the issue, reboot the machine with `sudo reboot`. For further assistance, contact [Paperspace support](https://docs.digitalocean.com/products/paperspace/machines/support/index.html.md).

### Test the CUDA Environment

After verifying and enabling NVLink, test your machine’s CUDA environment using CUDA samples, which are a collection of samples created by NVIDIA. These samples are used to configure and test CUDA Toolkit features, such as NVLink.

Clone the [CUDA Samples repository](https://github.com/NVIDIA/cuda-samples).

```bash
git clone https://github.com/NVIDIA/cuda-samples
```

The `deviceQuery` sample is a utility that provides information about the CUDA devices on your machine. It verifies that the system recognizes the GPUs and displays their capabilities, such as NVLink connectivity.

Move to the `deviceQuery` directory and compile the sample using the provided Makefile:

```bash
cd cuda-samples/Samples/1_Utilities/deviceQuery
make
```

You can alternatively compile the sample with `nvcc -I../../../Common -o deviceQuery deviceQuery.cpp`. The `-I` flag points NVCC at the repository’s shared `Common` headers, which the sample includes.

Finally, run the compiled program:

```bash
./deviceQuery
```

This validates that the CUDA environment is set up correctly and NVLink is connecting the GPUs within your machine.

```text
Device 0: "NVIDIA H100 PCIe"
...
NVLink capability: Supported
P2P Access between GPUs: Yes
```

If NVLink doesn’t appear in the `deviceQuery` output, then there is an issue with the hardware setup, the driver configuration, or the software configuration.

## Disable NVLink

To disable NVLink on your machine, you need to disable it both at the system level and on the RAM disk (`initrd`).

### At the System Level

First, create a backup of the GRUB configuration file with the current date for identification. You can restore from this file in the event of an error.
```shell
sudo cp /etc/default/grub /etc/default/grub.backup_$(date +%Y-%m-%d)
```

Next, open `/etc/default/grub` with a text editor and update the `GRUB_CMDLINE_LINUX_DEFAULT` value to disable NVLink.

```shell
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvlink.disable=1"
```

Save and close the file, then update GRUB and reboot your machine.

```shell
sudo update-grub
sudo reboot
```

If your machine does not boot, use your GRUB backup file to restore the original configuration and try again.

To restore from the GRUB backup file, first print a list of your machine’s disk partitions.

```shell
sudo fdisk -l
```

Locate the partition that has the Linux filesystem. This has a `/dev/sdXn` naming pattern. For example, if the device name is `/dev/sda1`, `sda` represents your hard disk and the number represents the partition number.

Mount the partition where your Ubuntu system is installed, replacing `/dev/sdXn` with the appropriate system partition from the previous step.

```shell
sudo mount /dev/sdXn /mnt
```

Mount the necessary directories.

```shell
sudo mount --bind /dev /mnt/dev
sudo mount --bind /proc /mnt/proc
sudo mount --bind /sys /mnt/sys
sudo mount --bind /run /mnt/run
```

Change the root directory to your system’s partition.

```shell
sudo chroot /mnt
```

This creates an environment, often called a `chroot` jail, where programs cannot access files outside of the directory specified.

Restore the GRUB configuration file using your backup file.

```shell
sudo cp $(ls -Art /etc/default/grub.backup* | tail -n 1) /etc/default/grub
```

The `*` in the command matches filenames in the `/etc/default` directory that start with `grub.backup`, such as the filename with the backup’s creation date. The `ls -Art` and `tail -n 1` combination selects the most recently modified match.

Update your GRUB configuration file, exit the `chroot` jail, and reboot your machine.

```shell
update-grub
exit
sudo reboot
```

### On the RAM Disk

Update your RAM disk to ensure NVLink is disabled when you start up your machine.
Create a backup of the current `initrd` file with the current date for identification. You can restore from this file in the event of an error.

```shell
sudo cp /boot/initrd.img-$(uname -r) /boot/initrd.img-$(uname -r).backup_$(date +%Y-%m-%d)
```

Then, to disable NVLink, modify the `initramfs` configuration with one of the following options:

- **Create a [`modprobe`](https://en.wikipedia.org/wiki/Modprobe) configuration file** that denylists NVIDIA NVLink modules:

  ```shell
  echo "blacklist " | sudo tee /etc/modprobe.d/nvlink-denylist.conf
  ```

  This does not change `initramfs` but makes `initramfs` respect your denylist.

- **Write an `initramfs-tools` script to disable NVLink**. For example, create `/etc/initramfs-tools/scripts/new-init/disable-nvlink.sh`, and enter a script like the following that disables specific NVIDIA kernel modules:

  ```shell
  #!/bin/sh
  modprobe -r nvidia_nvlink
  modprobe -r nvidia_uvm
  ```

  Save and close the file. Then, make the script executable and rebuild `initrd` for your running kernel.

  ```shell
  sudo chmod +x /etc/initramfs-tools/scripts/new-init/disable-nvlink.sh
  sudo update-initramfs -u
  ```

To disable NVLink across all Linux kernel versions on your machine (if you have a multi-boot environment or multiple kernel versions for testing or compatibility reasons), update all installed kernels:

```shell
sudo update-initramfs -u -k all
```

Finally, reboot your machine.

```shell
sudo reboot
```

If your RAM disk is corrupted after a reboot, restore your RAM disk using your backup file and reboot your machine. The following command assumes you created the backup today. Otherwise, replace `$(date +%Y-%m-%d)` with the date in the backup’s filename.

```shell
sudo cp /boot/initrd.img-$(uname -r).backup_$(date +%Y-%m-%d) /boot/initrd.img-$(uname -r)
sudo reboot
```

After the reboot, you can try again.
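
The restore command builds the backup’s filename from the current date, so it only matches a backup created on the same day. As a minimal sketch, a helper could select the most recent `.backup_YYYY-MM-DD` file instead, relying on the fact that dates in this format sort chronologically. The name `latest_backup` is hypothetical and not part of `initramfs-tools`.

```shell
# latest_backup PREFIX
# Prints the newest file named PREFIX.backup_YYYY-MM-DD, or returns 1 if
# no such file exists. The helper name is illustrative only.
latest_backup() {
  _latest=""
  for _f in "$1".backup_*; do
    [ -e "$_f" ] || continue   # an unmatched glob stays literal; skip it
    _latest=$_f                # globs expand in sorted order, so the last is newest
  done
  [ -n "$_latest" ] && printf '%s\n' "$_latest"
}

# Example use (not run here, because it overwrites the RAM disk):
#   backup=$(latest_backup "/boot/initrd.img-$(uname -r)") \
#     && sudo cp "$backup" "/boot/initrd.img-$(uname -r)"
```

The same helper could be pointed at `/etc/default/grub` to find the newest GRUB backup, since that backup uses the same date suffix.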