Use the ML-in-a-Box Template for Machine Learning Applications

Machines are Linux and Windows virtual machines with persistent storage, GPU options, and free unlimited bandwidth. They’re designed for high-performance computing (HPC) workloads.


ML-in-a-Box is a Linux-based machine template with a pre-installed data science stack, including popular tools like PyTorch, TensorFlow, Hugging Face Transformers, DeepSpeed, JupyterLab, NumPy, Pandas, XGBoost, and Scikit-learn. It comes with full machine learning support (CUDA, cuDNN, NVIDIA Docker). You can use this template to run machine learning software, develop and train models, or install additional tools as needed.

ML-in-a-Box includes command-line tools installed in /usr/bin, /usr/local/bin, /home/paperspace/.local/bin, and /usr/local/cuda/bin, which you can use to run machine learning software or install additional tools. These directories are already included in your system’s PATH, so no manual configuration is needed.
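As a quick sanity check, you can confirm from Python that these directories are on your PATH and that key tools resolve; a minimal sketch (the missing_path_dirs helper name is ours):

```python
import os
import shutil

def missing_path_dirs(expected_dirs):
    """Return the subset of expected_dirs that is not on the current PATH."""
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    return [d for d in expected_dirs if d not in path_dirs]

# Directories the template pre-configures on PATH
expected = [
    "/usr/bin",
    "/usr/local/bin",
    "/home/paperspace/.local/bin",
    "/usr/local/cuda/bin",
]
print("Missing from PATH:", missing_path_dirs(expected))

# shutil.which resolves a command through PATH, e.g. the nvidia-smi binary
print("nvidia-smi ->", shutil.which("nvidia-smi"))
```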

ML-in-a-Box also has Python libraries commonly used in data science, machine learning, and deep learning projects pre-installed.

For more information on ML-in-a-Box, refer to the ML-in-a-Box GitHub repository.

Set Up ML-in-a-Box

To use the ML-in-a-Box template, select it when first creating your machine. On the Create a new machine page, under the Machine section’s OS Template sub-section, click the dropdown menu, type “ML-in-a-Box” in the search bar, then select the template.

Note

For machines created after 17 January 2024, the ML-in-a-Box template automatically includes an NCCL configuration file to improve performance on hardware like NVIDIA H100s. On older machines, you can manually add this configuration by creating a /etc/nccl.conf file with the following contents:

NCCL_TOPO_FILE=/etc/nccl/topo.xml
NCCL_IB_DISABLE=0
NCCL_IB_CUDA_SUPPORT=1
NCCL_IB_HCA=mlx5
NCCL_CROSS_NIC=0
NCCL_SOCKET_IFNAME=eth0
NCCL_IB_GID_INDEX=1
[Image: The OS Template sub-section of the Machine section of the machine creation page with “ML-in-a-Box” searched.]
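If you prefer to script this step on an older machine, the file can also be written from Python; a minimal sketch (the write_nccl_conf helper name is ours, and the default path requires root privileges):

```python
# The NCCL settings listed above, as given in the docs
NCCL_CONF = """\
NCCL_TOPO_FILE=/etc/nccl/topo.xml
NCCL_IB_DISABLE=0
NCCL_IB_CUDA_SUPPORT=1
NCCL_IB_HCA=mlx5
NCCL_CROSS_NIC=0
NCCL_SOCKET_IFNAME=eth0
NCCL_IB_GID_INDEX=1
"""

def write_nccl_conf(path="/etc/nccl.conf"):
    """Write the NCCL configuration to path (the default path requires root)."""
    with open(path, "w") as f:
        f.write(NCCL_CONF)
    return path
```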

Afterwards, continue configuring your machine as needed, then click CREATE MACHINE.

Connect to Your Machine

Machines created with ML-in-a-Box only have terminal access, so you need to connect to your machine using SSH. After connecting to your machine, your home directory is set to /home/paperspace and the shell is set to /bin/bash.

To report issues with the software, provide feedback, or submit requests, visit the Paperspace Community or contact Paperspace support.

Verifying Your GPUs

To verify the GPUs on your machine, run the NVIDIA System Management Interface to display information about each device:

nvidia-smi

This outputs a list of the NVIDIA GPUs on your machine:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 80GB HBM3          On  | 00000000:00:05.0 Off |                    0 |
| N/A   25C    P0              74W / 700W |    155MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
...

Verifying PyTorch

If you need to verify your PyTorch environment on your machine, start a Python interpreter by running python in your terminal. Your PyTorch environment is set up properly if torch.cuda.is_available() returns True.

Python 3.11.7 (main, Dec  8 2023, 18:56:58) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
'NVIDIA H100 80GB HBM3'

If needed, run python -m torch.utils.collect_env for more information on your machine’s environment.
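The interactive checks above can also be run as a short script; a sketch that degrades gracefully when PyTorch or a GPU is absent (the cuda_status helper name is ours):

```python
def cuda_status():
    """Summarize PyTorch/CUDA availability as a human-readable string."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__}: CUDA not available"
    # List every GPU that PyTorch can see
    names = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    return f"PyTorch {torch.__version__}: " + ", ".join(names)

print(cuda_status())
```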

If the PyTorch environment on your machine is not set up properly, you may receive an error indicating that torch is not found:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'torch'

If you receive this error, install PyTorch.

Verifying TensorFlow

If you need to verify your TensorFlow environment on your machine, start a Python interpreter by running python in your terminal.

Your TensorFlow environment is set up properly if tf.config.list_physical_devices('GPU') isn’t an empty list and tf.test.is_built_with_cuda() returns True.

python
>>> import tensorflow as tf
>>> x = tf.config.list_physical_devices('GPU')
>>> for i in range(len(x)): print(x[i])
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')
...
>>> tf.test.is_built_with_cuda()
True
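As with PyTorch, both TensorFlow checks can be wrapped in a small script that degrades gracefully when TensorFlow or a GPU is absent (the tensorflow_gpu_status helper name is ours):

```python
def tensorflow_gpu_status():
    """Summarize TensorFlow GPU availability as a human-readable string."""
    try:
        import tensorflow as tf
    except ImportError:
        return "tensorflow is not installed"
    gpus = tf.config.list_physical_devices("GPU")  # empty list means no GPUs visible
    built = tf.test.is_built_with_cuda()
    return f"built with CUDA: {built}, GPUs visible: {len(gpus)}"

print(tensorflow_gpu_status())
```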