Machines are Linux and Windows virtual machines with persistent storage, GPU options, and free unlimited bandwidth. They’re designed for high-performance computing (HPC) workloads.
ML-in-a-Box is a Linux-based machine template with a pre-installed data science stack, including popular tools like PyTorch, TensorFlow, Hugging Face Transformers, Deepspeed, JupyterLab, NumPy, Pandas, XGBoost, and Scikit-learn. It comes with full machine learning support (CUDA, cuDNN, NVIDIA Docker). You can use this template to run machine learning software, develop and train models, or install additional tools as needed.
ML-in-a-Box includes several directories of commands, located in /usr/bin, /usr/local/bin, /home/paperspace/.local/bin, and /usr/local/cuda/bin, allowing you to access and run machine learning software or install additional tools. These directories are already included in your system's PATH, so no manual configuration is needed.
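As a quick sanity check, you can list which of these directories are currently on PATH. The snippet below is an illustrative sketch, not part of the template itself:

```python
import os

# Directories the ML-in-a-Box template expects to be on PATH.
expected_dirs = [
    "/usr/bin",
    "/usr/local/bin",
    "/home/paperspace/.local/bin",
    "/usr/local/cuda/bin",
]

# Split the PATH environment variable into its component directories.
path_dirs = os.environ.get("PATH", "").split(os.pathsep)
for d in expected_dirs:
    status = "on PATH" if d in path_dirs else "NOT on PATH"
    print(f"{d}: {status}")
```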
ML-in-a-Box also has Python libraries commonly used in data science, machine learning, and deep learning projects pre-installed.
For more information on ML-in-a-Box, refer to the ML-in-a-Box GitHub repository.
To use the ML-in-a-Box template, select it when first creating your machine. On the Create a new machine page, under the Machine section's OS Template sub-section, open the dropdown menu, type "ML-in-a-Box" in the search bar, then select it.
For machines created after 17 January 2024, the ML-in-a-Box template automatically includes a NCCL configuration file to improve performance on hardware like NVIDIA H100s.
On older machines, you can manually add this configuration by creating a /etc/nccl.conf file with the following contents:
NCCL_TOPO_FILE=/etc/nccl/topo.xml
NCCL_IB_DISABLE=0
NCCL_IB_CUDA_SUPPORT=1
NCCL_IB_HCA=mlx5
NCCL_CROSS_NIC=0
NCCL_SOCKET_IFNAME=eth0
NCCL_IB_GID_INDEX=1
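For example, the file can be written with a short script. The sketch below writes the settings shown above to a temporary path so it can be tried without root; on the actual machine, the target is /etc/nccl.conf, which requires sudo:

```python
from pathlib import Path

# NCCL settings shown above, verbatim.
NCCL_CONF = """\
NCCL_TOPO_FILE=/etc/nccl/topo.xml
NCCL_IB_DISABLE=0
NCCL_IB_CUDA_SUPPORT=1
NCCL_IB_HCA=mlx5
NCCL_CROSS_NIC=0
NCCL_SOCKET_IFNAME=eth0
NCCL_IB_GID_INDEX=1
"""

# On the real machine the target is /etc/nccl.conf (root required);
# /tmp is used here only so this sketch runs without sudo.
target = Path("/tmp/nccl.conf")
target.write_text(NCCL_CONF)
print(f"Wrote {len(NCCL_CONF.splitlines())} NCCL settings to {target}")
```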
Afterwards, continue configuring your machine as needed, then click CREATE MACHINE.
Machines created with ML-in-a-Box only have terminal access, so you need to connect to your machine using SSH. After connecting to your machine, your home directory is set to /home/paperspace and your shell is set to /bin/bash.
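Once connected, you can confirm these defaults. The snippet below is an illustrative check that prints the current user's home directory and login shell, which on ML-in-a-Box should be /home/paperspace and /bin/bash:

```python
import os
import pwd

# Look up the current user's passwd entry (Linux only).
entry = pwd.getpwuid(os.getuid())
print("home directory:", entry.pw_dir)    # /home/paperspace on ML-in-a-Box
print("login shell:   ", entry.pw_shell)  # /bin/bash on ML-in-a-Box
```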
To report any issues with the software, provide feedback, or request features, see the Paperspace Community or contact Paperspace support.
To verify your GPUs on your machine, run the NVIDIA System Management Interface to display information about the device:
nvidia-smi
This outputs a list of the NVIDIA GPUs on your machine:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:00:05.0 Off | 0 |
| N/A 25C P0 74W / 700W | 155MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
...
If you need to verify the PyTorch environment on your machine, start a Python interpreter by running python in your terminal. Your PyTorch environment is set up properly if torch.cuda.is_available() returns True.
Python 3.11.7 (main, Dec 8 2023, 18:56:58) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
'NVIDIA H100 80GB HBM3'
If needed, run python -m torch.utils.collect_env for more information on your machine's environment.
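The interactive check above can also be scripted. Below is a minimal sketch (the check_pytorch_cuda helper is illustrative, not part of ML-in-a-Box) that reports the PyTorch CUDA status without opening a REPL:

```python
def check_pytorch_cuda() -> str:
    """Return a one-line status string for the PyTorch CUDA setup."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # e.g. "CUDA OK: NVIDIA H100 80GB HBM3"
        return f"CUDA OK: {torch.cuda.get_device_name(0)}"
    return "PyTorch is installed, but CUDA is not available"

print(check_pytorch_cuda())
```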
If the PyTorch environment on your machine is not set up properly, you may receive an error indicating that torch is not found:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'torch'
If you receive this error, install PyTorch.
If you need to verify the TensorFlow environment on your machine, start a Python interpreter by running python in your terminal. Your TensorFlow environment is set up properly if tf.config.list_physical_devices('GPU') returns a non-empty list and tf.test.is_built_with_cuda() returns True.
python
>>> import tensorflow as tf
>>> x = tf.config.list_physical_devices('GPU')
>>> for i in range(len(x)): print(x[i])
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')
...
>>> tf.test.is_built_with_cuda()
True
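As with PyTorch, this check can be scripted. The sketch below (the check_tensorflow_cuda helper is illustrative, not part of ML-in-a-Box) combines both conditions into a single status line:

```python
def check_tensorflow_cuda() -> str:
    """Return a one-line status string for the TensorFlow CUDA setup."""
    try:
        import tensorflow as tf
    except ImportError:
        return "TensorFlow is not installed"
    gpus = tf.config.list_physical_devices("GPU")
    if gpus and tf.test.is_built_with_cuda():
        # e.g. "CUDA OK: 2 GPU(s) visible"
        return f"CUDA OK: {len(gpus)} GPU(s) visible"
    return "TensorFlow is installed, but CUDA GPUs are not visible"

print(check_tensorflow_cuda())
```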