How to Transfer Data to Paperspace

Machines are high-performance compute instances for scaling AI applications.

This guide walks you through how to load data from various sources onto your machine.

Copy data to and from a Linux Machine

This guide covers how to copy data onto and off of your machine from:

  • A local desktop or laptop using scp
  • Publicly accessible URLs and buckets using wget
  • Private object storage, such as S3 buckets

Copy files to or from your desktop computer

Note
This article assumes you’re using either macOS or Linux. If you’re using Windows, you can install Windows Subsystem for Linux (WSL) and follow along as normal.

To copy files from your local laptop or desktop to your machine, you need to have SSH set up correctly on your local machine.
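
If you’re unsure whether SSH is working, you can test it by connecting to the machine directly (the key name and IP address are the same placeholders used in the commands below):

ssh -i ~/.ssh/my-key.pem paperspace@machine-ip-address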

Once you have SSH set up, it only takes a single command to copy files to your machine, though you need to modify it first. Run this command in your local terminal (not while connected via SSH to your machine).

scp -i ~/.ssh/my-key.pem ~/path/to/local_file paperspace@machine-ip-address:~/.

To use this command, replace my-key with the name of the SSH key you created to connect to your machine (don’t forget the .pem).

Next, replace ~/path/to/local_file with the path to the file on your local machine. Remember that ~ is shorthand for your home directory, so to upload a data set in your Documents folder you would type something like ~/Documents/my-data-set.csv.

After that, replace machine-ip-address with the IP address listed for your machine in the console.

Optionally, you can also replace the ~/. with whatever path you would like to copy the data to on your machine. If you leave it as is, the file is copied into your home directory on the machine.
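
Putting it all together, a filled-in command might look like the following, where my-key.pem, the data set path, and the IP address 203.0.113.10 are all placeholders for your own values:

scp -i ~/.ssh/my-key.pem ~/Documents/my-data-set.csv paperspace@203.0.113.10:~/.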

Copy results from your machine back to your local machine

Copying files back to your local machine is nearly the same as before. However, you have to flip the local and remote paths.

scp -i ~/.ssh/my-key.pem paperspace@machine-ip-address:~/results.ckpt .

You still need to replace my-key.pem and machine-ip-address with your own values. This time you specify the remote file you would like to copy (for example, ~/results.ckpt) and then tell scp where to send it (for example, ., the current directory).
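
For instance, to pull that checkpoint into an existing local checkpoints folder (again using the placeholder address 203.0.113.10), you might run:

scp -i ~/.ssh/my-key.pem paperspace@203.0.113.10:~/results.ckpt ~/checkpoints/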

Copy files from publicly accessible URLs and cloud storage buckets

To copy files from a public URL or cloud storage bucket, we recommend using wget.

First make sure you’re connected via SSH to your machine and then run:

wget https://example.com/example-data-set.tar.gz

where https://example.com/example-data-set.tar.gz is the publicly accessible URL of the data set you’d like to download. This works for publicly accessible S3, Azure, or Google Cloud bucket URLs.
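
For example, a public S3 object can typically be downloaded straight over HTTPS; the bucket and file names here are placeholders:

wget https://my-public-bucket.s3.amazonaws.com/example-data-set.tar.gz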

See below for downloading files from private buckets.

Extract compressed data sets

If the data set you’ve downloaded is compressed as a file with the .tar.gz extension, you can extract it using the tar command like so:

tar -xvf example-data-set.tar.gz
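
If you’d like to preview the archive’s contents before extracting, you can list them first:

tar -tzf example-data-set.tar.gz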

If your data set is compressed as a file with the .zip extension, use the unzip command, installing it first if it isn’t already available.

sudo apt-get install unzip
unzip example-data-set.zip

Copy files from private S3, Azure, or Google Cloud buckets

To copy files from a private cloud storage bucket we recommend installing the CLI of the specific cloud provider on your machine.

Example: S3 — using the AWS CLI

We recommend installing the AWS CLI via pip. To get the AWS CLI up and running for syncing files from S3, SSH into your machine and then:

  • Install the AWS CLI via pip by following the instructions for Linux.
  • Configure the CLI to use the correct credentials for accessing the desired bucket.
  • Follow the S3 CLI documentation on using the AWS S3 sync command to copy the desired files, as shown in the example below.
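
As a concrete sketch of those steps, a session might look like the following; the bucket name my-private-bucket and the local target directory are placeholders, and aws configure prompts you interactively for your access key, secret key, and default region:

pip install awscli
aws configure
aws s3 sync s3://my-private-bucket ~/data/my-private-bucket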

The most common errors we see with using the AWS CLI to download files from or upload files to S3 involve incorrect IAM roles and permissions. It’s important to make sure that the IAM user associated with the access key you use has the correct permissions to read from and write to the desired bucket.
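
If a sync fails with an access error, a useful first check is to confirm which identity the CLI is actually using and whether it can list the bucket (the bucket name is again a placeholder):

aws sts get-caller-identity
aws s3 ls s3://my-private-bucket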