Deploy a Model to an Endpoint using Gradient

Paperspace Deployments are containers-as-a-service that allow you to run container images and serve machine learning models using a high-performance, low-latency service with a RESTful API.


In this tutorial, you learn how to start deploying a model to an endpoint using Gradient.

Introduction

Gradient Deployments lets us perform effortless model serving to an API endpoint.

During this tutorial we learn how to use Gradient to deploy a machine learning model using a high-performance, low-latency microservice with a RESTful API.

Endpoint to reach a hosted Streamlit application

Gradient Deployments let you deploy a container image to an API endpoint for hosting or inference.

In Part 1 of this tutorial, we’re going to deploy a generic version of Streamlit to an endpoint. This is an extremely quick exercise to demonstrate how quickly we can serve a container to an endpoint in Gradient.

In Part 2, we’re going to increase complexity and serve an inference endpoint. We first train a model with Gradient Workflows and then deploy it.

Let’s get started!

Part 1: Deploy a Streamlit application

In Part 1 of this tutorial, our goal is to deploy an application as fast as possible. Once we get a handle on the look and feel of a Deployment, we then move on to Part 2 during which we train and deploy an inference server capable of responding to a query.

I this first part, we’re going to use the Streamlit demo application that is available as a starter Deployment within the Gradient console.

Here is what we’re going to do:

  • Create a new project in Gradient
  • Create a Streamlit Deployment within this project
  • Confirm that our new Deployment is online and functional

Let’s create our new project!

Create a project

If we haven’t already created a project in the Gradient console, we need to do that first. We’re going to select Create a Project from the homepage of our workspace. Alternatively, if we already have a project that we’d like to use, we can select that project as well. We can create as many projects as we like no matter what service tier we’re on – so feel free to create a new project to keep things nice and tidy.

If we haven’t made a Project yet in Gradient, create a new one now. Here we’ve named our new project Deployments Tutorial. We add more than one Deployment to this project as we progress through his tutorial.

Endpoint to reach a hosted Streamlit application

Great! We should now have a new project in Gradient with the name Deployments Tutorial.

After we create a new project, we’re going to create a new Deployment within that project.

Create a Deployment using a basic template in the console

Let’s go ahead and create a Deployment within our project. Once we’re within the project in the UI (we can confirm this by seeing the name of our project in the top left along with the button to navigate back to All Projects) let’s tab over to the Deployments panel and click Create.

Let’s name our new Deployment Streamlit Deployment and then click Deploy. This generates a deployment using the pre-made deployment spec and options provided by Paperspace.

Streamlit Deployment

Excellent! While our Deployment is spinning-up, let’s take a look at the Deployment spec that we ran:

Deployment spec that uses Streamlit

The spec for this Deployment looks like this:

image: lucone83/streamlit-nginx
port: 8080
env:
  - name: ENV
    value: VAR
resources:
  replicas: 1
  instanceType: C4

Let’s break down our instructions to Gradient:

  • image - the specified image or container for Gradient to pull from DockerHub
  • port - the communication endpoint that the Deployment server should expose to the open internet
  • env - environment variables to apply at runtime
  • resources - machines to apply to this job

As we can see, there is a good amount of configuration options available. If we wanted to change the instance type, for example, we would need to modify the resources block. If we wanted to upgrade our processor to a C5 instance, for example, we would write:

resources:
  replicas: 1
  instanceType: C5

That’s all there is to it! For more information on the Deployment spec, be sure to read the docs.

Meanwhile, let’s see the result of our Deployment:

Loading Streamlit on Gradient endpoint

Excellent! It takes about 30 seconds for our Deployment to come up, but once it does the UI state changes to Ready and the link to our Streamlit Deployment brings us to a hosted endpoint where we can demo a number of Streamlit features.

We’ve now successfully launched our first Deployment!

In the next part of the tutorial we take a look at doing a more end-to-end Gradient Deployments project where we train a model first and then host it to an endpoint.

Part 2: Deploy from Gradient Model Registry

In Part 2 of this tutorial, we are going to upload an ONNX model to Gradient and serve it to an endpoint as a Deployment. By the end of this section we can query the endpoint to run inference.

For Part 2 of this tutorial, we are going to reference the sample ONNX model GitHub repository which contains the sample ONNX model as well as a sample Deployment spec.

To deploy a model endpoint, we first need to upload the model to Gradient. After that, we reference the model in our Deployment spec which allows us to serve the model as an endpoint.

Download the model files from GitHub

The first thing we do is grab the model files from the demo repo and upload them as a model object in the Gradient user interface.

These are the model files we are looking to upload to Gradient:

/models/fashion-mnist/fashion-mnist.onnx
/models/fashion-mnist/fashion-mnist.py

The model files we need to download and then upload to Gradient are available in the models/fashion-mnist directory of the onnx-deployment repo, are also available via the GitHub UI.

Onnx GitHub repo

We first need to download the files to our local machine before we can upload to Gradient. We can accomplish this from our local machine via the command line or via the GitHub user interface.

In the command line we can first cd into our local directory of choice. Here we use /downloads:

cd ~/downloads

Then we can use the git clone command to clone the repo to our local machine:

git clone https://github.com/gradient-ai/onnx-deployment.git

We can then open the repo that we downloaded and navigate to onnx-deployment/models/fashion-mnist to find our files.

Alternatively, if we prefer to use the GitHub UI, we can use the download button and then unzip the repo to the preferred destination on our local machine. The download button is located under the Code drop-down in the GitHub UI, as seen below:

Download ZIP

Next, we’re going to upload the model files to Gradient as a model object.

Upload the model to Gradient

To upload a model to Gradient, we first hop over to the Models tab within the Gradient console.

From here, we select Upload a model which brings up a window that allows us to select or drag-and-drop files to upload as a model.

Upload our files to Gradient Deployments

These are the steps we take in the Upload a model window:

  • Drag and drop QTY 2 model files (fashion-mnist.onnx and fashion-mnist.py) from our local machine
  • Toggle the drop-down menu Select model type and select ONNX
  • Give our model a name – in this case we use fashion-mnist
  • Select Gradient Managed as our Storage provider

By selecting Gradient Managed, we are telling Gradient that we’d like to use the native storage and file system on Gradient rather than setting up our own storage system.

All we have to do now is click Upload and wait for the model to be available, which should take a few moments.

Locate Model name and Model ID

At this point we should see that our model has both a name (in our case fashion-mnist) and a model ID (in our case moajlgixhg68re). Your own model name depends on the name you give your model, and the ID depends on the computer-generated ID that Gradient assigns you.

In any case, we need to reference these two pieces of information in the next step, so let’s be sure to hold on to them!

Create another Deployment

Now we’re going to create another Deployment within the same project we used in Part 1 (which we titled Deployments Tutorial).

New Deployment within the same project

From the Deployments tab of the Gradient console, we first select Create.

Next, we use the Upload a Deployment spec text link to bring up the YAML spec editor.

By default, the spec looks like this:

Default YAML spec

Instead of the default spec, we’re going to overwrite the spec with our own spec as follows:

image: cwetherill/paperspace-onnx-sample
port: 8501
resources:
  replicas: 1
  instanceType: C4
models:
  - id: moajlgixhg68re
    path: /opt/models/fashion-mnist

Here is an explanation of the Deployment spec:

  • image - the Docker image or container that we want Gradient to pull
  • port - the port number we would like to open to the public web to make requests
  • resources - the machine we are specifying to run our Deployment – in this case a single instance (also known as a replica) of a C4 machine (see available machines here)
  • models - the id and path of the model we would like to deploy

After a few minutes we should see that our instance is now running:

Deployment is running when the Status reads Ready

Nice! We now have a Deployment that is up and running.

Confirm the Deployment

To verify that this runtime is up and running, we can use a nifty helper endpoint that the Paperspace team included with the runtime, which is available at <endpoint>/healthcheck.

The <endpoint> parameter is available in the Details section of the Deployment. Here is a close-up of the details pane:

URL of endpoint in the Details pane

Here’s what it looks like when we try the endpoint. We are going to use our endpoint and append /healthcheck to the URL since we know that this container has defined such a method.

Every Deployment generates a unique endpoint URL – for this reason we purposely do not provide the URL of the endpoint from this tutorial and instead ask the user to create a new Deployment to obtain a unique Deployment endpoint.

visit <endpoint>/healthcheck to read the status of the model serving computer

Our model server is running – excellent!

We can also cURL this endpoint from the command line on our local machine to achieve the same result. Here is what that command looks like:

curl https://d4f2a6b0206084221af768e5cfedb5dc4.clg07azjl.paperspacegradient.com/healthcheck

Let’s also try another method which has been made available to us in this demo repository by the Paperspace team. This method tells us about our metadata and we can find it at <endpoint>/v1/models/fashion-mnist/metadata.

We should see that this URL also produces a response – in this case the method allows us to inspect the request/response formats provided by the endpoint.

inspect the request/response formats

If we’re using cURL in the terminal our command line entry would look like this:

curl https://d4f2a6b0206084221af768e5cfedb5dc4.clg07azjl.paperspacegradient.com/v1/models/fashion-mnist/metadata

Great! We’ve now successfully deployed the model that we uploaded previously and we’ve confirmed with two separate helper methods that the model server is running and that the endpoint request/response formatting is as follows:

{
   "signature_name":"serving_default",
   "inputs":{
      "image_data":{
         "dtype":"float"
      }
   },
   "outputs":{
      "class_probabilities":{
         "dtype":"float"
      }
   }
}

Next we’re going to query the endpoint in a notebook.

Use Gradient Notebooks to query the endpoint!

Let’s go ahead and hit the endpoint now that we’ve successfully deployed our model. We do so with a Gradient Notebook which allows us to view the image we are submitting to the endpoint via matplotlib.

First, we create a new notebook using the TensorFlow 2 template.

To create the notebook, return to the Gradient console. Ensuring we are in the same project (Deployments Tutorial in our case) as our Deployment, we then go to the Notebooks tab in the Gradient console and select Create.

Create a notebook within the same team

Let’s select the TensorFlow 2 runtime (currently Tensorflow 2.6.0) to use when querying our endpoint.

Next, let’s select a machine. Since we are running inference on the endpoint (and not in the notebook itself) we can run the lightest CPU-based machine available. Let’s select the Free CPU instance if it’s available. If the free instance is not available due to capacity, we can select the most basic paid CPU instance available.

Create a notebook using the TensorFlow 2 runtime

We now take advantage of Advanced options when we create our notebook to tell Gradient to pull the notebook file that is located in the GitHub repo.

The link to the file which we use as the Workspace URL in Advanced options is as follows:

https://github.com/gradient-ai/onnx-deployment/blob/main/prediction-client.ipynb

After that we click Start Notebook and wait for our notebook to spin-up!

Making requests from the notebook

When our notebook spins up, we should see something like this:

Single file called prediction-client.ipynb

Let’s go ahead and run the first cell which installs matplotlib.

Next, we need to input our Deployment endpoint as a replacement for the {sys.argv[1]} variable. If we need to get our Deployment URL once again, we can visit the Gradient console, then Deployments, then click the Deployment and copy the Endpoint parameter under Details.

When we have our Deployment endpoint, we drop it in as a replacement for the {sys.argv[1]} variable.

replace the URL of the endpoint

Now we can run the notebook!

After we paste our unique Deployment endpoint into the notebook prediction request parameter, we can run the cell. We should then be able to generate a prediction.

Generate a prediction

If everything works correctly, we should be able to generate a prediction. In our case, we correctly identified the image as an Ankle boot!

Image of an ankle boot

Success! Nice job identifying that ankle boot! We ran inference from an endpoint and returned the prediction to a notebook – awesome!

What we’ve accomplished

In Part 2 of this tutorial, we’ve accomplished the following:

  • We downloaded an ONNX model from the example repo and uploaded it to Gradient as a new model
  • We created a new Deployment which serves the uploaded model
  • We confirmed the model endpoint is functional by hitting a few endpoints in the browser as well as via cURL requests in the terminal
  • We then used Gradient Notebooks to pull in the .ipynb file from the demo repository and make requests to the endpoint
  • We successfully predicted that an image of a shoe is indeed an ankle boot!

What’s next?

Once a model is deployed to an endpoint with Gradient Deployments the possibilities are truly unlimited.

In Part 2 of this tutorial we used a notebook to query the endpoint so that we could visualize the image submitted to the endpoint with matplotlib. We could hit the endpoint from the command line of our local machine or from within an application we are writing.

We can also use Gradient Workflows to make a request to the endpoint, or we could use third party tools like Postman to make requests.

In short, there are a lot of possibilities. Whatever we decide to do next, the important thing is to go step by step and make sure each piece of the Deployment is working as you progress.

If you have any questions or need a hand getting something to work – please let us know! You can file a support ticket.

Reference reading

Additional tutorials