How to Use Serverless Inference on DigitalOcean Gradient™ AI Platform
Validated on 9 Feb 2026 • Last edited on 13 Feb 2026
DigitalOcean Gradient™ AI Platform lets you build fully managed AI agents with knowledge bases for retrieval-augmented generation, multi-agent routing, guardrails, and more, or use serverless inference to make direct requests to popular foundation models.
Serverless inference lets you send API requests directly to foundation models without creating or managing an AI agent. The model generates responses without any initial instructions or configuration.
All requests are billed per input and output token.
Prerequisites
To use serverless inference, you need to authenticate your HTTP requests with a model access key. You can create a model access key in the DigitalOcean Control Panel or by sending a POST request to the /v2/gen-ai/models/api_keys endpoint. Then, send your prompts to models from OpenAI, Anthropic, Meta, or other providers using the serverless inference API endpoints.
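For example, the following minimal sketch creates a key through the API with Python's requests library. It authenticates with a DigitalOcean API token (not a model access key); the name field in the request body and the response shape are assumptions here, so check the API reference for the exact schema.
# Hedged sketch: create a model access key via the DigitalOcean API.
# DIGITALOCEAN_TOKEN is a personal access token; the "name" field is an assumption.
import os
import requests
resp = requests.post(
    "https://api.digitalocean.com/v2/gen-ai/models/api_keys",
    headers={"Authorization": f"Bearer {os.environ['DIGITALOCEAN_TOKEN']}"},
    json={"name": "serverless-inference-key"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # the secret key is shown only once, so store it securely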
Serverless Inference API Endpoints
The serverless inference API is available at https://inference.do-ai.run and has the following endpoints:
| Endpoint | Verb | Description |
|---|---|---|
| /v1/models | GET | Returns a list of available models and their IDs. |
| /v1/chat/completions | POST | Sends chat-style prompts and returns model responses. |
| /v1/responses | POST | Sends chat-style prompts and returns text or multimodal model responses. |
| /v1/images/generations | POST | Generates images from text prompts. |
| /v1/async-invoke | POST | Sends text, image, or text-to-speech generation requests to fal models. |
We support both /v1/chat/completions and /v1/responses. Choose the endpoint that best fits your use case:
- Use /v1/chat/completions when building or maintaining chat-style integrations that rely on structured messages with roles such as system, user, and assistant, or when migrating existing chat-based code with minimal changes.
- Use /v1/responses when building new integrations or working with newer models that only support the Responses API. It’s also useful for multi-step tool use in a single request, preserving state across turns with store: true, and simplifying requests by using a single input field with improved caching efficiency.
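For a quick comparison, the following sketch sends the same question through both endpoints with the OpenAI Python SDK. The model IDs are examples; replace them with IDs returned by /v1/models.
# Sketch: the same prompt sent through both APIs; model IDs are examples.
import os
from openai import OpenAI
client = OpenAI(
    base_url="https://inference.do-ai.run/v1/",
    api_key=os.getenv("MODEL_ACCESS_KEY"),
)
# Chat Completions API: structured messages with roles.
chat = client.chat.completions.create(
    model="llama3.3-70b-instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(chat.choices[0].message.content)
# Responses API: a single input field.
resp = client.responses.create(
    model="openai-gpt-oss-20b",
    input="What is the capital of France?",
)
print(resp.output[1].content[0].text)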
You can use these endpoints with cURL, the OpenAI Python SDK, or the Gradient SDK.
Retrieve Available Models
The following cURL, OpenAI Python SDK, and Gradient SDK examples show how to retrieve the available models.
Send a GET request to the /v1/models endpoint using your model access key. For example:
curl -X GET https://inference.do-ai.run/v1/models \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json"
This returns a list of available models with their corresponding model IDs (id):
...
{
"created": 1752255238,
"id": "alibaba-qwen3-32b",
"object": "model",
"owned_by": "digitalocean"
},
{
"created": 1737056613,
"id": "anthropic-claude-3.5-haiku",
"object": "model",
"owned_by": "anthropic"
},
...
from openai import OpenAI
from dotenv import load_dotenv
import os
load_dotenv()
client = OpenAI(
base_url="https://inference.do-ai.run/v1/",
api_key=os.getenv("MODEL_ACCESS_KEY"),
)
models = client.models.list()
for m in models.data:
print("-", m.id)
from gradient import Gradient
from dotenv import load_dotenv
import os
load_dotenv()
client = Gradient(model_access_key=os.getenv("MODEL_ACCESS_KEY"))
models = client.models.list()
print("Available models:")
for model in models.data:
print(f" - {model.id}")
Send Prompt to a Model Using the Chat Completions API
The following cURL, OpenAI Python SDK, and Gradient SDK examples show how to send a prompt to a model. Include your model access key and the following fields in your request:
- model: The model ID of the model you want to use. Get the model ID using /v1/models or on the available models page.
- messages: The input prompt or conversation history. Serverless inference does not have sessions, so include all relevant context in this field; see the sketch after this list.
- temperature: A value between 0.0 and 1.0 to control randomness and creativity.
- max_tokens: The maximum number of tokens to generate in the response. Use this to manage output length and cost.
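Because there are no server-side sessions, a follow-up question must resend the earlier turns in messages. The following minimal sketch uses the OpenAI Python SDK; the model ID is an example.
# Sketch: carry conversation history by resending prior turns on every request.
import os
from openai import OpenAI
client = OpenAI(
    base_url="https://inference.do-ai.run/v1/",
    api_key=os.getenv("MODEL_ACCESS_KEY"),
)
history = [{"role": "user", "content": "What is the capital of France?"}]
first = client.chat.completions.create(model="llama3.3-70b-instruct", messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})
# The follow-up only makes sense because the earlier turns are resent.
history.append({"role": "user", "content": "What is its population?"})
second = client.chat.completions.create(model="llama3.3-70b-instruct", messages=history)
print(second.choices[0].message.content)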
Send a POST request to the /v1/chat/completions endpoint using your model access key.
The following example request sends a prompt to a Llama 3.3 Instruct-70B model with the prompt What is the capital of France?, a temperature of 0.7, and maximum number of tokens set to 50.
curl -X POST https://inference.do-ai.run/v1/chat/completions \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.3-70b-instruct",
"messages": [{"role": "user", "content": "What is the capital of France?"}],
"temperature": 0.7,
"max_tokens": 50
}'
The response includes the generated text and token usage details:
{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"audio": null,
"content": "The capital of France is Paris.",
"refusal": null,
"role": ""
}
}
],
"created": 1747247763,
"id": "",
"model": "llama3.3-70b-instruct",
"object": "chat.completion",
"service_tier": null,
"usage": {
"completion_tokens": 8,
"prompt_tokens": 43,
"total_tokens": 51
}
}
from openai import OpenAI
from dotenv import load_dotenv
import os
load_dotenv()
client = OpenAI(
base_url="https://inference.do-ai.run/v1/",
api_key=os.getenv("MODEL_ACCESS_KEY"),
)
resp = client.chat.completions.create(
model="llama3-8b-instruct",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Tell me a fun fact about octopuses."}
],
)
print(resp.choices[0].message.content)
from gradient import Gradient
from dotenv import load_dotenv
import os
load_dotenv()
client = Gradient(model_access_key=os.getenv("MODEL_ACCESS_KEY"))
resp = client.chat.completions.create(
model="llama3-8b-instruct",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Tell me a fun fact about octopuses."}
],
)
print(resp.choices[0].message.content)
To use prompt caching, specify the cache_control parameter with type: ephemeral and a ttl in your JSON request body. The ttl value can be 5m (default) or 1h. The following examples show how to use the cache_control parameter in user, developer, and tool messages.
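For context, the following minimal sketch shows where a cached content block sits in a full /v1/chat/completions request. It posts the JSON body directly with Python's requests library; the model ID is an example and caching support can vary by model. The role-specific JSON snippets after it show the same pattern for user, developer, and tool messages.
# Sketch: a chat completion request with a cached message segment.
import os
import requests
body = {
    "model": "anthropic-claude-3.5-haiku",  # example model ID
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "A long reference document to reuse across requests...",
                    "cache_control": {"type": "ephemeral", "ttl": "5m"},
                },
                {"type": "text", "text": "Summarize the document above."},
            ],
        }
    ],
    "max_tokens": 200,
}
resp = requests.post(
    "https://inference.do-ai.run/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MODEL_ACCESS_KEY']}"},
    json=body,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["usage"])  # shows cache_created_input_tokens and cache_read_input_tokens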
{
"role": "user",
"content": {
"type": "text",
"text": "This is cached for 1h.",
"cache_control": {
"type": "ephemeral",
"ttl": "1h"
}
}
}
{
"role": "developer",
"content": [
{
"type": "text",
"text": "Cache this segment for 5 minutes.",
"cache_control": {
"type": "ephemeral",
"ttl": "5m"
}
},
{
"type": "text",
"text": "Do not cache this segment"
}
]
}
{
"role": "tool",
"tool_call_id": "tool_call_id",
"content": [
{
"type": "text",
"text": "Tool output cached for 5m.",
"cache_control": {
"type": "ephemeral",
"ttl": "5m"
}
}
]
}
The JSON response looks similar to the following and shows the number of input tokens cached during this request:
"usage": {
"cache_created_input_tokens": 1043,
"cache_creation": {
"ephemeral_1h_input_tokens": 0,
"ephemeral_5m_input_tokens": 1043
},
"cache_read_input_tokens": 0,
"completion_tokens": 100,
"prompt_tokens": 14,
"total_tokens": 114
}
If you send the request again, the cached input tokens are used and the response looks like this:
"usage": {
"cache_created_input_tokens": 0,
"cache_creation": {
"ephemeral_1h_input_tokens": 0,
"ephemeral_5m_input_tokens": 0
},
"cache_read_input_tokens": 1043,
"completion_tokens": 100,
"prompt_tokens": 14,
"total_tokens": 114
}
Send Prompt to a Model Using the Responses API
The following cURL, OpenAI Python SDK, and Gradient SDK examples show how to send a prompt using the /v1/responses endpoint. Include your model access key and the following fields in your request:
- model: The model ID of the model you want to use. Get the model ID using /v1/models or on the available models page.
- input: The prompt or input content you want the model to respond to.
- max_output_tokens: The maximum number of tokens to generate in the response.
- temperature: A value between 0.0 and 1.0 to control randomness and creativity.
- stream: Set to true to stream partial responses; see the streaming sketch at the end of this section.
Send a POST request to the /v1/responses endpoint using your model access key.
The following example request sends a prompt to an OpenAI GPT-OSS-20B model with the prompt What is the capital of France?, a temperature of 0.7, and maximum number of output tokens set to 50.
curl -sS -X POST https://inference.do-ai.run/v1/responses \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai-gpt-oss-20b",
"input": "What is the capital of France?",
"max_output_tokens": 50,
"temperature": 0.7,
"stream": false
}'
The response includes structured output and token usage details:
{
...
"output": [
{
"content": [
{
"text": "We need to answer: The capital of France is Paris. This is straightforward.",
"type": "reasoning_text"
}
],
...
},
{
"content": [
{
"text": "The capital of France is **Paris**.",
"type": "output_text"
}
],
...
}
],
...
"usage": {
"input_tokens": 72,
"input_tokens_details": {
"cached_tokens": 32
},
"output_tokens": 35,
"output_tokens_details": {
"reasoning_tokens": 17,
"tool_output_tokens": 0
},
"total_tokens": 107
},
...
}
from openai import OpenAI
from dotenv import load_dotenv
import os
load_dotenv()
client = OpenAI(
base_url="https://inference.do-ai.run/v1/",
api_key=os.getenv("MODEL_ACCESS_KEY"),
)
resp = client.responses.create(
model="openai-gpt-oss-20b",
input="What is the capital of France?",
max_output_tokens=50,
temperature=0.7,
)
print(resp.output[1].content[0].text)
from gradient import Gradient
from dotenv import load_dotenv
import os
load_dotenv()
client = Gradient(model_access_key=os.getenv("MODEL_ACCESS_KEY"))
resp = client.responses.create(
model="openai-gpt-oss-20b",
input="What is the capital of France?",
max_output_tokens=50,
temperature=0.7,
)
print(resp.output[1].content[0].text)
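To stream partial output instead of waiting for the full response, set stream to true. The following minimal sketch uses the OpenAI Python SDK; the streaming event types follow the OpenAI Responses interface and are an assumption here, so adjust them to what the endpoint actually emits.
# Sketch: stream a response and print text deltas as they arrive.
import os
from openai import OpenAI
client = OpenAI(
    base_url="https://inference.do-ai.run/v1/",
    api_key=os.getenv("MODEL_ACCESS_KEY"),
)
stream = client.responses.create(
    model="openai-gpt-oss-20b",
    input="Write a haiku about the ocean.",
    stream=True,
)
for event in stream:
    # Event type name assumed from the OpenAI Responses streaming interface.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
print()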
Generate Image
The following cURL, OpenAI Python SDK, and Gradient SDK examples show how to generate an image from a text prompt. Include your model access key and the following fields in your request:
- model: The model ID of the image generation model you want to use. Get the model ID using /v1/models or on the available models page.
- prompt: The text prompt to generate the image from.
- n: The number of images to generate. Must be between 1 and 10.
- size: The desired dimensions of the generated image. Supported values are 256x256, 512x512, and 1024x1024.
Make sure to always specify n and size when generating images.
Send a POST request to the /v1/images/generations endpoint using your model access key.
The following example request sends a prompt to the openai-gpt-image-1 model to generate an image of a baby sea otter floating on its back in calm blue water, with an image size of 1024x1024:
curl -X POST https://inference.do-ai.run/v1/images/generations \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-d '{
"model": "openai-gpt-image-1",
"prompt": "A cute baby sea otter floating on its back in calm blue water",
"n": 1,
"size": "1024x1024"
}'
The response includes a JSON object with a Base64 image string and other details such as image format and tokens used:
{
"background": "opaque",
"created": 1770659857,
"data": [
{
"b64_json": "iVBORw0KGgoAAAANSUhEU...
}
],
"output_format": "png",
"quality": "medium",
"size": "1024x1024",
"usage": {
"input_tokens": 20,
"input_tokens_details": {
"text_tokens": 20
},
"output_tokens": 1056,
"total_tokens": 1076
}
}
If you want to save the image as a file, pipe the image string to a file using jq and base64:
curl -X POST https://inference.do-ai.run/v1/images/generations \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-d '{
"model": "openai-gpt-image-1",
"prompt": "A cute baby sea otter floating on its back in calm blue water",
"n": 1,
"size": "1024x1024"
}' | jq -r '.data[0].b64_json' | base64 --decode > sea_otter.png
An image named sea_otter.png is created in your current directory after a few seconds.
from openai import OpenAI
from dotenv import load_dotenv
import os, base64
load_dotenv()
client = OpenAI(
base_url="https://inference.do-ai.run/v1/",
api_key=os.getenv("MODEL_ACCESS_KEY"),
)
result = client.images.generate(
model="openai-gpt-image-1",
prompt="A cute baby sea otter, children’s book drawing style",
size="1024x1024",
n=1
)
b64 = result.data[0].b64_json
with open("sea_otter.png", "wb") as f:
f.write(base64.b64decode(b64))
print("Saved sea_otter.png")
from gradient import Gradient
from dotenv import load_dotenv
import os, base64
load_dotenv()
client = Gradient(model_access_key=os.getenv("MODEL_ACCESS_KEY"))
result = client.images.generations.create(
model="openai-gpt-image-1",
prompt="A cute baby sea otter, children’s book drawing style",
size="1024x1024",
n=1
)
b64 = result.data[0].b64_json
with open("sea_otter.png", "wb") as f:
f.write(base64.b64decode(b64))
print("Saved sea_otter.png")
Generate Image, Audio, or Text-to-Speech Using fal Models
The following examples show how to generate an image, an audio clip, or text-to-speech output with fal models using the /v1/async-invoke endpoint.
The following example sends a request to generate an image using the fal-ai/flux/schnell model.
curl -X POST 'https://inference.do-ai.run/v1/async-invoke' \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model_id": "fal-ai/flux/schnell",
"input": { "prompt": "A futuristic city at sunset" }
}'
You can update the image generation request to also include the output format, number of inference steps, guidance scale, number of images to generate, and safety checker option, as in the following fal-ai/fast-sdxl example:
curl -X POST https://inference.do-ai.run/v1/async-invoke \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model_id": "fal-ai/fast-sdxl",
"input": {
"prompt": "A futuristic cityscape at sunset, with flying cars and towering skyscrapers.",
"output_format": "landscape_4_3",
"num_inference_steps": 4,
"guidance_scale": 3.5,
"num_images": 1,
"enable_safety_checker": true
},
"tags": [
{"key": "type", "value": "test"}
]
}'
The following example sends a request to generate a 60-second audio clip using the fal-ai/stable-audio-25/text-to-audio model:
curl -X POST 'https://inference.do-ai.run/v1/async-invoke' \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model_id": "fal-ai/stable-audio-25/text-to-audio",
"input": {
"prompt": "Techno song with futuristic sounds",
"seconds_total": 60
},
"tags": [
{ "key": "type", "value": "test" }
]
}'
The following example sends a request to generate text-to-speech audio using the fal-ai/elevenlabs/tts/multilingual-v2 model:
curl -X POST 'https://inference.do-ai.run/v1/async-invoke' \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model_id": "fal-ai/elevenlabs/tts/multilingual-v2",
"input": {
"text": "This text-to-speech example uses DigitalOcean multilingual voice."
},
"tags": [
{ "key": "type", "value": "test" }
]
}'
When you send a request to the /v1/async-invoke endpoint, it starts an asynchronous job for the image, audio, or text-to-speech generation and returns a request_id. The job status is initially QUEUED and the response looks similar to the following:
{
"completed_at": null,
"created_at": "2026-01-22T19:19:19.112403432Z",
"error": null,
"model_id": "fal-ai/fast-sdxl",
"output": null,
"request_id": "6590784a-ce47-4556-9ff4-53baff2693fb",
"started_at": null,
"status": "QUEUED"Query the status endpoint frequently using the request_id to check the progress of the job:
curl -X GET "https://inference.do-ai.run/v1/async-invoke/<request_id>/status" \
-H "Authorization: Bearer $MODEL_ACCESS_KEY"
When the job completes, the status updates to COMPLETED. You can then use the /v1/async-invoke/<request_id> endpoint to fetch the generated result:
curl -X GET "https://inference.do-ai.run/v1/async-invoke/<request_id>" \
-H "Authorization: Bearer $MODEL_ACCESS_KEY"
The response includes a URL to the generated image, audio, or text-to-speech file, which you can download or open directly in your browser or app:
{
...
"images": [
{
"content_type": "image/jpeg",
"height": 768,
"url": "https://v3b.fal.media/files/b/0a8b7281/HpQcEqkz-xy2ZI5do9Lyp.jpg",
"width": 1024
}
...
],
"request_id": "6f76e8f7-f6b4-4e20-ab9a-ca0f01a9d2f4",
"started_at": null,
"status": "COMPLETED"
}
Alternatively, you can call serverless inference from your automation workflows. The n8n community node connects to any DigitalOcean-hosted model using your model access key. You can self-host n8n using the n8n Marketplace app.
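If you prefer to script the asynchronous workflow directly, the following minimal sketch submits a fal image job, polls the status endpoint, and fetches the result with Python's requests library. The intermediate status values and the exact result shape are assumptions based on the examples above.
# Sketch: submit an async fal job, poll until it finishes, then fetch the result.
import os
import time
import requests
BASE = "https://inference.do-ai.run/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['MODEL_ACCESS_KEY']}"}
# 1. Submit the job.
submit = requests.post(
    f"{BASE}/async-invoke",
    headers=HEADERS,
    json={"model_id": "fal-ai/flux/schnell", "input": {"prompt": "A futuristic city at sunset"}},
    timeout=30,
)
submit.raise_for_status()
request_id = submit.json()["request_id"]
# 2. Poll the status endpoint until the job leaves the queue (non-terminal states assumed).
while True:
    status = requests.get(f"{BASE}/async-invoke/{request_id}/status", headers=HEADERS, timeout=30)
    status.raise_for_status()
    if status.json()["status"] not in ("QUEUED", "IN_PROGRESS"):
        break
    time.sleep(2)
# 3. Fetch the completed result; the images field matches the example response above.
result = requests.get(f"{BASE}/async-invoke/{request_id}", headers=HEADERS, timeout=30)
result.raise_for_status()
print(result.json())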
Model Access Keys
You can create and manage model access keys in the Model Access Keys section of the Serverless inference page in the DigitalOcean Control Panel or using the API.
Create Keys
To create a model access key, click Create model access key to open the Add model access key window. In the Key name field, enter a name for your model access key, then click Add model access key.
Your new model access key with its creation date appears in the Model Access Keys section. The secret key is visible only once, immediately after creation, so copy and store it securely.
Model access keys are private and incur usage-based charges. Do not share them or expose them in front-end code. We recommend storing them using a secrets manager (for example, AWS Secrets Manager, HashiCorp Vault, or 1Password) or a secure environment variable in your deployment configuration.
Rename Keys
Renaming a model access key can help you organize and manage your keys more effectively, especially when using multiple keys for different projects or environments.
To rename a key, click … to the right of the key in the list to open the key’s menu, then click Rename. In the Rename model access key window that opens, in the Key name field, enter a new name for your key and then click UPDATE.
Regenerate Keys
Regenerating a model access key creates a new secret key and immediately and permanently invalidates the previous one. If a key has been compromised or you want to rotate keys for security purposes, regenerate the key, then update any applications that use the previous key to use the new one.
To regenerate a key, click … to the right of the key in the list to open the key’s menu, then click Regenerate. In the Regenerate model access key window that opens, enter the name of your key to confirm the action, then click Regenerate access key. Your new secret key is displayed in the Model Access Keys section.
Delete Keys
Deleting a model access key permanently and irreversibly destroys it. Any applications using a deleted key lose access to the API.
To delete a key, click … to the right of the key in the list to open the key’s menu, then click Delete. In the Delete model access key window that opens, type the name of the key to confirm the deletion, then click Delete access key.