How to Use Serverless Inference
Validated on 13 May 2025 • Last edited on 8 Jul 2025
GradientAI Platform lets you build fully-managed AI agents with knowledge bases for retrieval-augmented generation, multi-agent routing, guardrails, and more, or use serverless inference to make direct requests to popular foundation models.
Serverless inference lets you send API requests directly to foundation models without creating or managing an AI agent. This allows you to generate responses without adding any initial instructions or configuration to the model.
Specify the desired model directly in your API requests. DigitalOcean hosts several models and also provides access to third-party models, such as those from OpenAI and Anthropic.
All requests are billed per input and output token.
You can create, rename, regenerate, or delete model access keys at any time.
Create a Model Access Key
Model access keys authenticate requests to serverless inference model endpoints. Your model access key gives you access to all available models.
To create a model access key in the DigitalOcean Control Panel, in the left menu, click Agent Platform, then click the Model access keys tab.
Under the Model Access Keys section, click Create model access key to open the Add model access key window. In the Key name field, choose a name for your model access key, then click Save.
Your new model access key appears under the Model Access Keys section with its creation date and the secret key temporarily visible. You can only view the key once, immediately after creation, so copy and store it securely. We recommend using a secrets manager (for example, AWS Secrets Manager, HashiCorp Vault, or 1Password) or a secure environment variable in your deployment configuration.
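For example, rather than hardcoding the key, your application can read it from an environment variable at startup. A minimal Python sketch (the variable name MODEL_ACCESS_KEY is illustrative, not required by the platform):

import os

# Read the model access key from the environment instead of hardcoding it.
# os.environ[...] raises KeyError at startup if the variable is missing,
# so a misconfigured deployment fails fast.
MODEL_ACCESS_KEY = os.environ["MODEL_ACCESS_KEY"]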
Send a Request to a Model
You can access all available models using a single fixed endpoint, https://inference.do-ai.run/v1. For example, you can use https://inference.do-ai.run/v1/chat/completions to send chat-based prompts and receive model-generated responses in a conversational format.
Sending a request to a model for inference requires a model access key, a slug indicating which model to send the request to, and the content of your request.
Choose a model to send your requests to. To view the available model slugs programmatically, send a GET request to the https://inference.do-ai.run/v1/models endpoint using your model access key. For example, with curl:
curl -X GET https://inference.do-ai.run/v1/models \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json"
This returns a list of available models with their corresponding id values, which you use as the model parameter in your inference requests. For example, to use the Llama 3.3 Instruct-70B model, set model to llama3.3-70b-instruct in the request body.
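The same lookup in Python, using the requests library. This is a sketch, not the platform's official client: it assumes an OpenAI-style list response with the models under a data key, so confirm the exact shape against the response you receive.

import os
import requests

resp = requests.get(
    "https://inference.do-ai.run/v1/models",
    headers={"Authorization": f"Bearer {os.environ['MODEL_ACCESS_KEY']}"},
    timeout=30,
)
resp.raise_for_status()

# Assumption: an OpenAI-style list body, {"object": "list", "data": [{"id": ...}, ...]}.
for model in resp.json().get("data", []):
    print(model["id"])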
After choosing a model and getting its slug, send an HTTP request using the following schema:
curl https://inference.do-ai.run/v1/chat/completions \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.3-70b-instruct",
"messages": [{"role": "user", "content": "What is the capital of France?"}],
"temperature": 0.7,
"max_tokens": 50
}'
This example curl request sends a POST request to the llama3.3-70b-instruct model to generate a response. The request body contains the following fields:
model: Specifies the model to use for inference. Use this field to easily switch between models by changing the model slug in your request.
messages: Defines the conversation history. Serverless inference is session-less, so include all necessary context directly in this field. For a basic prompt, provide a single user message with your question or instruction.
temperature: Sets the randomness of the response. Lower values produce more focused and deterministic outputs, while higher values allow for more creative or varied results. Takes a decimal number between 0.0 and 1.0.
max_tokens: Sets the maximum number of tokens to generate in the response. Adjust this value to control the response length and manage token costs. You can view the maximum number of tokens for each model on the models page.
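The same request as a Python sketch using the requests library, mirroring the curl example above. It assumes the MODEL_ACCESS_KEY environment variable from earlier; every request body field is one described in the list above.

import os
import requests

payload = {
    "model": "llama3.3-70b-instruct",  # model slug from /v1/models
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,  # 0.0-1.0; lower is more deterministic
    "max_tokens": 50,    # cap on generated tokens
}

resp = requests.post(
    "https://inference.do-ai.run/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['MODEL_ACCESS_KEY']}",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=60,
)
resp.raise_for_status()
response = resp.json()  # parsed JSON body, used in the parsing example below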
The inference service then returns a JSON response like this:
{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"audio": null,
"content": "The capital of France is Paris.",
"refusal": null,
"role": ""
}
}
],
"created": 1747247763,
"id": "",
"model": "llama3.3-70b-instruct",
"object": "chat.completion",
"service_tier": null,
"usage": {
"completion_tokens": 8,
"prompt_tokens": 43,
"total_tokens": 51
}
}
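In application code, you typically read the generated text from choices[0].message.content and track billing from the usage object. Continuing from the Python sketch above, where response holds the parsed JSON body:

# Extract the generated text and the billed token counts from the response.
answer = response["choices"][0]["message"]["content"]
usage = response["usage"]
print(answer)
print(f"Prompt tokens: {usage['prompt_tokens']}, "
      f"completion tokens: {usage['completion_tokens']}, "
      f"total: {usage['total_tokens']}")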
Rename a Model Access Key
Renaming a model access key helps you organize and manage your keys more effectively, especially when using multiple keys for different projects or environments.
To rename a model access key in the DigitalOcean Control Panel, in the left menu, click Agent Platform, then click the Model access keys tab.
Under the Model Access Keys section, find the model access key you want to rename. To the right of the key, click …, then click Rename to open the Rename model access key window.
In the Key name field, type the new name for your key, then click Update.
Regenerate a Model Access Key
Regenerating a model access key creates a new secret key and immediately and permanently invalidates the old one. Use this process if you believe a key has been compromised or you want to rotate keys for security purposes. After regenerating, update any applications that use the old key to use the new key so they retain access to the model’s endpoint.
To regenerate a model access key in the DigitalOcean Control Panel, in the left menu, click Agent Platform, then click the Model access keys tab.
In the Model Access Keys section, find the model access key you want to regenerate. To the right of the key, click …, then click Regenerate to open the Regenerate model access key window.
To confirm regeneration, in the Regenerate model access key window, type the name of your access key, then click Regenerate access key.
Your new secret key is displayed in the Model Access Keys section.
Delete a Model Access Key
Deleting a model access key permanently and irreversibly destroys it. Any external applications using a deleted key lose access to the model’s endpoint.
To delete a model access key in the DigitalOcean Control Panel, in the left menu, click Agent Platform, then click the Model access keys tab.
In the Model Access Keys section, find the model access key you want to delete. To the right of the key, click …, then click Delete to open the Delete model access key window.
To confirm deletion, type the name of your access key, then click Delete access key.