How to Use Serverless Inference
Validated on 13 May 2025 • Last edited on 8 Jul 2025
GradientAI Platform lets you build fully-managed AI agents with knowledge bases for retrieval-augmented generation, multi-agent routing, guardrails, and more, or use serverless inference to make direct requests to popular foundation models.
Serverless inference lets you send API requests directly to foundation models without creating or managing an AI agent. This allows you to generate responses without adding any initial instructions or configuration to the model.
Specify the desired model directly in your API requests. DigitalOcean hosts several models and also provides access to third-party models, such as those from OpenAI and Anthropic.
All requests are billed per input and output token.
You can create, rename, regenerate, or delete model access keys at any time.
Create a Model Access Key
Model access keys authenticate requests to serverless inference model endpoints. Your model access key gives you access to all available models.
To create a model access key in the DigitalOcean Control Panel, in the left menu, click Agent Platform, then click the Model access keys tab.
Under the Model Access Keys section, click Create model access key to open the Add model access key window. In the Key name field, choose a name for your model access key, then click Save.
Your new model access key appears under the Model Access Keys section with its creation date and the secret key temporarily visible. You can only view the key once, immediately after creation, so copy and store it securely. We recommend using a secrets manager (for example, AWS Secrets Manager, HashiCorp Vault, or 1Password) or a secure environment variable in your deployment configuration.
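For example, rather than hardcoding the key, your application can read it from an environment variable at startup. A minimal Python sketch (the variable name MODEL_ACCESS_KEY is illustrative, not required by the platform):

import os

# Read the model access key from the environment instead of hardcoding it.
# os.environ[...] raises KeyError at startup if the variable is missing,
# so a misconfigured deployment fails fast.
MODEL_ACCESS_KEY = os.environ["MODEL_ACCESS_KEY"]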
Send a Request to a Model
You can access all available models using a single fixed endpoint, https://inference.do-ai.run/v1. For example, you can use https://inference.do-ai.run/v1/chat/completions to send chat-based prompts and receive model-generated responses in a conversational format.
Sending a request to a model for inference requires a model access key, a slug indicating which model to send the request to, and the content of your request.
Choose a model to send your requests to. To view the available model slugs programmatically, send a GET request to the https://inference.do-ai.run/v1/models endpoint using your model access key. For example, with curl:
curl -X GET https://inference.do-ai.run/v1/models \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json"
This returns a list of available models with their corresponding id values, which you use as the model parameter in your inference requests. For example, to use the Llama 3.3 Instruct-70B model, set model to llama3.3-70b-instruct in the request body.
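The same lookup in Python, using the requests library. This is a sketch, not the platform's official client: it assumes an OpenAI-style list response with the models under a data key, so confirm the exact shape against the response you receive.

import os
import requests

resp = requests.get(
    "https://inference.do-ai.run/v1/models",
    headers={"Authorization": f"Bearer {os.environ['MODEL_ACCESS_KEY']}"},
    timeout=30,
)
resp.raise_for_status()

# Assumption: an OpenAI-style list body, {"object": "list", "data": [{"id": ...}, ...]}.
for model in resp.json().get("data", []):
    print(model["id"])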
After choosing a model and getting its slug, send an HTTP request using the following schema:
curl https://inference.do-ai.run/v1/chat/completions \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.3-70b-instruct",
"messages": [{"role": "user", "content": "What is the capital of France?"}],
"temperature": 0.7,
"max_tokens": 50
}'
This example curl request sends a POST request to the llama3.3-70b-instruct model to generate a response. The request body contains the following fields:
model: Specifies the model to use for inference. Use this field to easily switch between models by changing the model slug in your request.
messages: Defines the conversation history. Serverless inference is session-less, so include all necessary context directly in this field. For a basic prompt, provide a single user message with your question or instruction.
temperature: Sets the randomness of the response. Lower values produce more focused and deterministic outputs, while higher values allow for more creative or varied results. Takes a decimal number between 0.0 and 1.0.
max_tokens: Sets the maximum number of tokens to generate in the response. Adjust this value to control the response length and manage token costs. You can view the maximum number of tokens for each model on the models page.
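The same request as a Python sketch using the requests library, mirroring the curl example above. It assumes the MODEL_ACCESS_KEY environment variable from earlier; every request body field is one described in the list above.

import os
import requests

payload = {
    "model": "llama3.3-70b-instruct",  # model slug from /v1/models
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,  # 0.0-1.0; lower is more deterministic
    "max_tokens": 50,    # cap on generated tokens
}

resp = requests.post(
    "https://inference.do-ai.run/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['MODEL_ACCESS_KEY']}",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=60,
)
resp.raise_for_status()
response = resp.json()  # parsed JSON body, used in the parsing example below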
The inference service then returns a JSON response like this:
{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"audio": null,
"content": "The capital of France is Paris.",
"refusal": null,
"role": ""
}
}
],
"created": 1747247763,
"id": "",
"model": "llama3.3-70b-instruct",
"object": "chat.completion",
"service_tier": null,
"usage": {
"completion_tokens": 8,
"prompt_tokens": 43,
"total_tokens": 51
}
}
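In application code, you typically read the generated text from choices[0].message.content and track billing from the usage object. Continuing from the Python sketch above, where response holds the parsed JSON body:

# Extract the generated text and the billed token counts from the response.
answer = response["choices"][0]["message"]["content"]
usage = response["usage"]
print(answer)
print(f"Prompt tokens: {usage['prompt_tokens']}, "
      f"completion tokens: {usage['completion_tokens']}, "
      f"total: {usage['total_tokens']}")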
Rename a Model Access Key
Renaming a model access key helps you organize and manage your keys more effectively, especially when using multiple keys for different projects or environments.
To rename a model access key in the DigitalOcean Control Panel, in the left menu, click Agent Platform, then click the Model access keys tab.
Under the Model Access Keys section, find the model access key you want to rename. To the right of the key, click …, then click Rename to open the Rename model access key window.
In the Key name field, type the new name for your key, then click Update.
Regenerate a Model Access Key
Regenerating a model access key creates a new secret key and immediately and permanently invalidates the old one. Use this process if you believe a key has been compromised or you want to rotate keys for security purposes. After regenerating, update any applications that use the old key to use the new key so they retain access to the model’s endpoint.
To regenerate a model access key in the DigitalOcean Control Panel, in the left menu, click Agent Platform, then click the Model access keys tab.
In the Model Access Keys section, find the model access key you want to regenerate. To the right of the key, click …, then click Regenerate to open the Regenerate model access key window.
To confirm regeneration, in the Regenerate model access key window, type the name of your access key, then click Regenerate access key.
Your new secret key is displayed in the Model Access Keys section.
Delete a Model Access Key
Deleting a model access key permanently and irreversibly destroys it. Any external applications using a deleted key lose access to the model’s endpoint.
To delete a model access key in the DigitalOcean Control Panel, in the left menu, click Agent Platform, then click the Model access keys tab.
In the Model Access Keys section, find the model access key you want to delete. To the right of the key, click …, then click Delete to open the Delete model access key window.
To confirm deletion, type the name of your access key, then click Delete access key.