How to Use Serverless Inference
Validated on 13 May 2025 • Last edited on 28 May 2025
The DigitalOcean GenAI Platform lets you work with popular foundation models and build GPU-powered AI agents with fully-managed deployment, or send direct requests using serverless inference. Create agents that incorporate guardrails, functions, agent routing, and retrieval-augmented generation (RAG) pipelines with knowledge bases.
Serverless inference lets you send API requests directly to foundation models without creating or managing an AI agent. This allows you to generate responses without adding any initial instructions or configuration to the model. Specify the desired model directly in your API requests.
DigitalOcean hosts several models and also provides access to third-party models from providers like OpenAI and Anthropic.
All requests are billed per input and output token.
You can create, rename, regenerate, or delete keys at any time.
Create a Model Access Key
Model access keys authenticate requests to serverless inference model endpoints. These keys are private and incur usage-based charges. Do not share them or expose them in frontend code.
Your model access key gives you access to all available models.
To create a model access key in the DigitalOcean Control Panel, on the left menu, click GenAI Platform, then click the Model access keys tab.
Under the Model Access Keys section, click Create model access key to open the Add model access key window.
In the Key name field, enter a name for your model access key, then click Save.
Your new model access key with its creation date appears under the Model Access Keys section with your secret key temporarily showing. Copy your key and store it securely, as you can only view the new key once after creation. You can store it in a secrets manager (for example, AWS Secrets Manager, HashiCorp Vault, or 1Password) or as an environment variable in your deployment configuration.
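For example, to make the key available to the curl examples in this guide, you can export it as a shell environment variable. MODEL_ACCESS_KEY is just the placeholder name the examples below use:
# Export the key so shell commands can reference it; replace the placeholder with your actual key
export MODEL_ACCESS_KEY="your-model-access-key"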
Send a Request to a Model
You can access all available models using a single fixed endpoint:
https://inference.do-ai.run/v1
For example, you send chat-based prompts to https://inference.do-ai.run/v1/chat/completions and receive model-generated responses in a conversational format.
Sending a request to a model for inference requires a model access key, a slug indicating which model to send the request to, and the content of your request. If you don’t have a model access key, create one.
Choose a model to send your requests to. To view the available model slugs programmatically, send a GET request to the https://inference.do-ai.run/v1/models endpoint using your model access key, like this:
curl -X GET https://inference.do-ai.run/v1/models \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json"
This returns a list of available models with their corresponding id values, which you use as the model parameter in your inference requests. For example, to use the Llama 3.3 Instruct-70B model, set model to llama3.3-70b-instruct in your request body.
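If you have jq installed, you can pull just the slugs out of the response. A short sketch, assuming the endpoint returns an OpenAI-style list with each model's slug in a data[].id field:
curl -s -X GET https://inference.do-ai.run/v1/models \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" | jq -r '.data[].id'
# assumes an OpenAI-style {"data": [{"id": ...}]} list response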
After choosing a model and getting its slug, send an HTTP request using the following schema:
curl https://inference.do-ai.run/v1/chat/completions \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.3-70b-instruct",
"messages": [{"role": "user", "content": "What is the capital of France?"}],
"temperature": 0.7,
"max_tokens": 50
}'
The example curl request above sends a POST request to the llama3.3-70b-instruct model to generate a response. The request body contains the following fields:
- model: Specifies the model to use for inference. Use this field to easily switch between models by changing the model slug in your request.
- messages: Defines the conversation history. Serverless inference is session-less, so include all necessary context directly in this field. For a basic prompt, provide a single user message with your question or instruction; for a multi-turn conversation, see the sketch after this list.
- temperature: Sets the randomness of the response. Lower values produce more focused and deterministic outputs, while higher values allow for more creative or varied results. Takes a decimal number between 0.0 and 1.0.
- max_tokens: Sets the maximum number of tokens to generate in the response. Adjust this value to control the response length and manage token costs. You can view the maximum number of tokens for each model on the models page.
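Because serverless inference is session-less, a follow-up question only works if you resend the earlier turns. Here is a minimal sketch of a second-turn request, reusing the endpoint and model from the example above and echoing the model's earlier reply back as an assistant message (the same chat role the API uses in its own responses):
curl https://inference.do-ai.run/v1/chat/completions \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.3-70b-instruct",
"messages": [
{"role": "user", "content": "What is the capital of France?"},
{"role": "assistant", "content": "The capital of France is Paris."},
{"role": "user", "content": "What is its population?"}
],
"temperature": 0.7,
"max_tokens": 100
}'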
The inference service then returns a JSON response like this:
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "audio": null,
        "content": "The capital of France is Paris.",
        "refusal": null,
        "role": ""
      }
    }
  ],
  "created": 1747247763,
  "id": "",
  "model": "llama3.3-70b-instruct",
  "object": "chat.completion",
  "service_tier": null,
  "usage": {
    "completion_tokens": 8,
    "prompt_tokens": 43,
    "total_tokens": 51
  }
}
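In scripts, you can extract the generated text and the billed token counts from this response with jq; the paths below follow the structure of the sample response above:
curl -s https://inference.do-ai.run/v1/chat/completions \
-H "Authorization: Bearer $MODEL_ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "llama3.3-70b-instruct", "messages": [{"role": "user", "content": "What is the capital of France?"}]}' \
| jq -r '.choices[0].message.content, .usage.total_tokens'
# prints the generated text, then the total token count used for billing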
Rename a Model Access Key
Renaming a model access key helps you organize and manage your keys more effectively, especially when using multiple keys for different projects or environments.
To rename a model access key in the DigitalOcean Control Panel, on the left menu, click GenAI Platform, then click the Model access keys tab.
Under the Model Access Keys section, find the model access key you want to rename, to the right of the key, click …, then click Rename to open the Rename model access key window.
In the Key name field, type the new name for your key, then click UPDATE.
Regenerate a Model Access Key
Regenerating a model access key creates a new secret key and immediately invalidates the old one. Use this process if you believe a key has been compromised or want to rotate keys for security purposes. Regeneration cannot be reverted once it's done, so update all affected applications with the new key to avoid service interruptions.
To regenerate a model access key in the DigitalOcean Control Panel, on the left menu, click GenAI Platform, then click the Model access keys tab.
Under the Model Access Keys section, find the model access key you want to regenerate, to the right of the key, click …, then click Regenerate to open the Regenerate model access key window.
To confirm regeneration, in the Regenerate model access key window, type the name of your access key, then click Regenerate access key.
Under the Model Access Keys section, your new key is shown once. Copy it and store it securely, for example in a secrets manager (such as AWS Secrets Manager, HashiCorp Vault, or 1Password) or as an environment variable in your deployment configuration.
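Before rolling the regenerated key out to your applications, you can confirm it authenticates by calling the models endpoint with it. A quick sketch, assuming the new key is exported as MODEL_ACCESS_KEY:
# Print only the HTTP status code for a request authenticated with the new key
curl -s -o /dev/null -w "%{http_code}\n" https://inference.do-ai.run/v1/models \
-H "Authorization: Bearer $MODEL_ACCESS_KEY"
A 200 status code means the key is valid; an invalid key typically returns 401.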
Delete a Model Access Key
Deleting a model access key permanently and irreversibly destroys it. Any applications that use a deleted key lose access to the model endpoints.
To delete a model access key in the DigitalOcean Control Panel, on the left menu, click GenAI Platform, then click the Model access keys tab.
Under the Model Access Keys section, find the model access key you want to delete, to the right of the key, click …, then click Delete to open the Delete model access key window.
To confirm deletion, type the name of your access key, then click Delete access key.