How to Use Serverless Inference

Validated on 13 May 2025 • Last edited on 28 May 2025

The DigitalOcean GenAI Platform lets you work with popular foundation models and build GPU-powered AI agents with fully-managed deployment, or send direct requests using serverless inference. Create agents that incorporate guardrails, functions, agent routing, and retrieval-augmented generation (RAG) pipelines with knowledge bases.

Serverless inference lets you send API requests directly to foundation models without creating or managing an AI agent. This allows you to generate responses without adding any initial instructions or configuration to the model. Specify the desired model directly in your API requests.

DigitalOcean hosts several models and also provides access to models from third-party providers like OpenAI and Anthropic.

All requests are billed per input and output token.

You can create, rename, regenerate, or delete keys at any time.

Create a Model Access Key

Model access keys authenticate requests to serverless inference model endpoints. These keys are private and incur usage-based charges. Do not share them or expose them in frontend code.

Your model access key gives you access to all available models.

To create a model access key in the DigitalOcean Control Panel, on the left menu, click GenAI Platform, then click the Model access keys tab.

Under the Model Access Keys section, click Create model access key to open the Add model access key window.

In the Key name field, enter a name for your model access key, then click Save.

Your new model access key appears under the Model Access Keys section with its creation date, and the secret key is shown temporarily. Copy your key and store it securely, as you can only view it once after creation. You can store it in a secrets manager (for example, AWS Secrets Manager, HashiCorp Vault, or 1Password) or as an environment variable in your deployment configuration.
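
For example, a common pattern is to export the key as an environment variable so the curl examples below can reference it without hard-coding the secret. The variable name MODEL_ACCESS_KEY is only a convention used in this guide:

# Sketch: keep the secret out of scripts and source control.
# Replace the placeholder with the key you copied above.
export MODEL_ACCESS_KEY="<your-secret-key>"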

Send a Request to a Model

You can access all available models using a single fixed endpoint:

https://inference.do-ai.run/v1

For example, https://inference.do-ai.run/v1/chat/completions is used to send chat-based prompts and receive model-generated responses in a conversational format.

Sending a request to a model for inference requires a model access key, a slug indicating which model to send the request to, and the content of your request. If you don’t have a model access key, create one.

Choose a model to send your requests to. To view the available model slugs programmatically, send a GET request to the https://inference.do-ai.run/v1/models endpoint using your model access key like this:

curl -X GET https://inference.do-ai.run/v1/models \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json"

This returns a list of available models with their corresponding id values, which you use as the model parameter in your inference requests. For example, to use the Llama 3.3 Instruct-70B model, set model to llama3.3-70b-instruct in your request body.
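
If you have jq installed, you can extract just the slugs from that list. This sketch assumes the endpoint returns an OpenAI-style list object with the models under a data array; adjust the filter if the response shape differs:

curl -s -X GET https://inference.do-ai.run/v1/models \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" | jq -r '.data[].id'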

After choosing a model and getting its slug, send an HTTP request using the following schema:

curl https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.3-70b-instruct",
    "messages": [{"role": "user", "content": "What is the capital of France?"}], 
    "temperature": 0.7,
    "max_tokens": 50
  }'

The example curl request above sends a POST request to the llama3.3-70b-instruct model to generate a response. The request contains the following request body fields:

  • model: Specifies the model to use for inference. Use this field to easily switch between models by changing the model slug in your request.
  • messages: Defines the conversation history. Serverless inference is session-less, so include all necessary context directly in this field; see the multi-turn sketch after this list. For a basic prompt, provide a single user message with your question or instruction.
  • temperature: Sets the randomness of the response. Lower values produce more focused and deterministic outputs, while higher values allow for more creative or varied results. Takes a decimal number between 0.0 and 1.0.
  • max_tokens: Sets the maximum number of tokens to generate in the response. Adjust this value to control the response length and manage token costs. You can view the maximum number of tokens for each model on the models page.
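
Because serverless inference is session-less, a follow-up question needs the earlier turns included in messages. A minimal multi-turn sketch using the same endpoint and model as above:

curl https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.3-70b-instruct",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"},
      {"role": "assistant", "content": "The capital of France is Paris."},
      {"role": "user", "content": "What is its population?"}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'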

For the basic single-question request above, the inference service returns a JSON response like this:

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "audio": null,
        "content": "The capital of France is Paris.",
        "refusal": null,
        "role": ""
      }
    }
  ],
  "created": 1747247763,
  "id": "",
  "model": "llama3.3-70b-instruct",
  "object": "chat.completion",
  "service_tier": null,
  "usage": {
    "completion_tokens": 8,
    "prompt_tokens": 43,
    "total_tokens": 51
  }
}
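
To pull just the generated text out of a response like this in a shell pipeline, filter on choices[0].message.content (again assuming jq is available):

curl -s https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.3-70b-instruct", "messages": [{"role": "user", "content": "What is the capital of France?"}]}' \
  | jq -r '.choices[0].message.content'

The usage object in the response reports prompt_tokens and completion_tokens, which are the counts that per-token billing is based on.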

Rename a Model Access Key

Renaming a model access key helps you organize and manage your keys more effectively, especially when using multiple keys for different projects or environments.

To rename a model access key in the DigitalOcean Control Panel, on the left menu, click GenAI Platform, then click the Model access keys tab.

Under the Model Access Keys section, find the model access key you want to rename. To the right of the key, click the ... menu, then click Rename to open the Rename model access key window.

In the Key name field, type the new name for your key, then click Update.

Regenerate a Model Access Key

Regenerating a model access key creates a new secret key and immediately invalidates the old one. Use this process if you believe a key has been compromised or want to rotate keys for security purposes. You cannot revert a regeneration once it's done, so we recommend updating all affected applications with the new key to avoid service interruptions.

To regenerate a model access key in the DigitalOcean Control Panel, on the left menu, click GenAI Platform, then click the Model access keys tab.

Under the Model Access Keys section, find the model access key you want to regenerate. To the right of the key, click the ... menu, then click Regenerate to open the Regenerate model access key window.

To confirm regeneration, in the Regenerate model access key window, type the name of your access key, then click Regenerate access key.

Your new secret key appears under the Model Access Keys section and is shown only once. Copy it and store it securely, for example in a secrets manager (such as AWS Secrets Manager, HashiCorp Vault, or 1Password) or as an environment variable in your deployment configuration.

Delete a Model Access Key

Deleting a model access key permanently and irreversibly destroys it. Any external applications using a deleted key lose access to the serverless inference endpoints.

To delete a model access key in the DigitalOcean Control Panel, on the left menu, click GenAI Platform, then click the Model access keys tab.

Under the Model Access Keys section, find the model access key you want to delete. To the right of the key, click the ... menu, then click Delete to open the Delete model access key window.

To confirm deletion, type the name of your access key, then click Delete access key.
