Embeddings

Validated on 20 Apr 2026 • Last edited on 27 Apr 2026

Create text embedding vectors with POST /v1/embeddings on the [Serverless Inference](/reference/api/reference/serverless-inference/) base URL, https://inference.do-ai.run. Authenticate with a model access key sent as a bearer token.

Base URL https://inference.do-ai.run

POST Create embedding

/v1/embeddings
Authorizations: inference_bearer_auth
HTTP: Bearer

Inference API Authentication

The Inference APIs use API access keys for authentication, which are separate from the DigitalOcean OAuth tokens used by the control-plane API.

Include the key as a Bearer token in the Authorization header of each request. All requests must be made over HTTPS.

Key Types

| API | Key Type | Key Pattern | How to Obtain |
|---|---|---|---|
| Serverless Inference | Model access key | sk-do-* (e.g., sk-do-v1-abcd1234...) | Generate in the AI/ML section of the DigitalOcean control panel |
| Agent Inference | Endpoint access key | Alphanumeric string (e.g., Abc1Def2Ghi3Jkl4...) | Provided when you provision an agent endpoint |

Authenticate with a Bearer Authorization Header

Serverless Inference:

curl -X POST -H "Authorization: Bearer $MODEL_ACCESS_KEY" "https://inference.do-ai.run/v1/chat/completions"

Agent Inference:

curl -X POST -H "Authorization: Bearer $AGENT_ACCESS_KEY" "https://{your-agent-url}.agents.do-ai.run/v1/chat/completions?agent=true"

Note: These keys are not interchangeable with DigitalOcean OAuth tokens (dop_v1_*, doo_v1_*, dor_v1_*). OAuth tokens are used exclusively with the control-plane API at https://api.digitalocean.com.
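The key patterns in the table above can be checked on the client before a request goes out, which catches the common mistake of sending a control-plane OAuth token to an Inference API. The following is a minimal Python sketch; classify_key and require_inference_key are hypothetical helpers, and the prefix checks are heuristics based only on the examples on this page, not an official validation rule.

```python
# Hypothetical helpers: classify a credential using the patterns listed on this page.
# These are heuristics based on the documented examples, not an official validator.

OAUTH_PREFIXES = ("dop_v1_", "doo_v1_", "dor_v1_")  # control-plane OAuth tokens
MODEL_KEY_PREFIX = "sk-do-"  # Serverless Inference model access keys


def classify_key(key: str) -> str:
    """Return 'oauth', 'model_access_key', 'agent_access_key', or 'unknown'."""
    if key.startswith(OAUTH_PREFIXES):
        return "oauth"
    if key.startswith(MODEL_KEY_PREFIX):
        return "model_access_key"
    if key.isalnum():
        # Agent endpoint keys are only described as alphanumeric strings,
        # so this branch is a loose guess.
        return "agent_access_key"
    return "unknown"


def require_inference_key(key: str) -> str:
    """Raise ValueError if an OAuth token is about to be sent to an Inference API."""
    if classify_key(key) == "oauth":
        raise ValueError("OAuth tokens work only with https://api.digitalocean.com")
    return key
```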

Creates vector embeddings for one or more text inputs. The request and response formats are OpenAI-compatible. Unknown fields in the request body are rejected, and this endpoint does not support streaming responses.
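Because unknown fields are rejected, it can help to check a request body locally against the four fields documented below (encoding_format, input, model, user). The hypothetical check_body helper is a sketch under that assumption; for example, a stream flag copied from a chat completions request is not a documented field here, so the server would presumably reject it.

```python
# Hypothetical pre-flight check for a POST /v1/embeddings body.
# The field names come from this page; the server remains the final authority.
ALLOWED_FIELDS = {"encoding_format", "input", "model", "user"}
REQUIRED_FIELDS = {"input", "model"}
ENCODING_FORMATS = {"float", "base64"}


def check_body(body: dict) -> dict:
    """Return body unchanged, or raise ValueError for unknown, missing, or invalid fields."""
    unknown = set(body) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    missing = REQUIRED_FIELDS - set(body)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    if "encoding_format" in body and body["encoding_format"] not in ENCODING_FORMATS:
        raise ValueError("encoding_format must be 'float' or 'base64'")
    return body
```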

Request Body: application/json

encoding_format string, one of: float, base64 optional
Example: float

How embedding values are returned in each data[].embedding field.

input array | string required
Example: hello world

A single string, or an array of 1–2048 strings. Each string produces one entry in data, in the same order as the input.

model string required
Example: qwen3-embedding-0.6b

Model ID to use for embeddings. It must match a model your account can access.

user string optional
Example: user-1234

Optional end-user identifier to help with abuse monitoring.
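Since input accepts at most 2048 strings per request, larger collections of text need to be split across several requests. A minimal Python sketch follows; batched_inputs is a hypothetical helper, not part of any DigitalOcean SDK.

```python
from typing import Iterator, List

MAX_INPUTS_PER_REQUEST = 2048  # documented maximum length of the input array


def batched_inputs(texts: List[str], size: int = MAX_INPUTS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive, order-preserving slices of at most `size` strings."""
    if not 1 <= size <= MAX_INPUTS_PER_REQUEST:
        raise ValueError(f"size must be between 1 and {MAX_INPUTS_PER_REQUEST}")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]
```

Each batch becomes the input of one request. Because the index field counts from 0 within a single request, the overall position of a result is its batch's starting offset plus its index.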

Content type application/json
{
  "encoding_format": "float",
  "input": "hello world",
  "model": "qwen3-embedding-0.6b",
  "user": "user-1234"
}

cURL:

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -d '{"model":"qwen3-embedding-0.6b","input":["hello world","goodbye world"],"encoding_format":"float","user":"user-1234"}' \
  "https://inference.do-ai.run/v1/embeddings"

Python:

import os
from pydo import Client

client = Client(token=os.environ.get("MODEL_ACCESS_KEY"))

resp = client.embeddings.create(
    model="qwen3-embedding-0.6b",
    input=["hello world", "goodbye world"],
    encoding_format="float",
    user="user-1234",
)

for item in resp.data:
    print(item.index, item.embedding[:8])

JavaScript:

import { InferenceClient } from "@digitalocean/dots";

const client = new InferenceClient({
    apiKey: process.env.MODEL_ACCESS_KEY,
});

const resp = await client.embeddings.create({
    model: "qwen3-embedding-0.6b",
    input: ["hello world", "goodbye world"],
    encoding_format: "float",
    user: "user-1234",
});

for (const item of resp.data) {
    console.log(item.index, item.embedding.slice(0, 8));
}
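If neither SDK is available, the same request can be made with Python's standard library. The sketch below relies only on the endpoint, headers, and body shown in the cURL example; build_embeddings_request and create_embeddings are illustrative names.

```python
import json
import urllib.request

BASE_URL = "https://inference.do-ai.run"


def build_embeddings_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build, without sending, a POST /v1/embeddings request authenticated with a bearer token."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def create_embeddings(body: dict, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON body.

    urlopen raises urllib.error.HTTPError for non-2xx statuses such as 401, 429, or 500.
    """
    request = build_embeddings_request(body, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, create_embeddings({"model": "qwen3-embedding-0.6b", "input": ["hello world", "goodbye world"]}, os.environ["MODEL_ACCESS_KEY"]) returns the decoded response body as a dict.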

Responses

200

Embeddings and usage for the given input or inputs, in order.

ratelimit-limit integer

The default limit on the number of requests that can be made per hour and per minute. The current rate limits are 5,000 requests per hour and 250 requests per minute.

ratelimit-remaining integer

The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.

ratelimit-reset integer

The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
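A client can read these three headers to decide when to slow down. The sketch below is hypothetical: it assumes all three headers are present and hold plain integers, as documented, and it compares header names case-insensitively because HTTP header names are case-insensitive.

```python
import time
from typing import Mapping, Optional


def rate_limit_status(headers: Mapping[str, str], now: Optional[float] = None) -> dict:
    """Parse ratelimit-* headers into ints plus the seconds left until ratelimit-reset."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    current = time.time() if now is None else now
    reset_at = int(lowered["ratelimit-reset"])  # Unix epoch seconds
    return {
        "limit": int(lowered["ratelimit-limit"]),
        "remaining": int(lowered["ratelimit-remaining"]),
        "seconds_until_reset": max(0.0, float(reset_at) - current),
    }
```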

data array of object required

One entry for each input string, in the same order.

embedding array | string required
Example: [0.0123,-0.0456,0.0001]

The embedding vector, or a base64-encoded string when the request set encoding_format to base64.
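When encoding_format is base64, the embedding has to be decoded before use. This page does not specify the byte layout; the sketch below assumes the OpenAI convention of little-endian 32-bit floats packed back to back, so compare its output with a float response from the same model before relying on it. decode_embedding is a hypothetical helper.

```python
import base64
import struct
from typing import List, Union


def decode_embedding(value: Union[str, List[float]]) -> List[float]:
    """Return the embedding as a list of floats, decoding base64 if needed.

    ASSUMPTION: the base64 payload is packed little-endian float32 values,
    as in OpenAI's API; this page does not state the layout.
    """
    if isinstance(value, list):
        return value  # encoding_format was float
    raw = base64.b64decode(value)
    if len(raw) % 4:
        raise ValueError("base64 payload length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```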

index integer required
Example: 0

Zero-based index of the corresponding input item (0 when input is a string).

object string, one of: embedding required
Example: embedding

The object type, which is always embedding.

model string required
Example: qwen3-embedding-0.6b

The embedding model that produced the vectors.

object string, one of: list required
Example: list

The object type, which is always the string list.

usage object required

Token usage for the embeddings request.

prompt_tokens integer required
Example: 6

Number of input tokens in the request.

total_tokens integer required
Example: 6

Total billable tokens for the request.
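The following sketch turns a 200 response body into a list of vectors, one per input. The response is documented to be in input order already; sorting by index and checking that the indexes are contiguous is a defensive extra step, not a requirement stated on this page. vectors_in_input_order is a hypothetical name.

```python
from typing import Any, Dict, List


def vectors_in_input_order(body: Dict[str, Any]) -> List[Any]:
    """Return one embedding per input, ordered by data[].index, after basic shape checks."""
    if body.get("object") != "list":
        raise ValueError("expected top-level object 'list'")
    items = sorted(body["data"], key=lambda item: item["index"])
    for position, item in enumerate(items):
        # Each entry should be an 'embedding' object with a contiguous zero-based index.
        if item.get("object") != "embedding" or item["index"] != position:
            raise ValueError(f"unexpected data entry at position {position}")
    return [item["embedding"] for item in items]
```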

401

Authentication failed due to invalid credentials.

Response headers: ratelimit-limit, ratelimit-remaining, and ratelimit-reset, as described for the 200 response.

id string required
Example: unauthorized

A short identifier corresponding to the returned HTTP status code. For example, the ID for a response with a 404 status code would be "not_found".

message string required
Example: Unable to authenticate you.

A message providing additional information about the error, including details to help resolve it when possible.

request_id string optional
Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9

Some endpoints include a request ID. Provide it when reporting bugs or opening support tickets to help identify the issue.
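Error responses share the id, message, and optional request_id fields, so a client can map them onto a single exception type and keep request_id for support tickets. The sketch below is one hypothetical way to do that; InferenceAPIError and error_from_response are illustrative names, and the fallback branch covers bodies that are not the documented JSON (for example, an HTML page from an intermediate proxy).

```python
import json
from typing import Optional


class InferenceAPIError(Exception):
    """Raised for error responses; carries the documented id, message, and request_id fields."""

    def __init__(self, status: int, error_id: str, message: str, request_id: Optional[str] = None):
        super().__init__(f"{status} {error_id}: {message}")
        self.status = status
        self.error_id = error_id
        self.message = message
        self.request_id = request_id  # include this when opening a support ticket


def error_from_response(status: int, raw_body: bytes) -> InferenceAPIError:
    """Build an InferenceAPIError from an error body, falling back to the raw text."""
    try:
        body = json.loads(raw_body)
        return InferenceAPIError(status, body["id"], body["message"], body.get("request_id"))
    except (ValueError, KeyError, TypeError):
        return InferenceAPIError(status, "unknown", raw_body.decode("utf-8", "replace"))
```

With urllib, this pairs with `except urllib.error.HTTPError as exc: raise error_from_response(exc.code, exc.read())`.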

429

The API rate limit has been exceeded.
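A common way to handle 429 responses is to wait and retry. The sketch below waits until the time in the ratelimit-reset header when it is present (the Unix time at which the oldest counted request expires, per the 200 response's header description), and otherwise falls back to exponential backoff with jitter. send_with_retry and the send callable it wraps are hypothetical; sleep and now are injectable so the logic can be tested without real delays.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Hypothetical transport: one call performs one HTTP attempt and returns
# (status_code, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.time,
) -> Response:
    """Call send(), retrying on HTTP 429 for up to max_attempts total attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, headers, body = send()
        attempt += 1
        if status != 429 or attempt >= max_attempts:
            return status, headers, body
        lowered = {name.lower(): value for name, value in headers.items()}
        reset = lowered.get("ratelimit-reset")
        if reset is not None:
            # Wait until the oldest counted request expires (Unix epoch seconds).
            delay = float(reset) - now()
        else:
            # No reset hint: exponential backoff plus up to base_delay of jitter.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
        sleep(min(max(delay, 0.0), max_delay))
```

Capping the wait at max_delay keeps a client from sleeping for most of an hour when the hourly quota, rather than the per-minute one, is exhausted.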

Response headers: ratelimit-limit, ratelimit-remaining, and ratelimit-reset, as described for the 200 response.

Response body: id, message, and the optional request_id, as described for the 401 response.

500

There was a server error.

Response headers: ratelimit-limit, ratelimit-remaining, and ratelimit-reset, as described for the 200 response.

Response body: id, message, and the optional request_id, as described for the 401 response.

default

There was an unexpected error.

Response headers: ratelimit-limit, ratelimit-remaining, and ratelimit-reset, as described for the 200 response.

Response body: id, message, and the optional request_id, as described for the 401 response.

Example 200 response:

{
  "data": [
    {
      "embedding": [
        0.0123,
        -0.0456,
        0.0001
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "qwen3-embedding-0.6b",
  "object": "list",
  "usage": {
    "prompt_tokens": 6,
    "total_tokens": 6
  }
}

Example 401 response:

{
  "id": "unauthorized",
  "message": "Unable to authenticate you."
}

Example 429 response:

{
  "id": "too_many_requests",
  "message": "API rate limit exceeded."
}

Example 500 response:

{
  "id": "server_error",
  "message": "Unexpected server-side error"
}

Example default (unexpected error) response:

{
  "id": "example_error",
  "message": "some error message"
}
