Embeddings

Validated on 20 Apr 2026 • Last edited on 14 May 2026

Create text embedding vectors with POST /v1/embeddings on the [Serverless Inference](/reference/api/reference/serverless-inference/) base URL https://inference.do-ai.run, authenticated with a bearer model access key.

Base URL https://inference.do-ai.run

POST Create embedding

/v1/embeddings
Authorizations: inference_bearer_auth
HTTP: Bearer

OAuth Authentication

In order to interact with the DigitalOcean API, you or your application must authenticate.

The DigitalOcean API handles this through OAuth, an open standard for authorization. OAuth allows you to delegate access to your account. Scopes can be used to grant full access, read-only access, or access to a specific set of endpoints.

You can generate an OAuth token by visiting the Apps & API section of the DigitalOcean control panel for your account.

An OAuth token functions as a complete authentication request. In effect, it acts as a substitute for a username and password pair.

Because of this, it is absolutely essential that you keep your OAuth tokens secure. In fact, upon generation, the web interface will only display each token a single time in order to prevent the token from being compromised.

DigitalOcean access tokens begin with an identifiable prefix in order to distinguish them from other similar tokens.

  • dop_v1_ for personal access tokens generated in the control panel
  • doo_v1_ for tokens generated by applications using the OAuth flow
  • dor_v1_ for OAuth refresh tokens
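
As a rough illustration of how the prefixes above can be used, the following sketch labels a token by its prefix. The function name `classify_token` and the label strings are hypothetical and not part of any DigitalOcean SDK; only the three prefixes come from the list above.

```python
# Prefixes as listed above; the labels are illustrative wording only.
TOKEN_PREFIXES = {
    "dop_v1_": "personal access token",
    "doo_v1_": "OAuth application token",
    "dor_v1_": "OAuth refresh token",
}


def classify_token(token: str) -> str:
    """Return a label for a token based on its documented prefix."""
    for prefix, label in TOKEN_PREFIXES.items():
        if token.startswith(prefix):
            return label
    # Anything else, such as an agent access key, is not identified here.
    return "unknown"
```

A check like this can help catch a refresh token or an agent access key being passed where a personal access token is expected, before any request is sent.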

Authenticate with a Bearer Authorization Header

Serverless Inference:

curl -X POST -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" "https://inference.do-ai.run/v1/chat/completions"

Agent Inference:

curl -X POST -H "Authorization: Bearer $AGENT_ACCESS_KEY" "https://{your-agent-url}.agents.do-ai.run/v1/chat/completions?agent=true"

Note: Agent Inference APIs use an agent_access_key (endpoint access key) instead of a DigitalOcean OAuth token. The agent_access_key is provided when you provision an agent endpoint and is scoped to that specific agent. It is not interchangeable with DigitalOcean OAuth tokens (dop_v1_*, doo_v1_*, dor_v1_*), which are used with Serverless Inference and the control-plane API at https://api.digitalocean.com.

Create vector embeddings for one or more text inputs. The request and response formats are OpenAI-compatible. Unknown fields in the request body are rejected. This endpoint does not support streaming responses.

Request Body: application/json

encoding_format string, one of: float, base64 optional
Example: float

How embedding values are returned in each data[].embedding field.

input array | string required
Example: hello world

A single string or an array of 1–2048 strings; each string produces one row in data, in the same order as the input.

model string required
Example: qwen3-embedding-0.6b

Model id to use for embeddings. Must match a model your account can access.

user string optional
Example: user-1234

Optional end-user identifier to help with abuse monitoring.
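
Because unknown fields are rejected and input is limited to 1–2048 strings, it can be useful to validate the body locally before sending it. The following is a minimal sketch under those documented constraints; `build_embeddings_payload` is a hypothetical helper, not an SDK function, and the server remains the authority on what it accepts.

```python
from typing import Any, Dict, List, Union

# Limits and enum values taken from the field descriptions above.
ENCODING_FORMATS = {"float", "base64"}
OPTIONAL_FIELDS = {"encoding_format", "user"}
MAX_INPUTS = 2048


def build_embeddings_payload(
    model: str, input: Union[str, List[str]], **optional: Any
) -> Dict[str, Any]:
    """Hypothetical helper: validate fields locally and return the JSON body."""
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        # The endpoint rejects unknown fields, so fail before sending.
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if isinstance(input, list):
        if not 1 <= len(input) <= MAX_INPUTS:
            raise ValueError(f"input must contain 1 to {MAX_INPUTS} strings")
        if not all(isinstance(item, str) for item in input):
            raise TypeError("every input item must be a string")
    elif not isinstance(input, str):
        raise TypeError("input must be a string or a list of strings")
    if "encoding_format" in optional and optional["encoding_format"] not in ENCODING_FORMATS:
        raise ValueError("encoding_format must be 'float' or 'base64'")
    return {"model": model, "input": input, **optional}
```

The returned dictionary can be serialized with `json.dumps` and sent as the request body.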

Content type application/json
{
  "encoding_format": "float",
  "input": "hello world",
  "model": "qwen3-embedding-0.6b",
  "user": "user-1234"
}

cURL:

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  -d '{"model":"qwen3-embedding-0.6b","input":["hello world","goodbye world"],"encoding_format":"float","user":"user-1234"}' \
  "https://inference.do-ai.run/v1/embeddings"

Python:

import os
from pydo import Client

client = Client(token=os.environ.get("DIGITALOCEAN_TOKEN"))

resp = client.embeddings.create(
    model="qwen3-embedding-0.6b",
    input=["hello world", "goodbye world"],
    encoding_format="float",
    user="user-1234",
)

for item in resp.data:
    print(item.index, item.embedding[:8])

JavaScript:

import { InferenceClient } from "@digitalocean/dots";

const client = new InferenceClient({
    apiKey: process.env.DIGITALOCEAN_TOKEN,
});

const resp = await client.embeddings.create({
    model: "qwen3-embedding-0.6b",
    input: ["hello world", "goodbye world"],
    encoding_format: "float",
    user: "user-1234",
});

for (const item of resp.data) {
    console.log(item.index, item.embedding.slice(0, 8));
}

Responses

200

Embeddings and usage for the given input or inputs, in order.

ratelimit-limit integer

The default limit on the number of requests that can be made per hour and per minute. Current rate limits are 5,000 requests per hour and 250 requests per minute.

ratelimit-remaining integer

The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.

ratelimit-reset integer

The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.

data array of object required

One entry for each input string, in the same order.

Child properties of each data entry:
embedding array | string required
Example: [0.0123,-0.0456,0.0001]

The embedding vector, or a base64-encoded string when the request set encoding_format to base64.

index integer required
Example: 0

Zero-based index of the corresponding input item (0 when input is a string).

object string, one of: embedding required
Example: embedding

The object type, which is always embedding.
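
When encoding_format is base64, the embedding field is a string rather than an array. The sketch below decodes it back into floats. It assumes the bytes are packed little-endian 32-bit floats, which is the usual convention for OpenAI-compatible APIs but is not stated on this page; verify against a float response before relying on it. `decode_embedding` is a hypothetical helper name.

```python
import base64
import struct
from typing import List, Union


def decode_embedding(value: Union[str, List[float]]) -> List[float]:
    """Return the embedding as a list of floats for either encoding_format."""
    if isinstance(value, list):
        # encoding_format "float": the vector is already a JSON array.
        return [float(x) for x in value]
    raw = base64.b64decode(value)
    if len(raw) % 4 != 0:
        raise ValueError("decoded length is not a multiple of 4 bytes")
    count = len(raw) // 4
    # ASSUMPTION: little-endian float32 values, packed back to back.
    return list(struct.unpack(f"<{count}f", raw))
```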

model string required
Example: qwen3-embedding-0.6b

The embedding model that produced the vectors.

object string, one of: list required
Example: list

The object type, which is always the string list.

usage object required

Token usage for the embeddings request.

Child properties of usage:
prompt_tokens integer required
Example: 6

Number of input tokens used for the embedding.

total_tokens integer required
Example: 6

Total billable tokens for the request.
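
Since each data entry carries the zero-based index of its input, a response can be matched back to the original strings. The following sketch works on the parsed JSON body as a plain dictionary; `pair_inputs_with_embeddings` is a hypothetical helper, and sorting by index is a defensive step even though the rows are documented to arrive in order.

```python
from typing import Any, Dict, List, Tuple, Union


def pair_inputs_with_embeddings(
    inputs: Union[str, List[str]], response: Dict[str, Any]
) -> List[Tuple[str, Any]]:
    """Pair each input string with its embedding using data[].index."""
    texts = [inputs] if isinstance(inputs, str) else list(inputs)
    rows = sorted(response["data"], key=lambda row: row["index"])
    if len(rows) != len(texts):
        raise ValueError("response row count does not match input count")
    return [(text, row["embedding"]) for text, row in zip(texts, rows)]
```

The usage object in the same dictionary (`response["usage"]["total_tokens"]`) can be logged alongside the pairs to track billable tokens per request.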

401

Authentication failed due to invalid credentials.

ratelimit-limit integer

The default limit on the number of requests that can be made per hour and per minute. Current rate limits are 5,000 requests per hour and 250 requests per minute.

ratelimit-remaining integer

The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.

ratelimit-reset integer

The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.

id string required
Example: not_found

A short identifier corresponding to the HTTP status code returned. For example, the ID for a response returning a 404 status code would be "not_found."

message string required
Example: The resource you were accessing could not be found.

A message providing additional information about the error, including details to help resolve it when possible.

request_id string optional
Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9

Some endpoints include a request ID. Provide it when reporting bugs or opening support tickets to help identify the issue.

429

The API rate limit has been exceeded.

ratelimit-limit integer

The default limit on the number of requests that can be made per hour and per minute. Current rate limits are 5,000 requests per hour and 250 requests per minute.

ratelimit-remaining integer

The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.

ratelimit-reset integer

The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.

id string required
Example: not_found

A short identifier corresponding to the HTTP status code returned. For example, the ID for a response returning a 404 status code would be "not_found."

message string required
Example: The resource you were accessing could not be found.

A message providing additional information about the error, including details to help resolve it when possible.

request_id string optional
Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9

Some endpoints include a request ID. Provide it when reporting bugs or opening support tickets to help identify the issue.
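
After a 429 response, the ratelimit-reset header gives a Unix epoch time that a client can use to decide how long to pause. The sketch below is one possible approach rather than a prescribed retry policy; `seconds_until_reset` is a hypothetical helper, and it treats a missing header as "no wait".

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(
    headers: Mapping[str, str], now: Optional[float] = None
) -> float:
    """Return how many seconds remain until ratelimit-reset, never negative."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    reset = normalized.get("ratelimit-reset")
    if reset is None:
        return 0.0
    current = time.time() if now is None else now
    return max(0.0, float(reset) - current)
```

A caller could pass the result to `time.sleep` before retrying. Because ratelimit-reset marks when the oldest request expires, waiting that long should free at least one slot, not the whole quota.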

500

There was a server error.

ratelimit-limit integer

The default limit on the number of requests that can be made per hour and per minute. Current rate limits are 5,000 requests per hour and 250 requests per minute.

ratelimit-remaining integer

The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.

ratelimit-reset integer

The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.

id string required
Example: not_found

A short identifier corresponding to the HTTP status code returned. For example, the ID for a response returning a 404 status code would be "not_found."

message string required
Example: The resource you were accessing could not be found.

A message providing additional information about the error, including details to help resolve it when possible.

request_id string optional
Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9

Some endpoints include a request ID. Provide it when reporting bugs or opening support tickets to help identify the issue.

default

There was an unexpected error.

ratelimit-limit integer

The default limit on the number of requests that can be made per hour and per minute. Current rate limits are 5,000 requests per hour and 250 requests per minute.

ratelimit-remaining integer

The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.

ratelimit-reset integer

The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.

id string required
Example: not_found

A short identifier corresponding to the HTTP status code returned. For example, the ID for a response returning a 404 status code would be "not_found."

message string required
Example: The resource you were accessing could not be found.

A message providing additional information about the error, including details to help resolve it when possible.

request_id string optional
Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9

Some endpoints include a request ID. Provide it when reporting bugs or opening support tickets to help identify the issue.
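
All error responses share the id, message, and optional request_id fields, so a client can map them onto a single exception type. The following is a minimal sketch; `InferenceAPIError` and `error_from_response` are hypothetical names, not part of an official SDK.

```python
import json
from typing import Optional


class InferenceAPIError(Exception):
    """Hypothetical exception carrying the documented error fields."""

    def __init__(
        self, status: int, error_id: str, message: str, request_id: Optional[str] = None
    ) -> None:
        super().__init__(f"{status} {error_id}: {message}")
        self.status = status
        self.error_id = error_id
        self.message = message
        self.request_id = request_id


def error_from_response(status: int, body: str) -> InferenceAPIError:
    """Build an exception from an HTTP status code and a JSON error body."""
    payload = json.loads(body)
    return InferenceAPIError(
        status,
        payload["id"],
        payload["message"],
        payload.get("request_id"),  # optional in the schema
    )
```

Keeping request_id on the exception makes it easy to include in logs or support tickets.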

200 response example:

{
  "data": [
    {
      "embedding": [
        0.0123,
        -0.0456,
        0.0001
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "qwen3-embedding-0.6b",
  "object": "list",
  "usage": {
    "prompt_tokens": 6,
    "total_tokens": 6
  }
}

401 response example:

{
  "id": "unauthorized",
  "message": "Unable to authenticate you."
}

429 response example:

{
  "id": "too_many_requests",
  "message": "API rate limit exceeded."
}

500 response example:

{
  "id": "server_error",
  "message": "Unexpected server-side error"
}

default response example:

{
  "id": "example_error",
  "message": "some error message"
}