Embeddings
Validated on 20 Apr 2026 • Last edited on 14 May 2026
Text embedding vectors via POST /v1/embeddings on the
[Serverless Inference](/reference/api/reference/serverless-inference/) base URL
https://inference.do-ai.run (bearer model access key).
Endpoints
POST Create embedding
/v1/embeddings
Authorizations:
inference_bearer_auth
OAuth Authentication
In order to interact with the DigitalOcean API, you or your application must authenticate.
The DigitalOcean API handles this through OAuth, an open standard for authorization. OAuth allows you to delegate access to your account. Scopes can be used to grant full access, read-only access, or access to a specific set of endpoints.
You can generate an OAuth token by visiting the Apps & API section of the DigitalOcean control panel for your account.
An OAuth token functions as a complete authentication request. In effect, it acts as a substitute for a username and password pair.
Because of this, it is absolutely essential that you keep your OAuth tokens secure. In fact, upon generation, the web interface will only display each token a single time in order to prevent the token from being compromised.
DigitalOcean access tokens begin with an identifiable prefix in order to distinguish them from other similar tokens.
dop_v1_ for personal access tokens generated in the control panel
doo_v1_ for tokens generated by applications using the OAuth flow
dor_v1_ for OAuth refresh tokens
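The prefixes above make it possible to tell token types apart before sending a request. The following is a minimal sketch of that check; the function name `token_kind` and the descriptive labels are hypothetical and chosen for illustration only.

```python
# Prefixes as listed above. The labels are descriptive names chosen for
# this sketch, not identifiers used by the API.
TOKEN_PREFIXES = {
    "dop_v1_": "personal access token",
    "doo_v1_": "OAuth application token",
    "dor_v1_": "OAuth refresh token",
}


def token_kind(token: str) -> str:
    """Return a label for the token based on its prefix, or 'unknown'."""
    for prefix, kind in TOKEN_PREFIXES.items():
        if token.startswith(prefix):
            return kind
    return "unknown"
```

A check like this can catch a refresh token being passed where an access token is expected, but it does not confirm that the token is valid.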
Authenticate with a Bearer Authorization Header
Serverless Inference:
curl -X POST -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" "https://inference.do-ai.run/v1/chat/completions"
Agent Inference:
curl -X POST -H "Authorization: Bearer $AGENT_ACCESS_KEY" "https://{your-agent-url}.agents.do-ai.run/v1/chat/completions?agent=true"
Note: Agent Inference APIs use an agent_access_key (endpoint access
key) instead of a DigitalOcean OAuth token. The agent_access_key is
provided when you provision an agent endpoint and is scoped to that
specific agent. It is not interchangeable with DigitalOcean OAuth tokens
(dop_v1_*, doo_v1_*, dor_v1_*), which are used with Serverless
Inference and the control-plane API at https://api.digitalocean.com.
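As a rough illustration of the bearer header described above, the sketch below builds (but does not send) a POST request using only the Python standard library. The helper name `build_request` and the placeholder key `example-key` are assumptions made for this example.

```python
import json
import urllib.request

# Serverless Inference base URL, as given in this reference.
BASE_URL = "https://inference.do-ai.run"


def build_request(path: str, access_key: str, body: dict) -> urllib.request.Request:
    """Build a JSON POST request carrying a Bearer Authorization header."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "example-key" is a placeholder; substitute your own credential.
req = build_request(
    "/v1/embeddings",
    "example-key",
    {"model": "qwen3-embedding-0.6b", "input": "hello world"},
)
```

Passing `req` to `urllib.request.urlopen` would send it. The sketch stops short of that so it can be inspected without network access.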
Create vector embeddings for one or more text inputs. OpenAI-compatible request and response. Unknown fields in the request body are rejected. There is no streaming response for this endpoint.
Request Body: application/json
encoding_format
optional
Example: float
How embedding values are returned in each data[].embedding field.
input
required
Example: hello world
A single string or 1–2048 strings; each string produces one row in data, in order.
model
required
Example: qwen3-embedding-0.6b
Model id to use for embeddings. Must match a model your account can access.
user
optional
Example: user-1234
Optional end-user identifier to help with abuse monitoring.
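Because unknown fields are rejected and input is limited to a single string or 1–2048 strings, it can help to validate a payload client-side before sending it. The sketch below assumes the four fields documented above are the only accepted ones; `build_embedding_payload` is a hypothetical helper, not part of any SDK.

```python
# Optional fields documented above; anything else is treated as unknown here.
OPTIONAL_FIELDS = {"encoding_format", "user"}
MAX_INPUTS = 2048  # upper bound on the number of input strings, per the docs


def build_embedding_payload(model, texts, **optional):
    """Return a request body dict, raising ValueError on invalid arguments."""
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if not isinstance(texts, str):
        texts = list(texts)
        if not 1 <= len(texts) <= MAX_INPUTS:
            raise ValueError(f"input must contain 1 to {MAX_INPUTS} strings")
        if not all(isinstance(t, str) for t in texts):
            raise ValueError("every input item must be a string")
    return {"model": model, "input": texts, **optional}
```

The server remains the authority on what is accepted. This check only surfaces obvious mistakes earlier.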
Request: /v1/embeddings
{
  "encoding_format": "float",
  "input": "hello world",
  "model": "qwen3-embedding-0.6b",
  "user": "user-1234"
}

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  -d '{"model":"qwen3-embedding-0.6b","input":["hello world","goodbye world"],"encoding_format":"float","user":"user-1234"}' \
  "https://inference.do-ai.run/v1/embeddings"

import os
from pydo import Client

client = Client(token=os.environ.get("DIGITALOCEAN_TOKEN"))
resp = client.embeddings.create(
    model="qwen3-embedding-0.6b",
    input=["hello world", "goodbye world"],
    encoding_format="float",
    user="user-1234",
)
# Print each row's index and the first 8 values of its vector.
for item in resp.data:
    print(item.index, item.embedding[:8])

import { InferenceClient } from "@digitalocean/dots";

const client = new InferenceClient({
  apiKey: process.env.DIGITALOCEAN_TOKEN,
});
const resp = await client.embeddings.create({
  model: "qwen3-embedding-0.6b",
  input: ["hello world", "goodbye world"],
  encoding_format: "float",
  user: "user-1234",
});
// Print each row's index and the first 8 values of its vector.
for (const item of resp.data) {
  console.log(item.index, item.embedding.slice(0, 8));
}

Responses
200
Embeddings and usage for the given input or inputs, in order.
ratelimit-limit
The default limit on number of requests that can be made per hour and per minute. Current rate limits are 5000 requests per hour and 250 requests per minute.
ratelimit-remaining
The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
ratelimit-reset
The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
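Since ratelimit-reset is given in Unix epoch time, a client can work out how long to wait before retrying. The following is a minimal sketch under that reading; `seconds_until_reset` is a hypothetical helper and the header dict stands in for whatever your HTTP client returns.

```python
def seconds_until_reset(headers: dict, now: float) -> float:
    """Seconds from `now` (Unix epoch) until the oldest request expires."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    reset_at = float(lowered["ratelimit-reset"])
    # Never return a negative wait if the reset time is already in the past.
    return max(0.0, reset_at - now)
```

In practice `now` would come from `time.time()`. It is passed in explicitly here so the calculation is easy to test.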
application/json
data
required
One entry for each input string, in the same order.
  embedding
  required
  Example: [0.0123,-0.0456,0.0001]
  The embedding vector, or a base64-encoded string when the request set encoding_format to base64.
  index
  required
  Example: 0
  Zero-based index of the corresponding input item (0 when input is a string).
  object
  required
  Example: embedding
  The object type, which is always embedding.
model
required
Example: qwen3-embedding-0.6b
The embedding model that produced the vectors.
object
required
Example: list
The object type, which is always the string list.
usage
required
Token usage for the embeddings request.
  prompt_tokens
  required
  Example: 6
  Number of input tokens used for the embedding.
  total_tokens
  required
  Example: 6
  Total billable tokens for the request.
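Each data entry carries the zero-based index of its input, so vectors can be matched back to the original strings. The sketch below shows one way to do that on an already-decoded JSON response; `embeddings_by_input` is a hypothetical helper and assumes float-encoded embeddings.

```python
def embeddings_by_input(inputs, response):
    """Pair each input string with its embedding, using data[].index."""
    if isinstance(inputs, str):
        inputs = [inputs]  # a single string yields one row with index 0
    rows = sorted(response["data"], key=lambda row: row["index"])
    if len(rows) != len(inputs):
        raise ValueError("response row count does not match input count")
    return [(text, row["embedding"]) for text, row in zip(inputs, rows)]
```

Sorting by index is defensive: the reference states rows arrive in input order, so the sort should be a no-op on a well-formed response.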
401
Authentication failed due to invalid credentials.
ratelimit-limit
The default limit on number of requests that can be made per hour and per minute. Current rate limits are 5000 requests per hour and 250 requests per minute.
ratelimit-remaining
The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
ratelimit-reset
The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
application/json
id
required
Example: not_found
A short identifier corresponding to the HTTP status code returned. For example, the ID for a response returning a 404 status code would be "not_found."
message
required
Example: The resource you were accessing could not be found.
A message providing additional information about the error, including details to help resolve it when possible.
request_id
optional
Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9
Optionally, some endpoints may include a request ID that should be provided when reporting bugs or opening support tickets to help identify the issue.
429
The API rate limit has been exceeded.
ratelimit-limit
The default limit on number of requests that can be made per hour and per minute. Current rate limits are 5000 requests per hour and 250 requests per minute.
ratelimit-remaining
The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
ratelimit-reset
The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
application/json
id
required
Example: not_found
A short identifier corresponding to the HTTP status code returned. For example, the ID for a response returning a 404 status code would be "not_found."
message
required
Example: The resource you were accessing could not be found.
A message providing additional information about the error, including details to help resolve it when possible.
request_id
optional
Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9
Optionally, some endpoints may include a request ID that should be provided when reporting bugs or opening support tickets to help identify the issue.
500
There was a server error.
ratelimit-limit
The default limit on number of requests that can be made per hour and per minute. Current rate limits are 5000 requests per hour and 250 requests per minute.
ratelimit-remaining
The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
ratelimit-reset
The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
application/json
id
required
Example: not_found
A short identifier corresponding to the HTTP status code returned. For example, the ID for a response returning a 404 status code would be "not_found."
message
required
Example: The resource you were accessing could not be found.
A message providing additional information about the error, including details to help resolve it when possible.
request_id
optional
Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9
Optionally, some endpoints may include a request ID that should be provided when reporting bugs or opening support tickets to help identify the issue.
default
There was an unexpected error.
ratelimit-limit
The default limit on number of requests that can be made per hour and per minute. Current rate limits are 5000 requests per hour and 250 requests per minute.
ratelimit-remaining
The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
ratelimit-reset
The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
application/json
id
required
Example: not_found
A short identifier corresponding to the HTTP status code returned. For example, the ID for a response returning a 404 status code would be "not_found."
message
required
Example: The resource you were accessing could not be found.
A message providing additional information about the error, including details to help resolve it when possible.
request_id
optional
Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9
Optionally, some endpoints may include a request ID that should be provided when reporting bugs or opening support tickets to help identify the issue.
Response
200
{
  "data": [
    {
      "embedding": [
        0.0123,
        -0.0456,
        0.0001
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "qwen3-embedding-0.6b",
  "object": "list",
  "usage": {
    "prompt_tokens": 6,
    "total_tokens": 6
  }
}

401
{
  "id": "unauthorized",
  "message": "Unable to authenticate you."
}

429
{
  "id": "too_many_requests",
  "message": "API rate limit exceeded."
}

500
{
  "id": "server_error",
  "message": "Unexpected server-side error"
}

default
{
  "id": "example_error",
  "message": "some error message"
}
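Error bodies share the id, message, and optional request_id fields, so a client can map them onto a single exception type. The following is a minimal sketch of that idea; `InferenceApiError` and `parse_error` are hypothetical names introduced for this example, not part of any SDK.

```python
import json


class InferenceApiError(Exception):
    """Carries the fields of an error response body."""

    def __init__(self, status, error_id, message, request_id=None):
        super().__init__(f"{status} {error_id}: {message}")
        self.status = status
        self.error_id = error_id
        self.message = message
        self.request_id = request_id  # optional in the response, may be None


def parse_error(status: int, body: str) -> InferenceApiError:
    """Build an InferenceApiError from an HTTP status and a JSON error body."""
    data = json.loads(body)
    return InferenceApiError(
        status, data["id"], data["message"], data.get("request_id")
    )
```

When request_id is present, including it in logs makes it easier to reference the failing call in a support ticket.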