Embeddings
Validated on 20 Apr 2026 • Last edited on 27 Apr 2026
Create text embedding vectors with POST /v1/embeddings on the
[Serverless Inference](/reference/api/reference/serverless-inference/) base URL,
https://inference.do-ai.run. Authenticate with a model access key sent as a bearer token.
Endpoints
POST Create embedding
/v1/embeddings
Authorizations:
inference_bearer_auth
Inference API Authentication
The Inference APIs use API access keys for authentication, which are separate from the DigitalOcean OAuth tokens used by the control-plane API.
Include the key as a Bearer token in the Authorization header of each
request. All requests must be made over HTTPS.
Key Types
| API | Key Type | Key Pattern | How to Obtain |
|---|---|---|---|
| Serverless Inference | Model access key | sk-do-* (e.g., sk-do-v1-abcd1234...) | Generate in the AI/ML section of the DigitalOcean control panel |
| Agent Inference | Endpoint access key | Alphanumeric string (e.g., Abc1Def2Ghi3Jkl4...) | Provided when provisioning an agent endpoint |
Authenticate with a Bearer Authorization Header
Serverless Inference:
curl -X POST -H "Authorization: Bearer $MODEL_ACCESS_KEY" "https://inference.do-ai.run/v1/chat/completions"
Agent Inference:
curl -X POST -H "Authorization: Bearer $AGENT_ACCESS_KEY" "https://{your-agent-url}.agents.do-ai.run/v1/chat/completions?agent=true"
Note: These keys are not interchangeable with DigitalOcean OAuth
tokens (dop_v1_*, doo_v1_*, dor_v1_*). OAuth tokens are used
exclusively with the control-plane API at https://api.digitalocean.com.
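As a minimal sketch of the rules above, the following Python helper builds the Authorization header and refuses tokens that start with one of the OAuth prefixes listed in the note. The name `bearer_headers` is hypothetical, and the prefix check is only a client-side convenience; the service itself decides whether a key is valid.

```python
# OAuth token prefixes named in the note above; these belong to the control-plane API.
OAUTH_PREFIXES = ("dop_v1_", "doo_v1_", "dor_v1_")


def bearer_headers(access_key: str) -> dict:
    """Build headers for an Inference API request (hypothetical helper)."""
    if not access_key:
        raise ValueError("access key is empty")
    if access_key.startswith(OAUTH_PREFIXES):
        raise ValueError(
            "this looks like a DigitalOcean OAuth token, "
            "which the Inference APIs do not accept"
        )
    return {
        "Authorization": f"Bearer {access_key}",
        "Content-Type": "application/json",
    }
```

Failing early on an OAuth token gives a clearer message than the 401 response the request would otherwise receive.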
Create vector embeddings for one or more text inputs. OpenAI-compatible request and response. Unknown fields in the request body are rejected. There is no streaming response for this endpoint.
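Because unknown fields are rejected, it can help to assemble the body from an explicit allow-list instead of passing arbitrary keyword arguments through. The sketch below assumes only the four fields documented in the request body reference; `build_embedding_body` is a hypothetical helper, not part of any SDK.

```python
# Optional fields documented for POST /v1/embeddings.
OPTIONAL_FIELDS = {"encoding_format", "user"}


def build_embedding_body(model, inputs, **optional):
    """Return a request body containing only documented fields (hypothetical helper)."""
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        # Fail locally rather than sending a body the endpoint would reject.
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    body = {"model": model, "input": inputs}
    # Omit optional fields left as None instead of sending them as null.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```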
Request Body: application/json
encoding_format
optional
Example: float
How embedding values are returned in each data[].embedding field.
input
required
Example: hello world
A single string or 1–2048 strings; each string produces one row in data, in order.
model
required
Example: qwen3-embedding-0.6b
Model id to use for embeddings. Must match a model your account can access.
user
optional
Example: user-1234
Optional end-user identifier to help with abuse monitoring.
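Since input accepts at most 2048 strings per request, a larger corpus has to be split across several requests. The following is a minimal sketch of that batching step; `batch_inputs` and `MAX_INPUTS_PER_REQUEST` are hypothetical names, and the only number taken from this reference is the 2048 upper bound.

```python
from typing import List, Sequence

MAX_INPUTS_PER_REQUEST = 2048  # upper bound documented for `input`


def batch_inputs(texts: Sequence[str], size: int = MAX_INPUTS_PER_REQUEST) -> List[List[str]]:
    """Split texts into consecutive batches of at most `size` strings."""
    if not 1 <= size <= MAX_INPUTS_PER_REQUEST:
        raise ValueError(f"size must be between 1 and {MAX_INPUTS_PER_REQUEST}")
    return [list(texts[start:start + size]) for start in range(0, len(texts), size)]
```

Each response returns its rows in input order, so concatenating the per-batch results in batch order keeps the overall order. An empty list yields no batches, which avoids sending a request with zero inputs.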
Request: /v1/embeddings
{
  "encoding_format": "float",
  "input": "hello world",
  "model": "qwen3-embedding-0.6b",
  "user": "user-1234"
}

cURL:
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -d '{"model":"qwen3-embedding-0.6b","input":["hello world","goodbye world"],"encoding_format":"float","user":"user-1234"}' \
  "https://inference.do-ai.run/v1/embeddings"

Python:
import os
from pydo import Client

client = Client(token=os.environ.get("MODEL_ACCESS_KEY"))
resp = client.embeddings.create(
    model="qwen3-embedding-0.6b",
    input=["hello world", "goodbye world"],
    encoding_format="float",
    user="user-1234",
)
for item in resp.data:
    print(item.index, item.embedding[:8])

JavaScript:
import { InferenceClient } from "@digitalocean/dots";

const client = new InferenceClient({
  apiKey: process.env.MODEL_ACCESS_KEY,
});
const resp = await client.embeddings.create({
  model: "qwen3-embedding-0.6b",
  input: ["hello world", "goodbye world"],
  encoding_format: "float",
  user: "user-1234",
});
for (const item of resp.data) {
  console.log(item.index, item.embedding.slice(0, 8));
}

Responses
200
Embeddings and usage for the given input or inputs, in order.
ratelimit-limit
The default limit on number of requests that can be made per hour and per minute. Current rate limits are 5000 requests per hour and 250 requests per minute.
ratelimit-remaining
The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
ratelimit-reset
The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
application/json
data
required
One entry for each input string, in the same order.
data[].embedding
required
Example: [0.0123,-0.0456,0.0001]
The embedding vector, or a base64-encoded string when the request set encoding_format to base64.
data[].index
required
Example: 0
Zero-based index of the corresponding input item (0 when input is a string).
data[].object
required
Example: embedding
The object type, which is always embedding.
model
required
Example: qwen3-embedding-0.6b
The embedding model that produced the vectors.
object
required
Example: list
The object type, which is always the string list.
usage
required
Token usage for the embeddings request.
usage.prompt_tokens
required
Example: 6
Number of input tokens used for the embedding.
usage.total_tokens
required
Example: 6
Total billable tokens for the request.
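The response fields above can be turned into plain Python lists with a small amount of code. The sketch below is illustrative only: `decode_embedding` and `embeddings_in_order` are hypothetical names, and the base64 branch assumes the payload is a packed array of little-endian 32-bit floats, which is the convention in OpenAI-compatible APIs but is not stated in this reference.

```python
import base64
import struct


def decode_embedding(value):
    """Return data[].embedding as a list of floats (hypothetical helper)."""
    if isinstance(value, list):
        # encoding_format "float": the vector is already a JSON array.
        return [float(x) for x in value]
    # encoding_format "base64".
    # ASSUMPTION: the bytes are little-endian float32 values.
    raw = base64.b64decode(value)
    if len(raw) % 4:
        raise ValueError("base64 payload is not a whole number of float32 values")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def embeddings_in_order(response):
    """Return one vector per input, ordered by data[].index."""
    rows = sorted(response["data"], key=lambda row: row["index"])
    return [decode_embedding(row["embedding"]) for row in rows]
```

Sorting by index is defensive; the reference already states that data entries arrive in input order.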
401
Authentication failed due to invalid credentials.
ratelimit-limit
The default limit on number of requests that can be made per hour and per minute. Current rate limits are 5000 requests per hour and 250 requests per minute.
ratelimit-remaining
The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
ratelimit-reset
The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
application/json
id
required
Example: not_found
A short identifier corresponding to the HTTP status code returned. For example, the ID for a response returning a 404 status code would be "not_found."
message
required
Example: The resource you were accessing could not be found.
A message providing additional information about the error, including details to help resolve it when possible.
request_id
optional
Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9
Optionally, some endpoints may include a request ID that should be provided when reporting bugs or opening support tickets to help identify the issue.
429
The API rate limit has been exceeded.
ratelimit-limit
The default limit on number of requests that can be made per hour and per minute. Current rate limits are 5000 requests per hour and 250 requests per minute.
ratelimit-remaining
The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
ratelimit-reset
The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
application/json
id
required
Example: not_found
A short identifier corresponding to the HTTP status code returned. For example, the ID for a response returning a 404 status code would be "not_found."
message
required
Example: The resource you were accessing could not be found.
A message providing additional information about the error, including details to help resolve it when possible.
request_id
optional
Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9
Optionally, some endpoints may include a request ID that should be provided when reporting bugs or opening support tickets to help identify the issue.
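When a request returns 429, the ratelimit-reset header gives a Unix epoch time that a client can use to decide how long to pause. The helper below is a rough sketch under stated assumptions: `seconds_until_reset` is a hypothetical name, the 1-second fallback is arbitrary, and waiting until the reset time is a heuristic, since the header reports when the oldest request expires rather than a guaranteed retry time.

```python
import time


def seconds_until_reset(headers, now=None):
    """Seconds to wait before retrying, based on ratelimit-reset (hypothetical helper)."""
    current = time.time() if now is None else now
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    reset = lowered.get("ratelimit-reset")
    if reset is None:
        return 1.0  # ASSUMPTION: arbitrary fallback when the header is absent
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, float(reset) - current)
```

A caller would typically pass the result to `time.sleep` and cap the number of retries so that a persistent limit does not stall the program indefinitely.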
500
There was a server error.
ratelimit-limit
The default limit on number of requests that can be made per hour and per minute. Current rate limits are 5000 requests per hour and 250 requests per minute.
ratelimit-remaining
The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
ratelimit-reset
The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
application/json
id
required
Example: not_found
A short identifier corresponding to the HTTP status code returned. For example, the ID for a response returning a 404 status code would be "not_found."
message
required
Example: The resource you were accessing could not be found.
A message providing additional information about the error, including details to help resolve it when possible.
request_id
optional
Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9
Optionally, some endpoints may include a request ID that should be provided when reporting bugs or opening support tickets to help identify the issue.
default
There was an unexpected error.
ratelimit-limit
The default limit on number of requests that can be made per hour and per minute. Current rate limits are 5000 requests per hour and 250 requests per minute.
ratelimit-remaining
The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
ratelimit-reset
The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
application/json
id
required
Example: not_found
A short identifier corresponding to the HTTP status code returned. For example, the ID for a response returning a 404 status code would be "not_found."
message
required
Example: The resource you were accessing could not be found.
A message providing additional information about the error, including details to help resolve it when possible.
request_id
optional
Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9
Optionally, some endpoints may include a request ID that should be provided when reporting bugs or opening support tickets to help identify the issue.
Response
200:
{
  "data": [
    {
      "embedding": [
        0.0123,
        -0.0456,
        0.0001
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "qwen3-embedding-0.6b",
  "object": "list",
  "usage": {
    "prompt_tokens": 6,
    "total_tokens": 6
  }
}

401:
{
  "id": "unauthorized",
  "message": "Unable to authenticate you."
}

429:
{
  "id": "too_many_requests",
  "message": "API rate limit exceeded."
}

500:
{
  "id": "server_error",
  "message": "Unexpected server-side error"
}

default:
{
  "id": "example_error",
  "message": "some error message"
}
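The error bodies share the id, message, and optional request_id fields, so a client can map any non-200 response onto one exception type. This is a minimal sketch, not an SDK feature: `ApiError` and `parse_error` are hypothetical names, and the fallback values used for malformed bodies are assumptions.

```python
import json


class ApiError(Exception):
    """Error returned by the API, carrying the documented error fields."""

    def __init__(self, status, error_id, message, request_id=None):
        super().__init__(f"{status} {error_id}: {message}")
        self.status = status
        self.error_id = error_id
        self.message = message
        self.request_id = request_id  # include this when opening a support ticket


def parse_error(status, body):
    """Build an ApiError from an HTTP status code and a raw response body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    # ASSUMPTION: fall back to "unknown" and the raw body when fields are missing.
    return ApiError(
        status,
        payload.get("id", "unknown"),
        payload.get("message", body),
        payload.get("request_id"),
    )
```

For example, `raise parse_error(401, response_text)` surfaces the id and message from the 401 body shown above in a single exception.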