Embeddings

Validated on 20 Apr 2026 • Last edited on 14 May 2026

Create text embedding vectors with POST /v1/embeddings on the [Serverless Inference](/reference/api/reference/serverless-inference/) base URL, https://inference.do-ai.run. Requests authenticate with a bearer model access key.

Base URL https://inference.do-ai.run

Endpoints

POST Create embedding

/v1/embeddings

Authorizations: inference_bearer_auth

Http: Bearer

OAuth Authentication

In order to interact with the DigitalOcean API, you or your application must authenticate.

The DigitalOcean API handles this through OAuth, an open standard for authorization. OAuth allows you to delegate access to your account. Scopes can be used to grant full access, read-only access, or access to a specific set of endpoints.

You can generate an OAuth token by visiting the Apps & API section of the DigitalOcean control panel for your account.

An OAuth token functions as a complete authentication request. In effect, it acts as a substitute for a username and password pair.

Because of this, it is absolutely essential that you keep your OAuth tokens secure. In fact, upon generation, the web interface will only display each token a single time in order to prevent the token from being compromised.

DigitalOcean access tokens begin with an identifiable prefix in order to distinguish them from other similar tokens.

dop_v1_ for personal access tokens generated in the control panel
doo_v1_ for tokens generated by applications using the OAuth flow
dor_v1_ for OAuth refresh tokens
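
The prefixes listed above make it possible to tell token types apart before sending a request. The following is a minimal sketch of that check; `classify_token` and `TOKEN_PREFIXES` are hypothetical names introduced for illustration and are not part of any DigitalOcean library.

```python
# Prefixes and their meanings, as listed in the documentation above.
TOKEN_PREFIXES = {
    "dop_v1_": "personal access token",
    "doo_v1_": "OAuth application token",
    "dor_v1_": "OAuth refresh token",
}


def classify_token(token: str) -> str:
    """Return a label for a token based on its prefix (hypothetical helper)."""
    for prefix, label in TOKEN_PREFIXES.items():
        if token.startswith(prefix):
            return label
    # Anything else (for example, an agent access key) is not recognized here.
    return "unknown"
```

A check like this can catch a refresh token or an agent access key being passed where a personal access token is expected, without making a network call.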

Authenticate with a Bearer Authorization Header

Serverless Inference:

curl -X POST -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" "https://inference.do-ai.run/v1/chat/completions"

Agent Inference:

curl -X POST -H "Authorization: Bearer $AGENT_ACCESS_KEY" "https://{your-agent-url}.agents.do-ai.run/v1/chat/completions?agent=true"

Note: Agent Inference APIs use an agent_access_key (endpoint access key) instead of a DigitalOcean OAuth token. The agent_access_key is provided when you provision an agent endpoint and is scoped to that specific agent. It is not interchangeable with DigitalOcean OAuth tokens (dop_v1_*, doo_v1_*, dor_v1_*), which are used with Serverless Inference and the control-plane API at https://api.digitalocean.com.

Create vector embeddings for one or more text inputs. The request and response formats are OpenAI-compatible. Requests that contain unknown fields in the body are rejected. This endpoint does not support streaming responses.

Request Body: `application/json`

encoding_format string, one of: float, base64 optional

Example: float

How embedding values are returned in each data[].embedding field.

input array | string required

Example: hello world

A single string or an array of 1–2048 strings; each string produces one row in data, in the same order as the input.

model string required

Example: qwen3-embedding-0.6b

Model id to use for embeddings. Must match a model your account can access.

user string optional

Example: user-1234

Optional end-user identifier to help with abuse monitoring.
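
Because unknown fields are rejected and `input` accepts at most 2048 strings, it can help to build the request body in one place and split larger workloads into batches. The sketch below uses only the four fields documented above; `build_payload` and `batched_payloads` are hypothetical helper names, and the client-side checks are a convenience that may be stricter or looser than the server's own validation.

```python
from typing import Iterator, Optional, Sequence, Union

MAX_INPUTS_PER_REQUEST = 2048  # upper bound from the `input` field description
ENCODING_FORMATS = {"float", "base64"}  # allowed `encoding_format` values


def build_payload(
    model: str,
    inputs: Union[str, Sequence[str]],
    encoding_format: Optional[str] = None,
    user: Optional[str] = None,
) -> dict:
    """Build a request body containing only the documented fields."""
    if not model:
        raise ValueError("model is required")
    if not isinstance(inputs, str):
        inputs = list(inputs)
        if not 1 <= len(inputs) <= MAX_INPUTS_PER_REQUEST:
            raise ValueError("input must contain 1 to 2048 strings")
    payload = {"model": model, "input": inputs}
    if encoding_format is not None:
        if encoding_format not in ENCODING_FORMATS:
            raise ValueError("encoding_format must be 'float' or 'base64'")
        payload["encoding_format"] = encoding_format
    if user is not None:
        payload["user"] = user
    return payload


def batched_payloads(model: str, inputs: Sequence[str], **options) -> Iterator[dict]:
    """Yield one request body per batch of at most 2048 inputs."""
    for start in range(0, len(inputs), MAX_INPUTS_PER_REQUEST):
        batch = inputs[start:start + MAX_INPUTS_PER_REQUEST]
        yield build_payload(model, batch, **options)
```

Each yielded dictionary can be serialized with `json.dumps` and sent as the body of a POST to /v1/embeddings. Omitting optional fields, rather than sending them as null, keeps the body within the documented schema.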

Request: `/v1/embeddings`

Payload

Content type application/json

{
  "encoding_format": "float",
  "input": "hello world",
  "model": "qwen3-embedding-0.6b",
  "user": "user-1234"
}

cURL

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  -d '{"model":"qwen3-embedding-0.6b","input":["hello world","goodbye world"],"encoding_format":"float","user":"user-1234"}' \
  "https://inference.do-ai.run/v1/embeddings"

Python

import os
from pydo import Client

client = Client(token=os.environ.get("DIGITALOCEAN_TOKEN"))

resp = client.embeddings.create(
    model="qwen3-embedding-0.6b",
    input=["hello world", "goodbye world"],
    encoding_format="float",
    user="user-1234",
)

for item in resp.data:
    print(item.index, item.embedding[:8])

JavaScript

import { InferenceClient } from "@digitalocean/dots";

const client = new InferenceClient({
    apiKey: process.env.DIGITALOCEAN_TOKEN,
});

const resp = await client.embeddings.create({
    model: "qwen3-embedding-0.6b",
    input: ["hello world", "goodbye world"],
    encoding_format: "float",
    user: "user-1234",
});

for (const item of resp.data) {
    console.log(item.index, item.embedding.slice(0, 8));
}

Responses

200

Embeddings and usage for the given input or inputs, in order.

Response Headers

ratelimit-limit integer

The default limit on the number of requests that can be made per hour and per minute. Current rate limits are 5000 requests per hour and 250 requests per minute.

ratelimit-remaining integer

The number of requests in your hourly quota that remain before you hit your request limit. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.

ratelimit-reset integer

The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.
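
A client can use these headers to decide whether to pause before the next request. The sketch below assumes the headers arrive as a plain string mapping and treats `ratelimit-reset` as Unix epoch seconds, as described above; `seconds_until_reset` is a hypothetical helper. Since the reset time marks when the oldest request expires, waiting until then should free at least one request, not necessarily the whole quota.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before retrying, or 0.0 if no wait is needed."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    remaining = lowered.get("ratelimit-remaining")
    reset = lowered.get("ratelimit-reset")
    if remaining is None or reset is None:
        return 0.0  # headers absent: nothing to base a delay on
    if int(remaining) > 0:
        return 0.0  # quota still available
    current = time.time() if now is None else now
    return max(0.0, float(reset) - current)
```

The `now` parameter exists only so the calculation can be checked deterministically; in normal use it can be left unset.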

Response Schema: application/json

data array of object required

One entry for each input string, in the same order.

embedding array | string required

Example: [0.0123,-0.0456,0.0001]

The embedding vector, or a base64-encoded string when the request set encoding_format to base64.

index integer required

Example: 0

Zero-based index of the corresponding input item (0 when input is a string).

object string, one of: embedding required

Example: embedding

The object type, which is always embedding.

model string required

Example: qwen3-embedding-0.6b

The embedding model that produced the vectors.

object string, one of: list required

Example: list

The object type, which is always the string list.

usage object required

Token usage for the embeddings request.

prompt_tokens integer required

Example: 6

Number of input tokens used for the embedding.

total_tokens integer required

Example: 6

Total billable tokens for the request.
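
Because `embedding` is either an array of numbers or a base64 string depending on `encoding_format`, client code usually needs to normalize it. This page does not specify the binary layout of the base64 form; the sketch below assumes the common OpenAI convention of packed little-endian 32-bit floats, which should be verified against a real response before relying on it. `decode_embedding` is a hypothetical helper.

```python
import base64
import struct
from typing import List, Union


def decode_embedding(value: Union[str, List[float]]) -> List[float]:
    """Return the embedding as a list of floats for either encoding format."""
    if isinstance(value, str):
        raw = base64.b64decode(value)
        # ASSUMPTION: the payload is packed little-endian float32 values.
        if len(raw) % 4 != 0:
            raise ValueError("byte length is not a multiple of 4")
        count = len(raw) // 4
        return list(struct.unpack(f"<{count}f", raw))
    # encoding_format "float": the field is already a JSON array of numbers.
    return [float(x) for x in value]
```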

401

Authentication failed due to invalid credentials.

Response Headers

ratelimit-limit integer

The default limit on the number of requests that can be made per hour and per minute. Current rate limits are 5000 requests per hour and 250 requests per minute.

ratelimit-remaining integer

ratelimit-reset integer

The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.

Response Schema: application/json

id string required

Example: not_found

A short identifier corresponding to the HTTP status code returned. For example, the ID for a response returning a 404 status code would be "not_found."

message string required

Example: The resource you were accessing could not be found.

A message providing additional information about the error, including details to help resolve it when possible.

request_id string optional

Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9

Optionally, some endpoints may include a request ID that should be provided when reporting bugs or opening support tickets to help identify the issue.

429

The API rate limit has been exceeded.

Response Headers

ratelimit-limit integer

The default limit on the number of requests that can be made per hour and per minute. Current rate limits are 5000 requests per hour and 250 requests per minute.

ratelimit-remaining integer

ratelimit-reset integer

The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.

Response Schema: application/json

id string required

Example: not_found

A short identifier corresponding to the HTTP status code returned. For example, the ID for a response returning a 404 status code would be "not_found."

message string required

Example: The resource you were accessing could not be found.

A message providing additional information about the error, including details to help resolve it when possible.

request_id string optional

Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9

Optionally, some endpoints may include a request ID that should be provided when reporting bugs or opening support tickets to help identify the issue.

500

There was a server error.

Response Headers

ratelimit-limit integer

The default limit on the number of requests that can be made per hour and per minute. Current rate limits are 5000 requests per hour and 250 requests per minute.

ratelimit-remaining integer

ratelimit-reset integer

The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.

Response Schema: application/json

id string required

Example: not_found

A short identifier corresponding to the HTTP status code returned. For example, the ID for a response returning a 404 status code would be "not_found."

message string required

Example: The resource you were accessing could not be found.

A message providing additional information about the error, including details to help resolve it when possible.

request_id string optional

Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9

Optionally, some endpoints may include a request ID that should be provided when reporting bugs or opening support tickets to help identify the issue.

default

There was an unexpected error.

Response Headers

ratelimit-limit integer

The default limit on the number of requests that can be made per hour and per minute. Current rate limits are 5000 requests per hour and 250 requests per minute.

ratelimit-remaining integer

ratelimit-reset integer

The time when the oldest request will expire. The value is given in Unix epoch time. See https://docs.digitalocean.com/reference/api/reference/#rate-limit for information about how requests expire.

Response Schema: application/json

id string required

Example: not_found

A short identifier corresponding to the HTTP status code returned. For example, the ID for a response returning a 404 status code would be "not_found."

message string required

Example: The resource you were accessing could not be found.

A message providing additional information about the error, including details to help resolve it when possible.

request_id string optional

Example: 4d9d8375-3c56-4925-a3e7-eb137fed17e9

Optionally, some endpoints may include a request ID that should be provided when reporting bugs or opening support tickets to help identify the issue.

Response

200

{
  "data": [
    {
      "embedding": [
        0.0123,
        -0.0456,
        0.0001
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "qwen3-embedding-0.6b",
  "object": "list",
  "usage": {
    "prompt_tokens": 6,
    "total_tokens": 6
  }
}
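
Each row in `data` carries an `index` that points back to the corresponding input. As a minimal sketch, the hypothetical helper below pairs the original inputs with their embeddings using that index, operating on an already-parsed response dictionary shaped like the example above.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union


def pair_inputs_with_embeddings(
    inputs: Union[str, Sequence[str]],
    response: Dict[str, Any],
) -> List[Tuple[str, Any]]:
    """Return (input text, embedding) pairs in input order."""
    # A single string input corresponds to one row with index 0.
    texts = [inputs] if isinstance(inputs, str) else list(inputs)
    rows = sorted(response["data"], key=lambda row: row["index"])
    if len(rows) != len(texts):
        raise ValueError("response row count does not match input count")
    return [(texts[row["index"]], row["embedding"]) for row in rows]
```

The rows are documented as arriving in input order, so the sort is defensive; it keeps the pairing correct even if rows are reordered by intermediate code.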

401

{
  "id": "unauthorized",
  "message": "Unable to authenticate you."
}

429

{
  "id": "too_many_requests",
  "message": "API rate limit exceeded."
}

500

{
  "id": "server_error",
  "message": "Unexpected server-side error"
}

default

{
  "id": "example_error",
  "message": "some error message"
}
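
The error bodies above share the same shape: `id`, `message`, and an optional `request_id`. The sketch below turns a status code and raw body into a single exception type; `ApiError` and `parse_error` are hypothetical names introduced for illustration, not part of an official client.

```python
import json
from typing import Optional


class ApiError(Exception):
    """Error response carrying the documented id, message, and request_id fields."""

    def __init__(self, status: int, error_id: str, message: str,
                 request_id: Optional[str] = None) -> None:
        super().__init__(f"{status} {error_id}: {message}")
        self.status = status
        self.error_id = error_id
        self.message = message
        self.request_id = request_id


def parse_error(status: int, body: str) -> ApiError:
    """Build an ApiError from an HTTP status code and a raw response body."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return ApiError(
        status,
        str(data.get("id", "unknown")),
        # Fall back to the raw body when the response is not the expected JSON.
        str(data.get("message", body)),
        data.get("request_id"),
    )
```

Keeping `request_id` on the exception makes it easy to include in logs or support tickets when the server provides one.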
