
Inference Quickstart

Validated on 28 Apr 2026 • Last edited on 8 May 2026

Inference provides a single control plane for managing inference workflows. In the Model Catalog, you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, and compare their capabilities and pricing. You can also use routing to match inference requests to the best-fit model, and run inference using serverless or dedicated deployments.


Browse the Model Catalog

To access Model Catalog, go to the DigitalOcean Control Panel and click Model Catalog under INFERENCE.
Browse the available foundation models. For more information about supported models and their capabilities, see our models page.
Click a model to open its model card and view details such as capabilities, pricing, and deployment options.
To test the model, click Model Playground in the top-right corner of the model card.

To learn more about browsing and filtering models, see Browse Models in Model Catalog.

Once you’ve browsed models, you can continue working with the following features:

Use Serverless Inference: Create model access keys and send API requests to foundation models without managing infrastructure.
Use Dedicated Inference: Host open-source or commercial LLMs on dedicated GPUs, scale them, and deploy them as inference endpoints.

Test a Model in the Model Playground

To access Model Playground, go to the DigitalOcean Control Panel, and click Serverless Inference under INFERENCE. Then, select the Model Playground tab.
Select a foundation model. For more information about supported models and their capabilities, see our models page.
Enter a prompt and optionally upload images. Then, review the model response.
Adjust settings such as temperature and token limits to test different outputs.
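
The settings you adjust in the playground can also be passed in an API request. The following is a minimal sketch rather than an official client. It assumes the serverless endpoint accepts OpenAI-style chat completion requests at /v1/chat/completions on the https://inference.do-ai.run host named in this guide, and that temperature and max_tokens are the request fields that correspond to the playground settings. The model ID and the MODEL_ACCESS_KEY variable name are placeholders.

```python
import json
import os
import urllib.request

# Assumption: an OpenAI-style path on the host named in this guide.
CHAT_URL = "https://inference.do-ai.run/v1/chat/completions"


def build_chat_request(model, prompt, access_key, temperature=0.7, max_tokens=256):
    """Build (but do not send) a chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # lower values give more repeatable output
        "max_tokens": max_tokens,    # upper bound on generated tokens
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_key}",
        },
        method="POST",
    )


request = build_chat_request(
    model="example-model",  # placeholder, replace with a model ID from the catalog
    prompt="Summarize what a knowledge base is in one sentence.",
    access_key=os.getenv("MODEL_ACCESS_KEY", "placeholder-key"),  # hypothetical variable name
    temperature=0.2,
    max_tokens=128,
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen and decode the JSON body of the response.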

Integrate with Coding Agents

Coding agents, such as Codex and Claude Code, can use the inference endpoint https://inference.do-ai.run as a drop-in proxy to run inference requests on DigitalOcean.

Install the coding agent. Installation steps vary by provider.
Set up the model access key.
Configure the agent to use DigitalOcean.
Run the agent to use DigitalOcean inference endpoints.

For more information, see Use With Coding Agents.
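
As a rough illustration of the configuration step, the sketch below prepares an environment for launching an agent process. It assumes the agent reads its base URL and credential from environment variables. ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN are variables that Claude Code reads, but whether DigitalOcean's setup uses exactly these names and this base URL is an assumption, so follow Use With Coding Agents for the authoritative steps.

```python
import os

DO_INFERENCE_URL = "https://inference.do-ai.run"  # endpoint named in this guide


def agent_environment(model_access_key, base_env=None):
    """Return a copy of the environment with the agent pointed at DigitalOcean."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: these variable names match what your agent expects.
    env["ANTHROPIC_BASE_URL"] = DO_INFERENCE_URL
    env["ANTHROPIC_AUTH_TOKEN"] = model_access_key
    return env


env = agent_environment("placeholder-key", base_env={"PATH": "/usr/bin"})
print(sorted(env))
```

You could then pass the returned mapping as the env argument of subprocess.run when starting the agent, which keeps the key out of your shell profile.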

Create a DigitalOcean Knowledge Base (Optional)

You can create a knowledge base, which is a collection of indexed content on a specific topic or domain. A knowledge base gives extra context during generation, such as your company’s internal documentation.

To create a knowledge base:

From the DigitalOcean Control Panel, in the top-right, click Create, and then under the DATA SERVICES section, click Knowledge Base to open the Create a knowledge base page.
In the Add data step, under the Choose your embeddings model section, click the Embeddings model dropdown list to select a model used for indexing your data. Choose one based on your data type and token budget. For options, see available embeddings models.
Under the Choosing your reranking model section, click the Reranking model (optional) dropdown list, and then select the available reranking model to re-score retrieved chunks and improve relevance. Alternatively, you can enable reranking later in the knowledge base settings. When reranking is enabled, you can also enable or disable it for individual retrieval queries. Reranking incurs charges per request when used.
Under the Add data sources section, choose at least one data source to upload or connect. You can add multiple sources, but to reduce indexing time and cost, use folders, buckets, or local files that contain only relevant content.

Knowledge bases support common text-based formats, including .csv, .pdf, .docx, .md, .html, .json, and .jsonl.

You can add any of the following data sources:
File Upload
Click Upload a file, and then under the Choose Files section, either click Upload or drag-and-drop files. To remove a file, click its trash icon. To add more files, click Upload more files.

For best performance, upload files no larger than 2 GB and fewer than 100 files at a time.
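
Before uploading, you may want to check local files against these guidelines. The following is a sketch under stated assumptions: it treats the listed extensions as the full set (the list above says "including", so other formats may be accepted), reads 2 GB as 2 GiB, and takes "fewer than 100 files" to mean batches of at most 99. The function name plan_batches is hypothetical.

```python
from pathlib import PurePath

# Formats listed in this guide; the actual supported set may be larger.
SUPPORTED = {".csv", ".pdf", ".docx", ".md", ".html", ".json", ".jsonl"}
MAX_FILE_BYTES = 2 * 1024**3  # assumption: 2 GB interpreted as 2 GiB
MAX_BATCH = 99                # "fewer than 100 files at a time"


def plan_batches(files):
    """Split (name, size_bytes) pairs into upload batches.

    Returns (batches, skipped), where skipped lists files with an
    unlisted extension or a size above the recommended limit.
    """
    accepted, skipped = [], []
    for name, size in files:
        ok = PurePath(name).suffix.lower() in SUPPORTED and size <= MAX_FILE_BYTES
        (accepted if ok else skipped).append(name)
    batches = [accepted[i:i + MAX_BATCH] for i in range(0, len(accepted), MAX_BATCH)]
    return batches, skipped


files = [("guide.md", 1_000), ("video.mp4", 5_000), ("huge.pdf", 3 * 1024**3)]
files += [(f"doc{i}.pdf", 10) for i in range(150)]
batches, skipped = plan_batches(files)
print([len(b) for b in batches], skipped)
```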

Spaces Bucket or Folder
Click Pull from a Spaces bucket or folder, and then choose one or more buckets or folders to index. To limit indexing to specific folders, click + on the left of the bucket to expand its contents and select the folders you want.

Knowledge bases index all supported files in the selected buckets and folders, regardless of privacy settings. For best results, use five or fewer buckets and store only the data you want indexed in them.

Web or Sitemap URL
When you specify a website URL as a data source for your knowledge base, DigitalOcean uses a custom agent named DigitalOceanGradientAICrawler/1.0 to index the website content. To prevent excessively large indexing jobs, the crawler indexes up to 5,500 pages. It also skips inaccessible or disallowed links.

Depending on the behavior you select, the crawler follows HTML links on the site, indexes text and certain image types, and ignores videos and navigation links. It respects the website’s robots.txt rules, including any Disallow directives or the wildcard *.
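
To preview which pages a site's robots.txt rules would exclude, you can evaluate them locally. This sketch uses Python's standard urllib.robotparser and assumes the crawler identifies itself with the agent name given above. Python's parser may treat edge cases differently from the crawler, so use the result as an approximation. The helper name allowed_urls and the example.com URLs are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the crawler sends the agent name from this guide as its user agent.
CRAWLER_UA = "DigitalOceanGradientAICrawler/1.0"


def allowed_urls(robots_txt, urls, user_agent=CRAWLER_UA):
    """Return the URLs that the given robots.txt text allows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


sample_robots = """\
User-agent: *
Disallow: /private/
"""

urls = [
    "https://example.com/docs/intro",
    "https://example.com/private/notes",
]
print(allowed_urls(sample_robots, urls))
```

With the sample rules, only the /docs/intro URL is returned because the wildcard entry disallows everything under /private/.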

Select Add a web or sitemap URL, then choose either Seed URL or Sitemap URL.

For a seed URL, enter the public URL in the Seed URL field, then choose a crawl scope:
- Scoped crawls only the seed URL.
- URL and all linked pages in path crawls the seed URL and pages within the same path.
- URL and all linked pages in domain crawls pages in the same domain.
- Subdomains crawls the domain and its subdomains.
For a sitemap URL, enter the .xml sitemap in the Sitemap URL field, such as docs.digitalocean.com/sitemap.xml. Sitemap crawls only index URLs listed in the sitemap.
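
Because a sitemap crawl indexes only the listed URLs, it can help to list them before you add the data source. The sketch below parses a sitemap in the standard sitemaps.org format and compares the count with the 5,500-page limit mentioned above. It does not handle sitemap index files, and the sample document and function name are illustrative.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_CRAWL_PAGES = 5500  # page limit stated in this guide


def sitemap_urls(xml_text):
    """Return the <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


sample_sitemap = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/</loc></url>
  <url><loc>https://example.com/docs/quickstart/</loc></url>
</urlset>
"""

listed = sitemap_urls(sample_sitemap)
print(len(listed), "URLs listed; within page limit:", len(listed) <= MAX_CRAWL_PAGES)
```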

To index supported images and other media, select Index embedded media. To include header and footer navigation content, select Include headers and footers navigation links. These options can significantly increase token count.

To verify that a crawl completed, re-add the same seed or sitemap URL as a new data source. If it shows zero tokens, the original crawl indexed all content and you can delete the duplicate.

Dropbox Folder
If you have not connected Dropbox, click Connect account next to Dropbox, then log in and authorize the connection.

Click Pull from a Dropbox folder, and then choose one or more folders to index. To limit indexing to specific folders, click + on the left of the folder to expand its contents and select the folders you want.

Amazon S3 Bucket or Folder
Select Pull from an AWS S3 bucket folder, and then enter the required credentials:
- Access Key ID: The IAM access key ID for the bucket or folder.
- Secret Key: The secret key for the access key ID.
- Bucket Name: The name of the bucket to index.
- Region: The AWS region, such as us-east-1 or eu-west-1.
On the right of the Region field, click + to add the S3 bucket folder.
Click Advanced Options to configure the chunking strategy. By default, data sources use section-based chunking. For guidance, see Chunking Strategy Best Practices.

Click Add selected data source, and then either add another data source or click Next step: Configure database.
In the Knowledge base name field, either enter a name for your knowledge base or use the auto-generated name.
In the Where should your knowledge base live? section, under the OpenSearch database options sub-section, select either Use existing to connect to an existing OpenSearch database or Create new to provision a new one.

Use Existing OpenSearch Database
If you choose Use existing, click the Select an OpenSearch database dropdown list, and then select the database you want to use. If it already contains data, it may limit how much new data you can index. You only pay for successfully indexed data.

Create a New OpenSearch Database
If you choose Create new, under the Choose a datacenter region section, select the default datacenter region for your knowledge base, or click the Additional datacenter regions dropdown list to choose a different one.

For agents, choose the same region as your agents to reduce latency. Most Agent Platform infrastructure is in TOR1.

New databases use the smallest size that fits your data. For embeddings, we recommend allocating about twice your original dataset size.
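
The sizing recommendation is simple arithmetic. As a sketch, assuming "about twice" means a factor of 2 applied to the size of your source data (the function name and the unit are illustrative):

```python
def recommended_storage_gib(dataset_gib, factor=2.0):
    """Estimate storage to allocate for embeddings from the source data size."""
    if dataset_gib < 0:
        raise ValueError("dataset size must not be negative")
    return dataset_gib * factor


print(recommended_storage_gib(15))  # 30.0
```

For example, 15 GiB of source documents suggests allocating roughly 30 GiB.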

Under the VPC Network section, choose the VPC network you want to use.

Click Next step: Review and create.
Under the Final Details section, click the Select a project dropdown list to choose the project your knowledge base should live in.
In the Tags field, optionally add tags to help organize and filter your knowledge base. Tags can include letters, numbers, colons, dashes, and underscores.
Under the Review section, review the selected embeddings model, its token price, the estimated size and number of data sources, and the OpenSearch database configuration. Then, click Create knowledge base (or Create Knowledge Base and database).
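
If you generate tags programmatically, you can check them against the tag rule from the steps above before submitting. This sketch assumes "letters" means ASCII letters and that tags must be non-empty. The Control Panel may accept a wider set, so treat the pattern as a conservative approximation.

```python
import re

# Letters, numbers, colons, dashes, and underscores (ASCII letters assumed).
TAG_PATTERN = re.compile(r"[A-Za-z0-9:_-]+")


def invalid_tags(tags):
    """Return the tags that contain characters outside the allowed set."""
    return [tag for tag in tags if not TAG_PATTERN.fullmatch(tag)]


print(invalid_tags(["env:prod", "team_docs", "kb-2026", "has space", "émoji!"]))
```

With this input, the function reports "has space" and "émoji!" as invalid.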

After you create a knowledge base, you can optionally test your knowledge base’s retrieval or test retrieval and responses in RAG Playground.

After the knowledge base is created, you can also attach it to a new or existing agent.

Create an Agent

Create an agent to define your instructions, model, workspace, and optional knowledge base attachments.

To create an agent:

From the Control Panel, in the top-right, click Create, and then select Agents to open the Create an agent page.
Under the Configure your agent section, in the Agent name field, either enter a name for your agent or use the auto-generated name.
Under the Agent instructions sub-section, add your agent instructions. You can also start with one of the ready-to-use templates. Agent instructions can be up to 10,000 characters long.
Click the Select a model provider dropdown menu, and then choose the foundation model that best fits your use case. Token cost varies by provider and model. For supported models and their capabilities, see Available Models and Pricing. Depending on the provider and model you choose, you may also be able to configure model-specific options, such as reasoning level or other response settings.

Review the model’s Terms and Conditions, and then select the checkbox to accept them.
Under the Where should your agent live? section, choose an existing workspace or create a new one. To create a new workspace, enter a Workspace name and optionally add a Workspace description.
Under the Optional Configuration section, in the Add knowledge bases sub-section, select any knowledge bases you want to attach to your agent for retrieval-augmented generation (RAG).
Optionally, click Connect agent to VPC network to add the agent to a VPC network if you want it to communicate with private resources in your network or keep traffic isolated from the public internet. In the VPC network dropdown menu, select the VPC network you want to use.
Under the Final Details section, choose the project your agent should live in.
Under the Tags sub-section, optionally add tags to help organize and filter your agent. Tags can include letters, numbers, colons, dashes, and underscores.
Under the Estimated cost summary sub-section, review the configuration and estimated cost, and then click Create Agent.
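
If you keep agent instructions in a file or generate them from templates, a quick length check can catch text that exceeds the 10,000-character limit noted above. This is a sketch that assumes the limit counts characters the way Python's len does. The platform may count differently, for example for multi-byte characters.

```python
MAX_INSTRUCTION_CHARS = 10_000  # limit stated in this guide


def check_instructions(text):
    """Return (fits, remaining) for a block of agent instructions."""
    remaining = MAX_INSTRUCTION_CHARS - len(text)
    return remaining >= 0, remaining


fits, remaining = check_instructions("You are a support assistant for our product docs.")
print(fits, remaining)
```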

Once you’ve set up an agent, you can:

Test your agent’s responses and adjust its model settings and configuration in the Agent Playground.
Test how a selected model answers questions using content retrieved from a knowledge base in the RAG Playground.
Add rules and constraints to help control agent responses with Guardrails.

Interact With Your Agent

After you create an agent, you can interact with it by sending requests to its endpoint. First, create an access key for authentication, then use the agent endpoint and key to submit queries and receive responses.

From the Control Panel, in the left menu, click Agent Platform under INFERENCE. Select the Workspaces tab, and then click the workspace that has the agent you want to interact with.
Under the Agents tab, find and click the agent you want to use, and then click its Settings tab.
Under the Endpoint Access Keys sub-section, click Create Key to open the Create Agent Access Key window.
Under the Key name field, enter a name for your key, and then click Create. Copy the key, and then save it in a secure location.

In the Agent Essentials section of the agent’s Overview tab, copy the agent’s endpoint from the ENDPOINT sub-section. Then, use the endpoint and the access key to send requests and generate responses to user queries.

The following cURL and Python OpenAI examples show how to use the agent’s endpoint:

cURL example

The cURL example uses environment variables to store the agent’s endpoint ($AGENT_ENDPOINT) and access key ($AGENT_ACCESS_KEY). To return information about how the response was generated, such as the knowledge base data, guardrails, and functions used, set the include_retrieval_info, include_guardrails_info, and include_functions_info parameters to true in the request body.

curl -i \
  -X POST \
  $AGENT_ENDPOINT/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AGENT_ACCESS_KEY" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "stream": false,
    "include_functions_info": true,
    "include_retrieval_info": true,
    "include_guardrails_info": true
  }'

Python OpenAI example

use-agent-endpoint-key.py

# Import the os, json, and OpenAI libraries.
import os
import json
from openai import OpenAI

# Set agent_endpoint and agent_access_key as environment variables in your OS.
# Note that these names are lowercase, unlike the variables in the cURL example.
# os.environ raises a KeyError that names the variable if it is not set.
agent_endpoint = os.environ["agent_endpoint"].rstrip("/") + "/api/v1/"
agent_access_key = os.environ["agent_access_key"]

if __name__ == "__main__":
    client = OpenAI(
        base_url=agent_endpoint,
        api_key=agent_access_key,
    )

    response = client.chat.completions.create(
        # The OpenAI client requires a model argument; this example passes a placeholder.
        model="n/a",
        messages=[{"role": "user", "content": "Can you provide the name of France's capital in JSON format?"}],
        extra_body={"include_retrieval_info": True},
    )

    # Print the response's content.
    for choice in response.choices:
        print(choice.message.content)

    # Print the retrieval object, or null if the response does not include one.
    response_dict = response.to_dict()

    print("\nFull retrieval object:")
    print(json.dumps(response_dict.get("retrieval"), indent=2))
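
If you prefer not to install the OpenAI library, the cURL request can also be reproduced with the Python standard library. The sketch below builds the same request body, including all three include_* flags, and targets the /api/v1/chat/completions path from the cURL example. The endpoint URL and key shown are placeholders, and the helper name build_agent_request is hypothetical.

```python
import json
import urllib.request


def build_agent_request(endpoint, access_key, question):
    """Build (but do not send) a chat completion request for an agent endpoint."""
    payload = {
        "messages": [{"role": "user", "content": question}],
        "stream": False,
        "include_functions_info": True,
        "include_retrieval_info": True,
        "include_guardrails_info": True,
    }
    return urllib.request.Request(
        endpoint.rstrip("/") + "/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_key}",
        },
        method="POST",
    )


agent_request = build_agent_request(
    "https://agent.example.com",  # placeholder, use your agent's endpoint
    "placeholder-key",            # placeholder, use your agent access key
    "What is the capital of France?",
)
print(agent_request.full_url)
```

To send it, pass agent_request to urllib.request.urlopen and parse the response body with json.load.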

Embed a Chatbot

You can make your agent available through an embeddable chatbot interface.

To embed a chatbot:

From the Control Panel, in the left menu, click Agent Platform under INFERENCE. Select the Workspaces tab, and then click the workspace that has the agent you want to use for your chatbot.
Under the Agents tab, find and click the agent you want to use.
In the Agent Essentials section of the agent’s Overview tab, click Edit in the ENDPOINT sub-section to open the Set endpoint availability to public window.
Set your agent endpoint to Public so outside applications can access it without requiring the agent access key, and then click Save.

This adds the CHATBOT embed code below the ENDPOINT section. The chatbot embed code is an HTML <script> element that you can copy and paste into your application or website.
Copy the HTML chatbot embed code, and then paste it into your application or website.