How to Use Built-in Tools

Validated on 27 Apr 2026 • Last edited on 27 Apr 2026

Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare model capabilities and pricing, use routing to match inference requests to the best-fit model, and run inference using serverless or dedicated deployments.

Built-in tools are server-side integrations that extend the model’s capabilities during inference. Instead of managing tool orchestration yourself, you add tool definitions to your API request, and the inference API handles tool discovery, execution, and response integration automatically. We provide built-in tools for knowledge base retrieval, the DigitalOcean MCP server, and web search.

Knowledge base retrieval and the DigitalOcean MCP server do not incur charges beyond the standard per-token inference costs. Web search with serverless inference is charged at $10 per 1,000 requests.

Built-in tools work with both the Chat Completions API and the Responses API.
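
Each built-in tool is declared as an object in the tools array of the request body. For quick reference, these are the three tool types covered in this guide, with illustrative placeholder values drawn from the examples below:

"tools": [
  { "type": "knowledge_base_retrieval", "knowledge_base_id": "<your-knowledge-base-id>" },
  { "type": "mcp", "server_label": "digitalocean", "server_url": "https://accounts.mcp.digitalocean.com/mcp" },
  { "type": "web_search", "max_uses": 3, "max_results": 5 }
]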

Retrieve Knowledge Base

Knowledge base retrieval lets the model query your private data sources during inference using retrieval-augmented generation (RAG). You add the knowledge_base_retrieval tool to your API request and the inference API handles retrieval and incorporates the results into the model’s response automatically.

To use knowledge base retrieval, send a POST request with your knowledge base ID. You can find the ID in the DigitalOcean Control Panel or by querying the API. Set tool_choice to auto to let the model decide when to query the knowledge base, or required to always query it before responding.

curl -X POST https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "What are some features of DigitalOcean AI platform?"
      }
    ],
    "tools": [
      {
        "type": "knowledge_base_retrieval",
        "knowledge_base_id": "<your-knowledge-base-id>"
      }
    ],
    "tool_choice": "auto",
    "stream": false,
    "max_tokens": 1024
  }'
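
The request above needs a knowledge base ID. If you prefer to look it up by querying the API rather than the control panel, the following is a rough sketch; the endpoint path and response fields are assumptions based on the DigitalOcean GenAI API, so check the API reference for the exact route:

# Sketch only: list knowledge bases to find their IDs. The endpoint path is an assumption.
curl -s https://api.digitalocean.com/v2/gen-ai/knowledge_bases \
  -H "Authorization: Bearer $DIGITALOCEAN_API_TOKEN"

Each knowledge base entry in the response should include an ID that you can pass as knowledge_base_id.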

For the full set of parameters, see the Serverless Inference API reference.

Use Model Context Protocol (MCP)

MCP servers expose tools that the model can call, such as fetching account data, managing schedules, or interacting with third-party APIs. The MCP built-in tool connects the model to remote MCP servers and orchestrates calls to them.

Connect to an Authenticated MCP Server

You can connect to authenticated MCP servers using bearer token authentication. The following example sends a Chat Completions request that connects to the DigitalOcean Accounts MCP server. Replace $DIGITALOCEAN_API_TOKEN with a valid DigitalOcean personal access token.

curl -X POST https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Fetch my DigitalOcean account information and summarize it in 2 bullets."
      }
    ],
    "tools": [
      {
        "type": "mcp",
        "server_label": "digitalocean",
        "server_url": "https://accounts.mcp.digitalocean.com/mcp",
        "authorization": "Bearer $DIGITALOCEAN_API_TOKEN",
        "allowed_tools": ["account-get-information"]
      }
    ],
    "tool_choice": "required",
    "stream": false,
    "max_tokens": 512
  }'

The allowed_tools array restricts which tools from the MCP server the model can call. In this example, only the account-get-information tool is available. When omitted, the model can use any tool the server exposes. For the full set of MCP tool parameters, see the Serverless Inference API reference.
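
If you only want the assistant’s text rather than the full JSON payload, you can filter the response with jq. This is a minimal sketch that assumes the standard OpenAI-compatible Chat Completions response shape and a hypothetical response.json file holding the response from the request above:

# Assumes an OpenAI-compatible Chat Completions payload saved to response.json (hypothetical filename).
jq -r '.choices[0].message.content' response.json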

Connect to an Unauthenticated MCP Server

You can also connect to public MCP servers that do not require authentication. The following example sends a Responses API request:

curl -X POST https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-4o",
    "input": "Create a scheduling poll called Team Lunch with two time options for tomorrow at noon and the day after at noon.",
    "tools": [
      {
        "type": "mcp",
        "server_label": "timergy",
        "server_url": "https://api.timergy.com/mcp"
      }
    ],
    "tool_choice": "required",
    "stream": false,
    "max_output_tokens": 512
  }'
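
The Responses API returns generated text in a different shape than Chat Completions. As a rough sketch, assuming an OpenAI-compatible Responses payload saved to a hypothetical response.json file, you can extract the output text like this:

# Assumes an OpenAI-compatible Responses payload; field names may differ, so check the API reference.
jq -r '.output[] | select(.type == "message") | .content[] | select(.type == "output_text") | .text' response.json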

Add Web Search to Inference Request

Web search is a built-in tool that gives the model access to real-time web content during inference. When you add web search to your API request, the model decides when a search is needed and incorporates the results into its response.

To enable web search, include a tool object with type set to web_search in the tools array of your request.

The following example sends a Responses API request with web search enabled:

curl -X POST https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-4o",
    "input": "What are the latest pricing changes for DigitalOcean Droplets?",
    "tools": [
      {
        "type": "web_search",
        "max_uses": 3,
        "max_results": 5
      }
    ],
    "max_output_tokens": 1024,
    "stream": false
  }'

When the model determines that a prompt benefits from web search, it searches for relevant information and incorporates the results into its response.

You can optionally limit how many searches the model performs per request with max_uses (1-5) and how many results each search returns with max_results (1-10, default 5).

When max_uses is reached, the model produces a final response using the results collected so far. For the full set of web search parameters, see the Serverless Inference API reference.

Use Built-in Tools With Dedicated Inference

You can use built-in tools with dedicated inference. Set model to dedicated:<dedicated-inference-name>:<model_slug>, using the dedicated inference name and model slug, which you can find using the API. For example:

curl -X POST https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dedicated:<dedicated-inference-name>:<model_slug>",
    "messages": [
      {
        "role": "user",
        "content": "What features does DigitalOcean AI Platform offer?"
      }
    ],
    "tools": [
      {
        "type": "web_search",
        "max_uses": 2
      }
    ],
    "stream": false,
    "max_tokens": 1024
  }'
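
To find a model slug for the model string above, one option is the model listing endpoint. This sketch assumes the inference API exposes the standard OpenAI-compatible GET /v1/models route; how dedicated inference deployments are listed may differ, so check the API reference:

# Assumption: standard OpenAI-compatible model listing endpoint.
curl -s https://inference.do-ai.run/v1/models \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" | jq -r '.data[].id'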
