How to Use Server-Side Tools

Validated on 10 Jun 2026 • Last edited on 10 Jun 2026

Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare model capabilities and pricing, use routing to match inference requests to the best-fit model, and run inference using serverless or dedicated deployments.

We support server-side integrations that extend the model’s capabilities during inference. Instead of managing tool orchestration yourself, you add tool definitions to your API request, and the inference API handles tool discovery, execution, and response integration automatically. We provide server-side tools for knowledge base retrieval, the DigitalOcean MCP server, and web search.

Knowledge base retrieval and the DigitalOcean MCP server incur no charges beyond the standard per-token inference costs. Web search with serverless inference costs $10 per 1,000 requests.

Server-side tools work with both the Chat Completions API and the Responses API. Tool search works with the Messages API for Anthropic models and the Responses API for OpenAI models.

Retrieve Knowledge Base

Knowledge base retrieval lets the model query your private data sources during inference using retrieval-augmented generation (RAG). You add the knowledge_base_retrieval tool to your API request and the inference API handles retrieval and incorporates the results into the model’s response automatically.

To use knowledge base retrieval, send a POST request with your knowledge base ID. You can find the ID in the DigitalOcean Control Panel or by querying the API. Set tool_choice to auto to let the model decide when to query the knowledge base, or required to always query it before responding. For example, use the Chat Completions API with knowledge base retrieval:

curl -X POST https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "What are some features of DigitalOcean Inference?"
      }
    ],
    "tools": [
      {
        "type": "knowledge_base_retrieval",
        "knowledge_base_id": "<your-knowledge-base-id>"
      }
    ],
    "tool_choice": "auto",
    "stream": false,
    "max_tokens": 1024
  }'
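The same request can be assembled programmatically. The following is a minimal Python sketch, assuming only the standard library; the function names `build_kb_request` and `send` are hypothetical helpers, while the endpoint, headers, and payload fields mirror the curl example above.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
CHAT_COMPLETIONS_URL = "https://inference.do-ai.run/v1/chat/completions"


def build_kb_request(prompt, knowledge_base_id, access_key,
                     model="openai-gpt-4o", tool_choice="auto"):
    """Build (but do not send) a Chat Completions request with KB retrieval."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "knowledge_base_retrieval",
            "knowledge_base_id": knowledge_base_id,
        }],
        # "auto" lets the model decide; "required" always queries first.
        "tool_choice": tool_choice,
        "stream": False,
        "max_tokens": 1024,
    }
    return urllib.request.Request(
        CHAT_COMPLETIONS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To run it, read the key from the environment (for example `os.environ["MODEL_ACCESS_KEY"]`), pass the result of `build_kb_request` to `send`, and inspect the returned dictionary.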

For the full set of parameters, see the Serverless Inference API reference. The response looks like the following:

{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "annotations": [
                    {
                        "type": "tool_use",
                        "tool_use": {
                            "name": "knowledge_base_retrieval",
                            "call_id": "call_y6aBCI2IrnS8ZgPR6DKjXl9T",
                            "arguments": "{\"query\":\"features of DigitalOcean Inference\"}",
                            "status": "completed",
                            "output": "{\"knowledge_base_id\":\"e7651dee-da73-11ef-bf8f-4e013e2ddde4\",\"query\":\"features of DigitalOcean Inference\",\"results\":[{\"metadata\":{\"chunk_category\":\"CompositeElement\",\"ingested_timestamp\":\"2026-05-11T18:15:46.532085+00:00\",\"item_name\":\"https://docs.digitalocean.com/\",\"page_number\":null},\"text_content\":\"### 1 May 2026 [ ](https://docs.digitalocean.com/#1-may-2026) * The following DeepSeek model is now available on DigitalOcean Inference for [serverless inference](https://docs.digitalocean.com/products/inference/how-to/use-serverless-inference/), [Agent Development Kit](https://docs.digitalocean.com/products/inference/how-to/build-agents-using-adk/) and [agents](https://docs.digitalocean.com/products/inference/how-to/create-agents/): For more information, see the [Available Models page](https://docs.digitalocean.com/products/inference/details/models/).\"},{\"metadata\":{\"chunk_category\":\"CompositeElement\",\"ingested_timestamp\":\"2026-05-11T18:15:46.532085+00:00\",\"item_name\":\"https://docs.digitalocean.com/\",\"page_number\":null},\"text_content\":\"### 5 May 2026 [ ](https://docs.digitalocean.com/#5-may-2026) ....\"}],\"total_results\":3}"
                        }
                    }
                ],
                "content": "DigitalOcean Inference includes several features designed to support serverless inference and facilitate the development and deployment of AI models and agents. ...",
                "reasoning_content": null,
                "refusal": null,
                "role": "assistant"
            }
        }
    ],
    ...
    }
}

Use the Responses API with knowledge base retrieval:

curl -X POST https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-pro",
    "input": "What is actions infrastructure? Answer using the knowledge base.",
    "tools": [
      {
        "type": "knowledge_base_retrieval",
        "knowledge_base_id": "09d65da1-5225-11f1-b074-4e013e2ddde4"
      }
    ],
    "stream": false,
    "max_output_tokens": 1024
  }'

The response looks similar to the following:

{"background":false,"completed_at":0,"created_at":1779045957,"error":{"code":"","message":""},"frequency_penalty":0,"id":"resp_b999e07cebb3dd7a","incomplete_details":{"reason":""},"instructions":{"OfInputItemList":null,"OfString":""},"max_output_tokens":1024,"max_tool_calls":0,"metadata":null,"model":"deepseek-v4-pro","object":"response","output":[{"arguments":"{\"query\": \"What is actions infrastructure?\"}","call_id":"chatcmpl-tool-8e9ce86361b0fe64","name":"knowledge_base_retrieval","status":"completed","type":"function_call"},{"call_id":"chatcmpl-tool-8e9ce86361b0fe64","output":"{\"knowledge_base_id\":\"09d65da1-5225-11f1-b074-4e013e2ddde4\",\"query\":\"What is actions infrastructure?\",\"results\":[{\"metadata\":{\"chunk_category\":\"CompositeElement\",\"ingested_timestamp\":\"2026-05-17T19:23:20.695776+00:00\",\"item_name\":\"Actions Gateway - HLD.pdf\",\"page_number\":45},\"text_content\":\"Open questions\\n\\nMARS Vault accepting non-MARS credentials. This HLD assumes MARS Vault is the credential store for standalone-surface end-user OAuth tokens as well as MARS-surface operator-managed handles - replacing the bespoke End-User Token Store, the per-customer DEK scheme, and the direct KMS dependency. The assumption is load-bearing: the standalone OAuth Portal callback writes provider tokens to Vault keyed.

Use Model Context Protocol (MCP)

MCP servers expose tools that the model can call, such as fetching account data, managing schedules, or interacting with third-party APIs. The MCP built-in tool connects the model to remote Model Context Protocol (MCP) servers and orchestrates calls to them.

Connect to an Authenticated MCP Server

You can connect to authenticated MCP servers using bearer token authentication. The following example sends a Chat Completions request that connects to the DigitalOcean Accounts MCP server. Replace $DIGITALOCEAN_API_TOKEN with a valid DigitalOcean personal access token.

curl -X POST https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Fetch my DigitalOcean account information and summarize it in 2 bullets."
      }
    ],
    "tools": [
      {
        "type": "mcp",
        "server_label": "digitalocean",
        "server_url": "https://accounts.mcp.digitalocean.com/mcp",
        "authorization": "Bearer $DIGITALOCEAN_API_TOKEN",
        "allowed_tools": ["account-get-information"]
      }
    ],
    "tool_choice": "required",
    "stream": false,
    "max_tokens": 512
  }'

The allowed_tools array restricts which tools from the MCP server the model can call. In this example, only the account-get-information tool is available. When omitted, the model can use any tool the server exposes. For the full set of MCP tool parameters, see the Serverless Inference API reference. The response looks like the following:

{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "annotations": [
                    {
                        "type": "tool_use",
                        "tool_use": {
                            "name": "digitalocean__account-get-information",
                            "call_id": "call_xasb9Xk1HAfT564P3bteZZTT",
                            "arguments": "{}",
                            "status": "completed",
                            "output": "{\n  \"droplet_limit\": 100,\n  \"floating_ip_limit\": 75,\n  \"reserved_ip_limit\": 75,\n  \"volume_limit\": 5000,\n  \"email\": \"[email protected]\",\n  \"name\": \"dev-sammy\",\n  \"uuid\": \"de55ee97-21ab-452d-aaf0-d4046480xxxx\",\n  \"email_verified\": true,\n  \"status\": \"active\",\n  \"team\": {\n    \"name\": \"My Team\",\n    \"uuid\": \"de55ee97-21ab-452d-aaf0-d4046480xxxx\"\n  }\n}"
                        }
                    }
                ],
                "content": "- Your account (\"dev-sammy\") is active with email \"[email protected]\", which is verified. You are part of \"My Team\" with a UUID of \"de55ee97-21ab-452d-aaf0-d4046480xxxx\".\n- You have a resource allocation limit of 100 droplets, 75 floating IPs, 75 reserved IPs, and 5000 volumes.",
                "reasoning_content": null,
                "refusal": null,
                "role": "assistant"
            }
        }
    ],
    ...
    }
}
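The MCP tool definition differs between servers only in its optional fields. The sketch below builds the definition with `authorization` and `allowed_tools` included only when supplied. Both helper names are hypothetical. The `split_tool_name` helper relies on an assumption inferred from the example responses in this guide, where tool names appear as the server label and the tool name joined by a double underscore (for example `digitalocean__account-get-information`).

```python
def build_mcp_tool(server_label, server_url, bearer_token=None, allowed_tools=None):
    """Build an MCP tool definition for the tools array of a request."""
    tool = {
        "type": "mcp",
        "server_label": server_label,
        "server_url": server_url,
    }
    if bearer_token is not None:
        # Only authenticated servers need this field.
        tool["authorization"] = f"Bearer {bearer_token}"
    if allowed_tools is not None:
        # When omitted, the model can use any tool the server exposes.
        tool["allowed_tools"] = list(allowed_tools)
    return tool


def split_tool_name(qualified_name):
    """Split 'label__tool' into (label, tool); assumes a '__' separator."""
    label, separator, tool = qualified_name.partition("__")
    if not separator:
        return None, qualified_name
    return label, tool
```

Restricting `allowed_tools` to the smallest set a prompt needs limits what the model can do with the supplied token.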

Connect to an Unauthenticated MCP Server

You can also connect to public MCP servers that do not require authentication. The following example sends a Responses API request:

curl -X POST https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-4o",
    "input": "Create a scheduling poll called Team Lunch with two time options for tomorrow at noon and the day after at noon.",
    "tools": [
      {
        "type": "mcp",
        "server_label": "timergy",
        "server_url": "https://api.timergy.com/mcp"
      }
    ],
    "tool_choice": "required",
    "stream": false,
    "max_output_tokens": 512
  }'

The response looks like the following:

{
...
  "model": "openai-gpt-4o",
  "object": "response",
  "output": [
    {
      "arguments": "{\"autoFinalize\":false,\"creatorName\":\"Assistant for Team\",\"deadline\":\"2026-04-08T12:00:00-05:00\",\"description\":\"Scheduling poll for a team lunch\",\"invitees\":[],\"location\":\"Office Cafeteria\",\"options\":[{\"end\":\"2026-04-10T13:00:00-05:00\",\"start\":\"2026-04-10T12:00:00-05:00\"},{\"end\":\"2026-04-11T13:00:00-05:00\",\"start\":\"2026-04-11T12:00:00-05:00\"}],\"title\":\"Team Lunch\"}",
      "call_id": "call_uallF71f2THNYsEvZUubSUhQ",
      "name": "timergy__create_poll",
      "status": "completed",
      "type": "function_call"
    },
    {
      "call_id": "call_uallF71f2THNYsEvZUubSUhQ",
      "output": "{\n  \"pollId\": \"fdc8110b-274d-45b4-b791-0149f6cfc4bc\",\n  \"title\": \"Team Lunch\",\n  \"url\": \"https://timergy.com/en/polls/fdc8110b-274d-45b4-b791-0149f6cfc4bc\",\n  \"passphrase\": \"horse-sword-thumb\",\n  \"options\": [\n    {\n      \"id\": \"c8879755-9136-497d-aedc-81b50fafbb13\",\n      \"start\": \"2026-04-10T17:00:00.000Z\",\n      \"end\": \"2026-04-10T18:00:00.000Z\",\n      \"label\": null\n    },\n    {\n      \"id\": \"58c474c6-12b4-42d5-a96e-c6c977d8a3b2\",\n      \"start\": \"2026-04-11T17:00:00.000Z\",\n      \"end\": \"2026-04-11T18:00:00.000Z\",\n      \"label\": null\n    }\n  ],\n  \"expiresAt\": \"2026-04-21T18:00:00.000Z\",\n  \"autoFinalize\": false,\n  \"inviteesSent\": 0,\n  \"note\": \"Share the URL with participants. The passphrase is saved for finalization. Assistant for Team's \\\"yes\\\" votes have been auto-submitted.\"\n}",
      "status": "completed",
      "type": "function_call_output"
    },
    {
      "content": [
        {
          "annotations": [],
          "logprobs": [],
          "text": "The scheduling poll \"Team Lunch\" has been created. You can share the following URL with participants to vote:\n\n**Poll URL:** [Team Lunch Poll](https://timergy.com/en/polls/fdc8110b-274d-45b4-b791-0149f6cfc4bc)\n\nFor admin access and to finalize the poll, you can use the passphrase:\n\n**Passphrase:** `horse-sword-thumb`\n\nThe poll includes two time slot options:\n- April 10, 2026, from 12:00 PM to 1:00 PM (local time)\n- April 11, 2026, from 12:00 PM to 1:00 PM (local time)",
          "type": "output_text"
        }
      ],
...
  "tool_choice": "auto",
  "tools": [
    {
      "type": "mcp",
      "server_label": "timergy",
      "server_url": "https://api.timergy.com/mcp"
    }
  ],
...
  }
}
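In the Responses API output above, each tool call is a `function_call` item followed by a `function_call_output` item that shares its `call_id`, and the final answer is an `output_text` part. The following sketch pairs them up; `summarize_responses_output` is a hypothetical helper, and it assumes the item shapes match the example.

```python
import json


def summarize_responses_output(output_items):
    """Pair tool calls with their outputs and collect the final text."""
    calls = {}
    texts = []
    for item in output_items:
        kind = item.get("type")
        if kind == "function_call":
            calls[item["call_id"]] = {
                "name": item.get("name"),
                # Arguments arrive as a JSON-encoded string.
                "arguments": json.loads(item["arguments"]),
                "output": None,
            }
        elif kind == "function_call_output":
            entry = calls.setdefault(
                item["call_id"],
                {"name": None, "arguments": None, "output": None},
            )
            # Kept as the raw string returned by the tool.
            entry["output"] = item.get("output")
        else:
            # Remaining items carry the model's text in their content parts.
            for part in item.get("content") or []:
                if part.get("type") == "output_text":
                    texts.append(part.get("text", ""))
    return calls, "".join(texts)
```

Pass the `output` array of the decoded response to get a dictionary keyed by `call_id` and the assistant's final text.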

Add Web Search to Inference Request

Web search is a built-in tool that gives the model access to real-time web content during inference. When you add web search to your API request, the model decides when a search is needed, and the results are incorporated into the model’s response.

To enable web search, include a tool object with type set to web_search in the tools array of your request.

The following example sends a Responses API request with web search enabled:

curl -X POST https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-4o",
    "input": "What are the latest pricing changes for DigitalOcean Droplets?",
    "tools": [
      {
        "type": "web_search",
        "max_uses": 3,
        "max_results": 5
      }
    ],
    "max_output_tokens": 1024,
    "stream": false
  }'

When the model determines that a prompt benefits from web search, it searches for relevant information and incorporates the results into its response.

You can optionally limit how many searches the model performs per request with max_uses (1-5) and how many results each search returns with max_results (1-10, default 5).

When max_uses is reached, the model produces a final response using the results collected so far. For the full set of web search parameters, see the Serverless Inference API reference.
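The documented ranges can be checked on the client before sending a request. The sketch below is a hypothetical helper, `build_web_search_tool`, that builds the tool object and rejects values outside the ranges stated above (`max_uses` 1-5, `max_results` 1-10); it is a convenience under those stated limits, not part of the API.

```python
def build_web_search_tool(max_uses=None, max_results=None):
    """Build a web_search tool object, validating the documented ranges."""
    tool = {"type": "web_search"}
    if max_uses is not None:
        if not 1 <= max_uses <= 5:
            raise ValueError("max_uses must be between 1 and 5")
        tool["max_uses"] = max_uses
    if max_results is not None:
        if not 1 <= max_results <= 10:
            raise ValueError("max_results must be between 1 and 10")
        tool["max_results"] = max_results
    return tool
```

Calling `build_web_search_tool(max_uses=3, max_results=5)` reproduces the tool object from the request example, and omitting both arguments leaves the limits to the service defaults.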

The response looks similar to the following:

{
  ...
  "output": [
    {
      "action": {
        "queries": [
          "DigitalOcean AI platform features"
        ],
        "query": "DigitalOcean AI platform features",
        "type": "search"
      },
      "id": "ws_call_t7eyYNbAWQOcEln1Ns2TxuOv",
      "status": "completed",
      "type": "web_search_call"
    },
    {
      "content": [
        {
          "annotations": [
            {
              "end_index": 1501,
              "start_index": 1439,
              "title": "DigitalOcean AI Platform Features | DigitalOcean Documentation",
              "type": "url_citation",
              "url": "https://docs.digitalocean.com/products/inference/details/features"
            },
            {
              "end_index": 1800,
              "start_index": 1729,
              "title": "DigitalOcean Inference Details | DigitalOcean Documentation",
              "type": "url_citation",
              "url": "https://docs.digitalocean.com/products/inference/details"
            },
            {
              "end_index": 2085,
              "start_index": 2028,
              "title": "Agent Platform | Build AI Agents with DigitalOcean",
              "type": "url_citation",
              "url": "https://www.digitalocean.com/products/inference/platform"
            }
          ],
          "logprobs": [],
          "text": "The DigitalOcean AI Platform offers a variety of features:\n\n1. **AI Agent Development**: Build fully-managed AI ...",
          "type": "output_text"
        }
      ],
....
  }
}
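Each `url_citation` annotation in the response above points at a span of the answer text. The following sketch collects the citations together with the text they cover. The helper name `url_citations` is hypothetical, and the code assumes that `start_index` and `end_index` are character offsets into the `text` field of the same content part.

```python
def url_citations(output_items):
    """Collect url_citation annotations and the text span each one covers."""
    citations = []
    for item in output_items:
        # web_search_call items have no content, so they are skipped here.
        for part in item.get("content") or []:
            if part.get("type") != "output_text":
                continue
            text = part.get("text", "")
            for annotation in part.get("annotations") or []:
                if annotation.get("type") != "url_citation":
                    continue
                start = annotation["start_index"]
                end = annotation["end_index"]
                citations.append({
                    "title": annotation.get("title"),
                    "url": annotation.get("url"),
                    # Assumes offsets index characters in this part's text.
                    "cited_text": text[start:end],
                })
    return citations
```

Pass the `output` array of the decoded response to get a list you can render as footnotes or source links.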

Use Tool Search

Tool search lets the model search for tools and load them into its context on demand in agentic workflows. Use tool search with the Messages API for Anthropic models and with the Responses API for OpenAI models.

For Anthropic models, include a tool search tool in your tools array, using either the regex or the BM25 variant as the type:

  • Regex (tool_search_tool_regex_20251119): Allows Claude to construct regex patterns to search for tools using Python re.search() syntax.
  • BM25 (tool_search_tool_bm25_20251119): Allows Claude to use natural language queries to search for tools.

Then, set defer_loading: true on tools that should not load immediately. The model calls the tool search tool when it needs them. Both tool search variants search tool names, descriptions, argument names, and argument descriptions. Note the following about tool search:

  • The tool search tool itself must not have "defer_loading": true.
  • Tools without defer_loading load into context immediately while tools with "defer_loading": true load only when Claude discovers them through search.
  • For best performance, keep your 3-5 most frequently used tools non-deferred.
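The rules above can be applied mechanically when assembling the tools array. The sketch below is a hypothetical helper, `with_tool_search`, that prepends a tool search tool without `defer_loading` and marks every tool as deferred except the ones you choose to keep loaded. The type identifiers come from the list above; the `name` values (`tool_search_tool_regex` and `tool_search_tool_bm25`) are assumptions based on the naming used in the regex request example in this section.

```python
SEARCH_TOOL_TYPES = {
    "regex": "tool_search_tool_regex_20251119",
    "bm25": "tool_search_tool_bm25_20251119",
}


def with_tool_search(tools, keep_loaded, variant="regex"):
    """Return a tools array with tool search enabled and deferral applied."""
    # The tool search tool itself must not be deferred.
    search_tool = {
        "type": SEARCH_TOOL_TYPES[variant],
        "name": f"tool_search_tool_{variant}",  # assumed naming
    }
    prepared = [search_tool]
    for tool in tools:
        tool = dict(tool)  # shallow copy so the caller's list is untouched
        if tool["name"] in keep_loaded:
            tool.pop("defer_loading", None)
        else:
            tool["defer_loading"] = True
        prepared.append(tool)
    return prepared
```

Pass the names of your most frequently used tools as `keep_loaded` so that they load into context immediately while the rest are discovered through search.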

The following example sends a Messages API request with regex tool search enabled:

curl -X POST https://inference.do-ai.run/v1/messages \
  -H "x-api-key: $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic-claude-opus-4.8",
    "max_tokens": 2048,
    "messages": [
      {
        "role": "user",
        "content": "What is the weather in zip code 94107?"
      }
    ],
    "tools": [
      {
        "type": "tool_search_tool_regex_20251119",
        "name": "tool_search_tool_regex"
      },
      {
        "name": "get_weather_by_zip",
        "description": "Return current weather conditions for a US zip code.",
        "input_schema": {
          "type": "object",
          "properties": {
            "zip_code": {"type": "string"},
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["zip_code"]
        },
        "defer_loading": true
      },
      {
        "name": "search_files",
        "description": "Search through files in the workspace",
        "input_schema": {
          "type": "object",
          "properties": {
            "query": {"type": "string"},
            "file_types": {
              "type": "array",
              "items": {"type": "string"}
            }
          },
          "required": ["query"]
        },
        "defer_loading": true
      }
    ]
  }'

The response includes additional block types before any client tool call:

  • server_tool_use: Indicates that Claude is calling the tool search tool.
  • tool_search_tool_result: Contains search results with a nested tool_search_tool_search_result object.
  • tool_use: Claude calling a discovered tool.
  • tool_references: Points to discovered tools.

The response looks similar to the following:

{
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "I'll search for tools to help with the weather information."
    },
    {
      "type": "server_tool_use",
      "id": "srvtoolu_01ABC123",
      "name": "tool_search_tool_regex",
      "input": {
        "query": "weather"
      }
    },
    {
      "type": "tool_search_tool_result",
      "tool_use_id": "srvtoolu_01ABC123",
      "content": {
        "type": "tool_search_tool_search_result",
        "tool_references": [{ "type": "tool_reference", "tool_name": "get_weather_by_zip" }]
      }
    },
    {
      "type": "text",
      "text": "I found a weather tool. Let me get the weather for zip code 94107."
    },
    {
      "type": "tool_use",
      "id": "toolu_01XYZ789",
      "name": "get_weather_by_zip",
      "input": { "zip_code": "94107", "unit": "fahrenheit" }
    }
  ],
  "stop_reason": "tool_use"
}

Tool search tool usage is tracked in the usage object in the response:

{
  "usage": {
    "input_tokens": 1024,
    "output_tokens": 256,
    "server_tool_use": {
      "tool_search_requests": 2
    }
  }
}
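A client typically needs three things from such a response: which tools the search discovered, which client tool calls are waiting to be executed, and how many searches were counted. The sketch below extracts them from a decoded Messages API response. The helper name `inspect_tool_search_response` is hypothetical, and the code assumes the block and usage shapes shown in the examples above.

```python
def inspect_tool_search_response(message):
    """Summarize tool search activity in a decoded Messages API response."""
    discovered = []
    pending_calls = []
    for block in message.get("content", []):
        kind = block.get("type")
        if kind == "tool_search_tool_result":
            # Discovered tools are nested under content.tool_references.
            result = block.get("content") or {}
            for reference in result.get("tool_references", []):
                discovered.append(reference["tool_name"])
        elif kind == "tool_use":
            # Client tool calls that your application must execute.
            pending_calls.append({
                "id": block["id"],
                "name": block["name"],
                "input": block.get("input", {}),
            })
    usage = message.get("usage") or {}
    server_usage = usage.get("server_tool_use") or {}
    return {
        "discovered": discovered,
        "pending_calls": pending_calls,
        "search_requests": server_usage.get("tool_search_requests", 0),
    }
```

When `stop_reason` is `tool_use`, run each entry in `pending_calls` and send the results back in a follow-up request.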

For more information on MCP integration and best practices, see the Anthropic tool search documentation.

For OpenAI models, only GPT-5.4 and later support tool search. To enable tool search, add a tool object with "type": "tool_search" to the tools array. Then, mark tools to defer with "defer_loading": true. The following example sends a Responses API request with hosted tool search enabled:

curl -X POST https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-5.5",
    "input": "Compare the current weather in zip code 94107 and 10001.",
    "tools": [
      {
        "type": "namespace",
        "name": "weather",
        "description": "Weather lookup tools for US zip codes.",
        "tools": [
          {
            "type": "function",
            "name": "get_weather_by_zip",
            "description": "Return current weather conditions for a US zip code.",
            "defer_loading": true,
            "parameters": {
              "type": "object",
              "properties": {
                "zip_code": { "type": "string" }
              },
              "required": ["zip_code"],
              "additionalProperties": false
            }
          }
        ]
      },
      {
        "type": "tool_search"
      }
    ],
    "parallel_tool_calls": false
  }'

For MCP servers, set defer_loading: true on the MCP server tool definition (or on individual tools within the server). For maximum token savings, group deferred functions into namespaces or MCP servers with clear, high-level descriptions so that the model can effectively search and load only the relevant functions. For other best practices, see the OpenAI tool search documentation.

If the model needs a deferred tool, the response includes two additional output items before the eventual function call: tool_search_call, which records the hosted search step, and tool_search_output, which contains the loaded subset that becomes callable. The response looks similar to the following:

[
  {
    "type": "tool_search_call",
    "execution": "server",
    "call_id": null,
    "status": "completed",
    "arguments": {
      "paths": ["weather"]
    }
  },
  {
    "type": "tool_search_output",
    "execution": "server",
    "call_id": null,
    "status": "completed",
    "tools": [
  ....
  },
  {
    "type": "function_call",
    "name": "get_weather_by_zip",
    "namespace": "weather",
    "call_id": "call_abc123",
    "arguments": "{\"zip_code\":\"94107\"}"
  }
]
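To act on this output, a client can check whether a hosted search ran and then gather the function calls it needs to execute. The following sketch does that for a decoded `output` array; `deferred_function_calls` is a hypothetical helper that assumes the item shapes shown in the example above.

```python
import json


def deferred_function_calls(output_items):
    """Return (search_ran, calls) for a Responses API output array."""
    search_ran = False
    calls = []
    for item in output_items:
        kind = item.get("type")
        if kind == "tool_search_call":
            search_ran = True
        elif kind == "function_call":
            calls.append({
                "call_id": item.get("call_id"),
                # Namespace is present for tools grouped in a namespace.
                "namespace": item.get("namespace"),
                "name": item.get("name"),
                "arguments": json.loads(item["arguments"]),
            })
    return search_ran, calls
```

Each returned call carries the `namespace` and `name` needed to route it to the matching local function.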

Use Server-Side Tools With Dedicated Inference

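You can use server-side tools with dedicated inference. In the model field, provide the name of your dedicated inference deployment and the model slug, both of which you can find using the API, in the format dedicated:<dedicated-inference-name>:<model_slug>. For example: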

curl -X POST https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dedicated:<dedicated-inference-name>:<model_slug>",
    "messages": [
      {
        "role": "user",
        "content": "What features does DigitalOcean Inference offer?"
      }
    ],
    "tools": [
      {
        "type": "web_search",
        "max_uses": 2
      }
    ],
    "stream": false,
    "max_tokens": 1024
  }'
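The model identifier in this request is the only part that differs from a serverless request. The sketch below composes it from its two parts. The helper name `dedicated_model_id` is hypothetical, and the check that rejects empty parts or parts containing a colon is a guard added by this sketch, not a documented API rule.

```python
def dedicated_model_id(deployment_name, model_slug):
    """Compose the 'dedicated:<name>:<slug>' model identifier."""
    for part in (deployment_name, model_slug):
        # Guard assumed by this sketch: a colon would make the ID ambiguous.
        if not part or ":" in part:
            raise ValueError(f"invalid identifier part: {part!r}")
    return f"dedicated:{deployment_name}:{model_slug}"
```

Use the returned string as the `model` value; the rest of the payload, including the `tools` array, stays the same as in the serverless examples.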
