Inference Reference

Validated on 20 Apr 2026 • Last edited on 8 May 2026

Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare model capabilities and pricing, use routing to match inference requests to the best-fit model, and run inference using serverless or dedicated deployments.

The DigitalOcean API

The DigitalOcean API lets you manage resources programmatically with standard HTTP requests. All actions available in the control panel are also available through the API.

  • Serverless Inference API: Interact directly with foundation models for chat completions, or generating image, audio and text-to-speech.

  • Dedicated Inference API: Manage your dedicated inference deployments. Dedicated Inference is available in public preview. You can opt in from the Feature Preview page.

  • GradientAI Platform API: Create, delete, and manage knowledge bases and generative AI agents. You can also use the API to add agent and function routes to agents, add data sources to knowledge bases, and start indexing jobs.

  • Agent Inference: Interact with agents using an agent-specific endpoint.

The DigitalOcean Command Line Client, doctl

doctl is the command-line interface for the DigitalOcean API. It supports most of the same actions available in the API and DigitalOcean Control Panel.

doctl gradient supports managing DigitalOcean Inference resources from the command line. See the doctl documentation or use doctl gradient --help for more information.

The Gradient Command Line Interface, gradient public

Use gradient, the CLI which comes with the Agent Development Kit, to build, test, and deploy agent workflows from within your development environments.

The Inference SDK

Use the official DigitalOcean Python client library for:

You can also use the official DigitalOcean TypeScript library or Go library.

The SDK will be deprecated in a future release.

The DigitalOcean MCP Server

The DigitalOcean MCP server lets you use natural language prompts to manage your DigitalOcean AI resources. You can:

  • Create, update, list, and delete Dedicated Inference endpoints
  • Interact with knowledge bases to retrieve relevant chunks, apply filters, and access indexed content for use in agent and retrieval workflows
  • Manage evaluation datasets, run evaluations and monitor agent deployments
  • Submit and retrieve batch inference jobs
  • Retrieve information from the model catalog.

All operations use argument-based input.

DigitalOcean MCP Servers

Use the DigitalOcean MCP server to manage your AI resources.

More Resources

Agent Evaluation Metrics

A list of available agent evaluation metrics and their definitions.

Agent Tracing Data

Understand the information agent tracing captures and how it helps you debug and optimize your agents.

Chunking Parameters

Reference for DigitalOcean Knowledge Bases chunking parameters, their recommendations, and their constraints across supported embeddings models.

We can't find any results for your search.

Try using different keywords or simplifying your search terms.