Inference API Reference

Validated on 20 Apr 2026 • Last edited on 27 Apr 2026

Inference provides a single control plane for managing inference workflows. Its Model Catalog lets you browse available foundation models (both DigitalOcean-hosted and third-party commercial models), compare model capabilities and pricing, route inference requests to the best-fit model, and run inference using serverless or dedicated deployments.

The Inference API endpoints are organized into the following groups:

  • Dedicated Inference (13 endpoints): Dedicated Inference delivers scalable, production-grade LLM hosting on DigitalOcean. Create, list, get, update, and delete Dedicated Inference instances; manage accelerators, CA certificates, sizes, GPU model configuration, and access tokens.
  • Serverless Inference (7 endpoints): DigitalOcean Gradient™ AI Agentic Cloud provides access to serverless inference models; you authenticate requests with an inference key.
  • Embeddings (1 endpoint): Generate text embedding vectors via POST /v1/embeddings on the Serverless Inference endpoint.
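As a sketch of how a Serverless Inference request to POST /v1/embeddings might be built, the snippet below constructs the request with an inference key as a bearer token. The base URL, environment variable name, and model ID are illustrative assumptions, not confirmed values; consult the Model Catalog for the model IDs actually available to your account.

```python
import json
import os
import urllib.request


def build_embeddings_request(base_url, api_key, model, texts):
    """Build (but do not send) a POST /v1/embeddings request.

    The request body follows the common embeddings shape:
    {"model": ..., "input": [...]}. Field names here are an
    assumption; verify them against the current API reference.
    """
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/embeddings",
        data=body,
        headers={
            # The inference key is distinct from a DigitalOcean API token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_embeddings_request(
        base_url="https://inference.do-ai.run",        # assumed endpoint
        api_key=os.environ.get("INFERENCE_KEY", ""),   # hypothetical env var
        model="example-embedding-model",               # placeholder model ID
        texts=["hello world"],
    )
    # Uncomment to actually send the request:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
    print(req.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect before you spend inference credits, and the same pattern extends to other Serverless Inference routes by changing the path and body.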
