Inference API Reference
Validated on 20 Apr 2026 • Last edited on 27 Apr 2026
Inference provides a single control plane for managing inference workflows. Its Model Catalog lets you browse available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare model capabilities and pricing, route inference requests to the best-fit model, and run inference on serverless or dedicated deployments.
The Inference API endpoints are organized into the following groups:
- Dedicated Inference (13 endpoints): Dedicated Inference delivers scalable, production-grade LLM hosting on DigitalOcean. Create, list, get, update, and delete Dedicated Inference instances, and manage accelerators, CA certificates, sizes, GPU model configuration, and access tokens (see the management sketch after this list).
- Serverless Inference (7 endpoints): DigitalOcean Gradient™ AI Agentic Cloud provides access to serverless inference models. You access a model by supplying an inference key with each request (see the chat completion sketch after this list).
- Embeddings (1 endpoint): Text embedding vectors via POST /v1/embeddings on the Serverless Inference endpoint (see the embeddings sketch after this list).
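
The Dedicated Inference endpoints follow the usual create/list/get/update/delete pattern of the DigitalOcean API. Below is a minimal sketch of listing instances with Python's `requests` library; the base URL, path, and response key are assumptions for illustration, not the documented routes, so substitute the values from this reference.

```python
import os

import requests

# Assumed values for illustration only; substitute the documented
# base URL and Dedicated Inference path from this reference.
API_BASE = "https://api.digitalocean.com"
DEDICATED_PATH = "/v2/inference/dedicated"  # hypothetical path


def list_dedicated_instances(token: str) -> list[dict]:
    """Return the Dedicated Inference instances visible to this API token."""
    resp = requests.get(
        f"{API_BASE}{DEDICATED_PATH}",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("instances", [])  # response key is assumed


if __name__ == "__main__":
    for instance in list_dedicated_instances(os.environ["DIGITALOCEAN_TOKEN"]):
        print(instance)
```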
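
Serverless Inference authenticates with an inference key passed as a bearer token. The following is a minimal sketch of a chat completion request; the base URL, route, and model slug are assumptions for illustration, and only the bearer-token pattern is taken from the description above.

```python
import os

import requests

# The base URL and model slug below are assumptions for illustration;
# use the values published in the Model Catalog.
INFERENCE_BASE = "https://inference.do-ai.run/v1"


def chat(inference_key: str, prompt: str) -> str:
    """Send a single-turn chat completion request with an inference key."""
    resp = requests.post(
        f"{INFERENCE_BASE}/chat/completions",  # assumed route
        headers={"Authorization": f"Bearer {inference_key}"},
        json={
            "model": "llama3.3-70b-instruct",  # assumed model slug
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat(os.environ["INFERENCE_KEY"], "Say hello in one sentence."))
```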
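
The embeddings route is the one documented above, POST /v1/embeddings on Serverless Inference. Here is a minimal sketch of requesting embedding vectors; the base URL and model slug are assumptions, and the request/response shape is assumed to follow the common OpenAI-style embeddings contract.

```python
import os

import requests

INFERENCE_BASE = "https://inference.do-ai.run/v1"  # assumed base URL


def embed(inference_key: str, texts: list[str]) -> list[list[float]]:
    """Call POST /v1/embeddings (documented above); payload and response
    shapes are assumed to follow the OpenAI-style embeddings contract."""
    resp = requests.post(
        f"{INFERENCE_BASE}/embeddings",
        headers={"Authorization": f"Bearer {inference_key}"},
        json={
            "model": "gte-large-en-v1.5",  # assumed embedding model slug
            "input": texts,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]


if __name__ == "__main__":
    vectors = embed(os.environ["INFERENCE_KEY"], ["hello", "world"])
    print(len(vectors), "vectors of dimension", len(vectors[0]))
```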