Inference API Reference
Validated on 20 Apr 2026 • Last edited on 27 Apr 2026
Inference provides a single control plane for managing inference workflows. Its Model Catalog lets you browse available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare model capabilities and pricing, route inference requests to the best-fit model, and run inference on serverless or dedicated deployments.
The Inference API endpoints are organized into the following groups:
- Dedicated Inference (13 endpoints): Dedicated Inference delivers scalable, production-grade LLM hosting on DigitalOcean. Create, list, get, update, and delete Dedicated Inference instances, and manage accelerators, CA certificates, sizes, GPU model configuration, and access tokens (see the management sketch after this list).
- Serverless Inference (7 endpoints): DigitalOcean Gradient™ AI Agentic Cloud provides access to serverless inference models. You access a model by supplying an inference key with each request (see the chat completion sketch after this list).
- Embeddings (1 endpoint): Text embedding vectors via POST /v1/embeddings on the Serverless Inference endpoint (see the embeddings sketch after this list).
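
The Dedicated Inference endpoints follow the usual create/list/get/update/delete pattern of the DigitalOcean API. Below is a minimal sketch of listing instances with Python's `requests` library; the base URL, path, and response key are assumptions for illustration, not the documented routes, so substitute the values from this reference.

```python
import os

import requests

# Assumed values for illustration only; substitute the documented
# base URL and Dedicated Inference path from this reference.
API_BASE = "https://api.digitalocean.com"
DEDICATED_PATH = "/v2/inference/dedicated"  # hypothetical path


def list_dedicated_instances(token: str) -> list[dict]:
    """Return the Dedicated Inference instances visible to this API token."""
    resp = requests.get(
        f"{API_BASE}{DEDICATED_PATH}",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("instances", [])  # response key is assumed


if __name__ == "__main__":
    for instance in list_dedicated_instances(os.environ["DIGITALOCEAN_TOKEN"]):
        print(instance)
```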
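
Serverless Inference authenticates with an inference key passed as a bearer token. The following is a minimal sketch of a chat completion request; the base URL, route, and model slug are assumptions for illustration, and only the bearer-token pattern is taken from the description above.

```python
import os

import requests

# The base URL and model slug below are assumptions for illustration;
# use the values published in the Model Catalog.
INFERENCE_BASE = "https://inference.do-ai.run/v1"


def chat(inference_key: str, prompt: str) -> str:
    """Send a single-turn chat completion request with an inference key."""
    resp = requests.post(
        f"{INFERENCE_BASE}/chat/completions",  # assumed route
        headers={"Authorization": f"Bearer {inference_key}"},
        json={
            "model": "llama3.3-70b-instruct",  # assumed model slug
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat(os.environ["INFERENCE_KEY"], "Say hello in one sentence."))
```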
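
The embeddings route is the one documented above, POST /v1/embeddings on Serverless Inference. Here is a minimal sketch of requesting embedding vectors; the base URL and model slug are assumptions, and the request/response shape is assumed to follow the common OpenAI-style embeddings contract.

```python
import os

import requests

INFERENCE_BASE = "https://inference.do-ai.run/v1"  # assumed base URL


def embed(inference_key: str, texts: list[str]) -> list[list[float]]:
    """Call POST /v1/embeddings (documented above); payload and response
    shapes are assumed to follow the OpenAI-style embeddings contract."""
    resp = requests.post(
        f"{INFERENCE_BASE}/embeddings",
        headers={"Authorization": f"Bearer {inference_key}"},
        json={
            "model": "gte-large-en-v1.5",  # assumed embedding model slug
            "input": texts,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]


if __name__ == "__main__":
    vectors = embed(os.environ["INFERENCE_KEY"], ["hello", "world"])
    print(len(vectors), "vectors of dimension", len(vectors[0]))
```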