Inference API Reference

Validated on 20 Apr 2026 • Last edited on 14 May 2026

Inference provides a single control plane for managing inference workflows. Its Model Catalog lists the available foundation models, both DigitalOcean-hosted and third-party commercial models, so you can compare model capabilities and pricing. You can also use routing to match inference requests to the best-fit model, and run inference using serverless or dedicated deployments.

The Inference API endpoints are organized into the following groups:

  • Dedicated Inference (13 endpoints): Dedicated Inference delivers scalable production-grade LLM hosting on DigitalOcean. Create, list, get, update, and delete Dedicated Inference instances; manage accelerators, CA certificate, sizes, GPU model config, and access tokens.
  • Serverless Inference (7 endpoints): DigitalOcean Gradient™ AI Agentic Cloud allows access to serverless inference models. You can access models by providing an inference key.
  • Embeddings (1 endpoint): Generate text embedding vectors via POST /v1/embeddings on Serverless Inference.
  • Batch Inference (7 endpoints): Batch Inference is an asynchronous processing capability designed to help you scale high-volume AI projects more efficiently. It is suited to heavy-duty workloads like large-scale data classification, evaluations, and content enrichment: you can submit thousands or even millions of requests in a single job with a guaranteed results window of 24 hours. By using off-peak GPU capacity, Batch Inference provides high-performance LLM access at a significantly lower price than standard synchronous APIs, making it a cost-effective choice for non-interactive workloads.
  • GradientAI Platform (108 endpoints): The API lets you build GPU-powered AI agents with pre-built or custom foundation models, function and agent routes, and RAG pipelines with knowledge bases.
  • Agent Inference (1 endpoint): DigitalOcean Gradient™ AI Agentic Cloud allows you to create multi-agent workflows and integrate agents into your AI applications.
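
As a rough illustration of the Embeddings entry above, the following Python sketch builds (but does not send) a POST request to /v1/embeddings. Only the path and the use of an inference key come from this page. The base URL `https://inference.example.com`, the Bearer-token header, the `model` and `input` payload fields, and the helper name `build_embeddings_request` are all assumptions made for illustration; check the endpoint reference for the actual values.

```python
import json
import urllib.request

# Assumption: placeholder base URL, not the real Serverless Inference host.
BASE_URL = "https://inference.example.com"


def build_embeddings_request(inference_key, model, texts, base_url=BASE_URL):
    """Build a POST /v1/embeddings request without sending it.

    Assumptions (not confirmed by this page): the inference key is passed
    as a Bearer token, and the JSON body uses "model" and "input" fields.
    """
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {inference_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical key and model name, used only to show the request shape.
    req = build_embeddings_request("YOUR_INFERENCE_KEY", "example-model", ["hello"])
    print(req.get_method(), req.full_url)
```

To actually send the request you would pass it to `urllib.request.urlopen` once the base URL, model name, and key are replaced with real values. Building the request separately keeps the sketch testable without network access.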
