What is serverless inference and how it differs from dedicated inference.
Use Serverless Inference
Validated on 28 Apr 2026 • Last edited on 14 May 2026
Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare model capabilities and pricing, use routing to match inference requests to the best-fit model, and run inference using serverless or dedicated deployments.
Get Started
Synchronous and asynchronous API endpoints for serverless inference.
Create, scope, and manage model access keys for foundation models, inference routers, and batch inference, with VPC restrictions and team-owner visibility.
How to retrieve models available for serverless inference.
Generate Chat Completions
Send prompts and use reasoning with the Chat Completions API.
Send prompts with the Responses API.
Use prompt caching with the Chat Completions and Responses API.
Use reasoning with the Chat Completions and Responses API.
Generate Images, Audio, Videos, and Text-to-Speech
Generate or edit images from text prompts.
Process and generate content across multiple data types, including images, audio, video, and text using multimodal models.
Generate image, audio, or text-to-speech using fal models.