What serverless inference is and how it differs from dedicated inference.
Inference How-Tos
Generated on 28 Apr 2026
Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, both DigitalOcean-hosted and third-party commercial models, and compare their capabilities and pricing. You can also use routing to match inference requests to the best-fit model and run inference using serverless or dedicated deployments.
Serverless Inference
Synchronous and asynchronous API endpoints for serverless inference.
Send API requests directly to foundation models without creating an AI agent or managing infrastructure.
Retrieve the models available for serverless inference.
Send prompts and use reasoning with the Chat Completions API.
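A minimal sketch of a Chat Completions request body, assuming the OpenAI-compatible API shape; the base URL and model ID below are illustrative placeholders, so check the Model Catalog for actual model IDs:

```python
import json

BASE_URL = "https://inference.do-ai.run/v1"  # assumed serverless endpoint

def build_chat_request(model: str, prompt: str) -> dict:
    """Build a minimal OpenAI-style Chat Completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("llama3.3-70b-instruct", "Summarize RAG in one line.")
body = json.dumps(payload)

# To send: POST {BASE_URL}/chat/completions with headers
#   Authorization: Bearer <model access key>
#   Content-Type: application/json
```

The same payload shape works for multi-turn conversations by appending prior assistant and user turns to the `messages` list.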
Send prompts with the Responses API.
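A sketch of a Responses API request body, assuming the OpenAI-style shape where a single `input` field replaces the `messages` list; the model ID is a placeholder:

```python
def build_responses_request(model: str, text: str) -> dict:
    """Build a minimal Responses API payload (`input` instead of `messages`)."""
    return {"model": model, "input": text}

payload = build_responses_request("openai-gpt-4o", "Explain embeddings briefly.")
# To send: POST <base-url>/responses with your model access key as a Bearer token.
```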
Use prompt caching with the Chat Completions and Responses APIs.
Use reasoning with the Chat Completions and Responses APIs.
Generate or edit images from text prompts.
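A sketch of an image generation request body, assuming the OpenAI-compatible `/images/generations` shape; the model ID and supported `size` values are assumptions to verify against the Model Catalog:

```python
def build_image_request(model: str, prompt: str, size: str = "1024x1024") -> dict:
    """Build a minimal image generation payload (OpenAI-style shape assumed)."""
    return {"model": model, "prompt": prompt, "size": size, "n": 1}

payload = build_image_request("flux-schnell", "a lighthouse at dawn, watercolor")
# To send: POST <base-url>/images/generations with your model access key.
```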
Generate images, audio, or speech from text using fal models.
Convert text into dense vector representations for use in semantic search, retrieval-augmented generation (RAG), clustering, classification, and similarity matching.
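A sketch of an embeddings request body (OpenAI-compatible `/embeddings` shape assumed; the model ID is a placeholder), plus the cosine similarity used to compare returned vectors in semantic search:

```python
import math

def build_embeddings_request(model: str, texts: list[str]) -> dict:
    """Build a minimal embeddings payload; `input` accepts a list of strings."""
    return {"model": model, "input": texts}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

payload = build_embeddings_request("gte-large-en-v1.5", ["search query", "a document"])
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical vectors → 1.0
```

Ranking documents by cosine similarity against a query embedding is the core loop of both semantic search and RAG retrieval.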
View metrics such as latency, throughput, error rates, token consumption, cost attribution, and rate limiting.
Extend model capabilities with server-side tools like knowledge base retrieval and MCP during inference requests.
Continue using serverless inference after a model is updated.
Manage Model Catalog
Identify the right model for your use case by filtering available foundation models by capabilities and price.
Use Model Playground
Test and compare foundation models in the Model Playground.
Manage Inference Deployments
Deploy open-source and commercial LLMs on dedicated GPUs as an inference endpoint.
Run text jobs asynchronously through OpenAI- and Anthropic-compatible batch APIs using your serverless inference model access key.
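A sketch of an OpenAI-style batch input file: one JSON object per line, each a self-contained request with a `custom_id` for matching results. The model ID and endpoint path are assumptions:

```python
import json

requests = [
    {
        "custom_id": f"job-{i}",          # your ID for matching results later
        "method": "POST",
        "url": "/v1/chat/completions",    # endpoint each request targets
        "body": {
            "model": "llama3.3-70b-instruct",  # assumed model ID
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["Summarize doc A.", "Summarize doc B."])
]

# Serialize as JSONL: one request per line.
jsonl = "\n".join(json.dumps(r) for r in requests)
# Upload the resulting .jsonl file via the batch API with your model access key.
```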
Create, scope, and manage model access keys for foundation models, inference routers, and batch inference, with VPC restrictions and team-owner visibility.
Agentic Workflows
Use the Messages API with Claude Code and similar agentic workflows.
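A sketch of an Anthropic-style Messages API request body; the model ID is an assumption. Agentic tools like Claude Code typically construct this payload themselves once pointed at the endpoint with your access key:

```python
def build_messages_request(model: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Build a minimal Anthropic-style Messages payload."""
    return {
        "model": model,
        "max_tokens": max_tokens,  # required by the Messages API
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_messages_request("anthropic-claude-sonnet-4", "List this repo's TODOs.")
# To send: POST <base-url>/v1/messages with your model access key.
```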