What is serverless inference and how it differs from dedicated inference.
Inference How-Tos
Generated on 15 May 2026
Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare model capabilities and pricing, use routing to match inference requests to the best-fit model, and run inference using serverless or dedicated deployments.
Serverless Inference
Synchronous and asynchronous API endpoints for serverless inference.
Send API requests directly to foundation models without creating an AI agent or managing infrastructure.
How to retrieve models available for serverless inference.
Send prompts and use reasoning with the Chat Completions API.
Send prompts with the Responses API.
Use prompt caching with the Chat Completions and Responses API.
Use reasoning with the Chat Completions and Responses API.
Generate or edit images from text prompts.
Process and generate content across multiple data types, including images, audio, video, and text using multimodal models.
Generate image, audio, or text-to-speech using fal models.
Convert text into dense vector representations for use in semantic search, retrieval-augmented generation (RAG), clustering, classification, and similarity matching.
View metrics such as latency, throughput, error rates, token consumption, cost attribution, and rate limiting.
How to use serverless inference after updating a model.
Manage Model Catalog
Identify the right model for your use case by filtering available foundation models by capabilities and price.
Import Bring Your Own Models (BYOM) models into Model Catalog from Hugging Face or Spaces buckets and folders.
Test and compare foundation models in the Model Playground.
Use Dedicated Inference
Deploy open-source and commercial LLMs on dedicated GPUs as an inference endpoint.
Use Batch Inference
Batch inference runs text jobs asynchronously through batch APIs compatible with OpenAI and Anthropic using your serverless inference model access key.
Use Inference Router
Create and configure an Inference Router to route inference requests to foundation models.
Agentic Workflows
Use the Messages API with Claude Code and similar agentic workflows.
Evaluate Models
Use Built-in Tools
Extend model capabilities with server-side tools like knowledge base retrieval and MCP during inference requests with serverless and dedicated inference.
Use Agent Platform
Build fully-managed AI agents with knowledge bases for retrieval-augmented generation, multi-agent routing, guardrails, and more.
Create an agent with domain-specific knowledge to provide information or take action.
Use Agent Development Kit to create and manage agents.
Add, edit, or delete API keys to use their models with your agents.
Create workspaces to group agents together and move agents between workspaces as needed.
Use your agent in an application or through a chat bot interface.
Test the full agent experience with the directions and features you’ve configured using the Agent Playground.
Create test cases and measure how well your agents perform against for things like tone, factual accuracy, and context relevance.
Create effective evaluation datasets in CSV format to test your agent’s performance, improve accuracy, and measure qualities like factual correctness, safety, and instruction following.
View insights and runtime logs for your agents to troubleshoot issues.
Trace how your agent processes prompts to troubleshoot issues, improve performance, and control costs.
Integrate multiple generative AI agents.
Enable the foundation model in your agent to access the external data sources using functions.
Rollback to a previous version of an agent to undo changes made to it.
Create, edit, manage data sources, verify, and permanently destroy knowledge bases.
Attach or detach a knowledge base from your agents.
Create, manage, edit, duplicate, or delete guardrails to control how your agents respond to sensitive or inappropriate content.
Destroy an agent to permanently and irreversibly destroy the agent and removes all endpoints for the agent.
Test how foundation models answer questions using content retrieved from a knowledge base.
Manage Model Access Keys
Create, scope, and manage model access keys for foundation models, inference routers, and batch inference, with VPC restrictions and team-owner visibility.