Identify the right model for your use case by filtering available foundation models by capabilities and price.
DigitalOcean AI Inference
Validated on 27 Apr 2026 • Last edited on 27 Apr 2026
Inference provides a single control plane for managing inference workflows. It includes a Model Catalog, where you can view available foundation models (both DigitalOcean-hosted and third-party commercial models) and compare their capabilities and pricing. You can also use routing to match inference requests to the best-fit model and run inference using serverless or dedicated deployments.
- Test and compare foundation models in the Model Playground.
- Determine which model best fits your specific use case.
- Send API requests directly to foundation models without creating an AI agent or managing infrastructure.
- Deploy open-source and commercial LLMs on dedicated GPUs as an inference endpoint.
- Run large collections of LLM requests as a single asynchronous job with Batch Inference.
- Build fully managed AI agents with knowledge bases for retrieval-augmented generation, multi-agent routing, guardrails, and more.
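As a minimal sketch of sending an API request directly to a foundation model, the example below builds an OpenAI-compatible chat completion request. The base URL and model ID are illustrative assumptions, not confirmed values; check the Available Models page for real model IDs, and substitute your own model access key (or, where supported, a DigitalOcean personal access token).

```python
import json
import os
import urllib.request

# Illustrative base URL for the serverless inference API; verify the
# actual endpoint in the DigitalOcean documentation before use.
BASE_URL = "https://inference.do-ai.run/v1"

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request.

    The bearer token can be a model access key or a personal
    access token, depending on your authentication setup.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# "example-model" is a placeholder, not a real model ID.
req = build_chat_request(
    "example-model",
    "Summarize serverless inference in one sentence.",
    os.environ.get("DO_MODEL_ACCESS_KEY", "example-key"),
)
# Send with urllib.request.urlopen(req) once a valid key and model ID are set.
```

Because the API follows the OpenAI chat completions shape, existing OpenAI-compatible client libraries can typically be pointed at the serverless inference base URL instead of hand-building requests like this.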
Latest Updates
28 April 2026
- DigitalOcean AI Inference now supports scoped model access keys. When you create a key, you can limit it to specific foundation models, enable batch inference, and restrict it to a VPC network so that only requests from that VPC network can authenticate. Team owners can also view and manage keys created by other team members. Previously created keys continue to authenticate without changes. For more information, see Model Access Keys.
- As part of the DigitalOcean AI Native Cloud, DigitalOcean AI Inference Hub is now Inference.
- You can now use DigitalOcean personal access tokens to authenticate serverless inference requests, as an alternative to a model access key. Model access keys remain recommended when you need per-application scoping, VPC restriction, or credentials dedicated to inference workloads. For more information, see Serverless Inference Overview.
- The Model Playground now supports the following features when testing and comparing models:
  - Uploading images from local storage
  - Generating multimodal artifacts, such as images, audio, and text-to-speech, from models that support it

  Read Test and Compare Models for more information.
- You can now evaluate models available for serverless inference and dedicated inference deployments using a judge model. Scoring includes metrics such as correctness, completeness, ground-truth faithfulness, and safety. This feature is in public preview; you can opt in from the Feature Preview page. For more information, see Evaluate Models.
- Model Catalog is now in General Availability.
- Batch inference lets you submit text-only batch jobs for OpenAI and Anthropic models. Using batch inference significantly reduces cost compared to real-time inference. For more information, see Use Batch Inference.
- You can now browse Model Catalog through a DigitalOcean MCP server.
- Dedicated Inference is now in General Availability.
- A remote MCP server is also available, allowing MCP clients to create, update, list, and delete Dedicated Inference endpoints. For more information, see Dedicated Inference MCP Tools.
27 April 2026
- The following OpenAI model is now available on Inference for serverless inference:
For more information, see the Available Models page.
23 April 2026
- The following OpenAI model is now available on DigitalOcean AI Inference Hub for serverless inference:
For more information, see the Available Models page.
For more information, see the full release notes.