Identify the right model for your use case by filtering available foundation models by capabilities and price.
Inference
Validated on 28 Apr 2026 • Last edited on 11 May 2026
Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can browse available foundation models, both DigitalOcean-hosted and third-party commercial models, and compare their capabilities and pricing. From there, you can route inference requests to the best-fit model and run inference using serverless or dedicated deployments.
Test and compare foundation models in the Model Playground.
Send API requests directly to foundation models without creating an AI agent or managing infrastructure.
Deploy open-source and commercial LLMs on dedicated GPUs as an inference endpoint.
Route serverless inference requests to foundation models using rules.
Determine which model best fits your specific use case.
Batch Inference lets you run large collections of LLM requests as a single asynchronous job.
Build fully managed AI agents with knowledge bases for retrieval-augmented generation, multi-agent routing, guardrails, and more.
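The direct-request workflow above can be sketched as an OpenAI-compatible chat completions call. The base URL, model slug, and environment variable below are illustrative placeholders, not values confirmed by this page; check the Serverless Inference documentation for the real ones:

```python
# Sketch of a direct serverless inference request. BASE_URL and MODEL are
# placeholders -- confirm the real endpoint and model slug in the docs.
BASE_URL = "https://inference.example.com/v1"  # assumed OpenAI-compatible base URL
MODEL = "example-model-slug"                   # placeholder model name

def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-style chat completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("Summarize serverless inference in one sentence.")

# Sending it (commented out; requires a model access key or personal
# access token passed as a Bearer token):
#
# import json, os, urllib.request
# req = urllib.request.Request(
#     f"{BASE_URL}/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={
#         "Authorization": f"Bearer {os.environ['DO_MODEL_ACCESS_KEY']}",
#         "Content-Type": "application/json",
#     },
# )
# print(urllib.request.urlopen(req).read().decode())
```

Because no agent or infrastructure is involved, the only state you manage is the credential and the payload itself.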
Latest Updates
5 May 2026
-
The following Moonshot AI model is now available on DigitalOcean Inference for serverless inference, the Agent Development Kit, and agents:
For more information, see the Available Models page.
1 May 2026
-
The following DeepSeek model is now available on DigitalOcean Inference for serverless inference, the Agent Development Kit, and agents:
For more information, see the Available Models page.
28 April 2026
-
DigitalOcean Knowledge Base retrieval is now available through a DigitalOcean MCP server.
-
DigitalOcean Inference now supports scoped model access keys. When you create a key, you can limit it to specific foundation models and inference routers, enable batch inference, and restrict it to a VPC network so that only requests from that VPC network can authenticate. Team owners can also view and manage keys created by other team members. Previously created keys continue to authenticate without changes. For more information, see Model Access Keys.
-
Inference Router is now available in public preview and enabled for all users. Using this feature, you can group multiple models into a model pool and configure routing rules and a selection policy for inference requests. You can use pre-built templates or define custom task-matching logic using natural language, with configurable fallback support for reliability. For more information, see Inference Router.
-
As part of the DigitalOcean AI-Native Cloud, DigitalOcean AI Inference Hub is now DigitalOcean Inference.
-
The following models are now available on DigitalOcean Inference:
- Qwen3 Coder Flash (Alibaba)
- DeepSeek V3.2 (DeepSeek)
- Gemma 4 (Google)
- Llama 4 Maverick 17B 128E Instruct (Meta)
- Ministral 3 14B Instruct (Mistral AI)
- Nemotron Nano 12B v2 VL (NVIDIA)
- Nemotron Nano 3 Omni (NVIDIA)
- BGE M3 (BAAI)
- E5 Large (multilingual) (IntFloat)
- Qwen 3 TTS (1.7B) (text-to-speech)
- Wan2.2-T2V-A14B (text-to-video)
- Stable Diffusion 3.5 Large (image generation)
For more information, see the Models page.
-
You can now use DigitalOcean personal access tokens to authenticate serverless inference requests, as an alternative to a model access key. Model access keys remain recommended when you need per-application scoping, VPC restriction, or credentials dedicated to inference workloads. For more information, see Serverless Inference Overview.
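Assuming the API accepts both credential types as a standard Bearer token (a reasonable reading of the note above, and typical for OpenAI-compatible APIs, but not a documented guarantee), switching between them is just a header change:

```python
def auth_headers(token: str) -> dict:
    # Works identically whether `token` is a model access key or a
    # DigitalOcean personal access token (assumption: both are sent as
    # Bearer tokens, as is typical for OpenAI-compatible APIs).
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

# Example: prefer a scoped model access key in production, and fall back
# to a personal access token for quick local experiments.
headers = auth_headers("placeholder-model-access-key-or-pat")
```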
-
The Model Playground now supports the following features when testing and comparing models:
-
Uploading images from local storage
-
Generating multimodal artifacts, such as images, audio, and text-to-speech, from models that support it
Read Test and Compare Models for more information.
-
We now support multimodal models for serverless inference. Multimodal models process and generate content across multiple data types, including images, audio, video, and text, enabling a much broader range of real-world applications such as document intelligence, voice agents, content generation, and accessibility tools. For more information, see Use Multimodal Inference.
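For image input, OpenAI-compatible multimodal APIs typically accept a list of content parts instead of a plain string. The helper below builds such a message from raw image bytes; the exact content-part schema this API expects is an assumption here, so verify it against Use Multimodal Inference:

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message with text plus an inline base64 image
    (OpenAI-style content parts; assumed, not confirmed, for this API)."""
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }

# Stand-in bytes for illustration; in practice, read a real image file.
msg = image_message("Describe this image.", b"\x89PNG-placeholder")
```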
-
You can now evaluate models available for serverless inference, inference routers, and dedicated inference deployments using a judge model. Scoring includes metrics such as correctness, completeness, faithfulness to ground truth, and safety. This feature is in public preview; you can opt in from the Feature Preview page. For more information, see Evaluate Models.
-
Model Catalog is now in General Availability.
-
Bring Your Own Models (BYOM) is now available in Model Catalog. You can import models from Hugging Face or from Spaces buckets or folders. For details, see Import a Model.
-
Batch inference lets you submit text-only batch jobs for OpenAI and Anthropic models. Using batch inference significantly reduces cost compared to real-time inference. For more information, see Use Batch Inference.
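A batch job is typically submitted as a JSONL file with one request per line. The shape below follows the OpenAI batch format (`custom_id`, `method`, `url`, `body`); whether DigitalOcean uses this exact schema is an assumption, so check Use Batch Inference before relying on it:

```python
import json

def batch_line(custom_id: str, model: str, prompt: str) -> str:
    """One JSONL line in the OpenAI-style batch format (assumed schema)."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    })

prompts = ["Classify: great product!", "Classify: never again."]
batch_jsonl = "\n".join(
    batch_line(f"req-{i}", "placeholder-model", p) for i, p in enumerate(prompts)
)
```

The `custom_id` is what lets you match each asynchronous result back to its original request once the job completes.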
-
You can now browse Model Catalog through a DigitalOcean MCP server.
-
Dedicated Inference is now in General Availability.
- A remote MCP server is also available, allowing MCP clients to create, update, list, and delete Dedicated Inference endpoints. For more information, see Dedicated Inference MCP Tools.
-
DigitalOcean Inference now lets you retrieve data from knowledge bases using the Control Panel with semantic, keyword, or hybrid searches, apply filters, review retrieved chunks, and copy live code examples. For more information, see Create and Manage Knowledge Bases.
-
DigitalOcean Inference now supports reranking for knowledge bases to improve the relevance of retrieved results before they’re returned or used in generated responses. For more information, see Create and Manage Agent Knowledge Bases and Test Reranking.
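Conceptually, reranking re-scores the chunks an initial retrieval pass returned and keeps only the most relevant ones. A minimal stand-in sketch (in production, a reranking model such as a BGE reranker plays the role of `score_fn`; the function below is purely illustrative):

```python
def rerank(chunks: list[str], score_fn, top_k: int = 3) -> list[str]:
    # Re-score retrieved chunks with a stronger relevance function and
    # keep the top_k. Here score_fn is any callable returning a numeric
    # relevance score; a real deployment would call a reranking model.
    return sorted(chunks, key=score_fn, reverse=True)[:top_k]

chunks = ["pricing page", "billing FAQ", "unrelated blog post"]
# Toy score: prefer chunks mentioning "billing" or "pricing".
top = rerank(chunks, lambda c: ("billing" in c) + ("pricing" in c), top_k=2)
```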
-
As part of the DigitalOcean AI-Native Cloud, DigitalOcean Gradient™ AI Platform is now DigitalOcean AI Platform.
-
RAG Playground is now available in DigitalOcean Inference for DigitalOcean Knowledge Bases. It lets you run queries against a knowledge base and test how a serverless inference model generates answers from retrieved content.
For more information, see the DigitalOcean AI Platform Features page.
-
Knowledge base enhancements are now generally available in DigitalOcean Inference, including the updated creation workflow, chunking controls, and data retrieval for testing knowledge bases. For more information, see Create and Manage Agent Knowledge Bases.
-
The following embedding models are now available on DigitalOcean Inference for DigitalOcean Knowledge Bases:
- E5 Large (multilingual) (IntFloat)
- E5 Large (v2) (IntFloat)
- BGE M3 (Beijing Academy of Artificial Intelligence (BAAI))
For more information, see the Available Models page.
-
The following Beijing Academy of Artificial Intelligence (BAAI) reranking model is now available on DigitalOcean Inference for DigitalOcean Knowledge Bases:
For more information, see the Available Models page.
For more information, see the full release notes.