Inference Release Notes
Validated on 1 May 2026
May 2026
1 May
- The following DeepSeek model is now available on DigitalOcean Inference for serverless inference:
For more information, see the Available Models page.
April 2026
28 April
- Dedicated Inference is now in General Availability.
- A remote MCP server is also available, allowing MCP clients to create, update, list, and delete Dedicated Inference endpoints. For more information, see Dedicated Inference MCP Tools.
- You can now browse the Model Catalog through a DigitalOcean MCP server.
- Batch inference lets you submit text-only batch jobs for OpenAI and Anthropic models. Using batch inference significantly reduces cost compared to real-time inference. For more information, see Use Batch Inference.
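As an illustration, a text-only batch input file might be assembled like the sketch below. This assumes a JSONL request format along the lines of OpenAI's batch API; the field names and file layout are assumptions, so refer to Use Batch Inference for the actual workflow. The model slug comes from the 3 April migration note later on this page.

```python
# Sketch: preparing a text-only batch job as a JSONL file.
# Assumption: requests follow an OpenAI-style batch format
# (custom_id / method / url / body); verify against Use Batch Inference.
import json

requests = [
    {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "openai-gpt-oss-20b",  # slug from the 3 April note below
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["Summarize document A.", "Summarize document B."])
]

# One JSON object per line, as batch endpoints typically expect.
with open("batch_input.jsonl", "w") as f:
    for request in requests:
        f.write(json.dumps(request) + "\n")
```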
- The following Google model is now available on DigitalOcean Inference for serverless inference:
For more information, see the Available Models page.
- Bring Your Own Models (BYOM) is now available in Model Catalog. You can import models from Hugging Face or from Spaces buckets or folders. For details, see Import a Model.
- Model Catalog is now in General Availability.
- You can now evaluate models available for serverless inference, inference routers, and dedicated inference deployments using a judge model. Scoring includes metrics such as correctness, completeness, ground truth faithfulness, and safety. This feature is in public preview. You can opt in from the Feature Preview page. For more information, see Evaluate Models.
- We now support multimodal models for serverless inference. Multimodal models process and generate content across multiple data types, including images, audio, video, and text, enabling a much broader range of real-world applications such as document intelligence, voice agents, content generation, and accessibility tools. For more information, see Use Multimodal Inference.
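For example, an image-plus-text request could look like the following sketch, assuming the serverless inference API accepts the OpenAI-compatible message format. The base URL and model slug are illustrative assumptions; see Use Multimodal Inference for supported models and exact request details.

```python
# Sketch: sending an image alongside a text prompt.
# Assumptions: OpenAI-compatible chat endpoint; base URL and model slug
# are illustrative, not confirmed by this page.
import base64
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.do-ai.run/v1",  # assumed endpoint
    api_key="YOUR_MODEL_ACCESS_KEY",
)

with open("invoice.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="nemotron-nano-12b-v2-vl",  # illustrative slug for a vision model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize this document."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```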
- The Model Playground now supports the following features when testing and comparing models:
- Uploading images from local storage
- Generating multimodal artifacts, such as images, audio, and text-to-speech, from models that support it
For more information, see Test and Compare Models.
- The following NVIDIA model is now available on Inference for serverless inference:
For more information, see the Available Models page.
- You can now authenticate serverless inference requests with DigitalOcean personal access tokens, as an alternative to model access keys. Model access keys remain recommended when you need per-application scoping, VPC restriction, or credentials dedicated to inference workloads. For more information, see the Serverless Inference Overview.
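As a sketch, authenticating with a personal access token might look like the following, assuming the serverless inference API is OpenAI-compatible. The base URL is an illustrative assumption, and the model slug is taken from the 3 April migration note below; check the Serverless Inference Overview for the actual endpoint.

```python
# Sketch: using a DigitalOcean personal access token in place of a
# model access key. The base URL is assumed, not confirmed by this page.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.do-ai.run/v1",  # assumed endpoint
    api_key="dop_v1_...",  # personal access token, or a model access key
)

response = client.chat.completions.create(
    model="llama3.3-70b-instruct",  # slug from the 3 April note below
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```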
- The following models are now available on DigitalOcean AI Inference for serverless inference:
- Qwen3 Coder Flash (Alibaba)
- DeepSeek V3.2 (DeepSeek)
- Llama 4 Maverick 17B 128E Instruct (Meta)
- Ministral 3 14B Instruct (Mistral AI)
- Nemotron Nano 12B v2 VL (NVIDIA)
- BGE M3 (BAAI)
- E5 Large (multilingual) (Intfloat)
- Qwen 3 TTS (1.7B) (text-to-speech)
- Wan2.2-T2V-A14B (text-to-video)
- Stable Diffusion 3.5 Large (image generation)
For more information, see the Foundation models page.
- As part of the DigitalOcean AI-Native Cloud, DigitalOcean AI Inference Hub is now Inference.
- Inference Router is now available in public preview and enabled for all users. With this feature, you can group multiple models into a model pool and configure routing rules and a selection policy for inference requests. We provide pre-built templates, or you can define custom task-matching logic using natural language, with configurable fallback support for reliability. For more information, see Inference Router.
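As a sketch, a client might address a router the same way it addresses a single model, assuming routers are exposed by an identifier on the same OpenAI-compatible endpoint. The router identifier and base URL below are hypothetical; see Inference Router for the real addressing scheme.

```python
# Sketch: routing a request through a model pool instead of one model.
# Assumption: a router is targeted by identifier like a model; the
# identifier "my-router" and the base URL are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.do-ai.run/v1",  # assumed endpoint
    api_key="YOUR_MODEL_ACCESS_KEY",
)

# The router applies its configured rules and selection policy to pick a
# model from the pool, with fallback behavior as configured.
response = client.chat.completions.create(
    model="my-router",  # hypothetical router identifier
    messages=[{"role": "user", "content": "Classify this support ticket."}],
)
print(response.choices[0].message.content)
```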
- DigitalOcean AI Inference now supports scoped model access keys. When you create a key, you can limit it to specific foundation models and inference routers, enable batch inference, and restrict it to a VPC network so that only requests from that VPC network can authenticate. Team owners can also view and manage keys created by other team members. Previously created keys continue to authenticate without changes. For more information, see Model Access Keys.
27 April
- The following OpenAI model is now available on Inference for serverless inference:
For more information, see the Available Models page.
23 April
- The following OpenAI model is now available on DigitalOcean AI Inference Hub for serverless inference:
For more information, see the Available Models page.
16 April
- The following Anthropic model is now available on DigitalOcean AI Inference Hub for serverless inference:
For more information, see the Available Models page.
3 April
- The following models are deprecated from the Model Catalog:
- Meta Llama 3.1 8B-Instruct
- Mistral NeMo
Migrate to the Llama 3.3 70B-Instruct (llama3.3-70b-instruct) and gpt-oss-20b (openai-gpt-oss-20b) models, respectively, to avoid service disruption.
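Assuming an OpenAI-compatible client setup as in the sketches above, the migration is a one-line change to the model slug:

```python
# Sketch: migrating off a deprecated model by swapping the slug.
# Base URL is assumed; replacement slugs come from this deprecation note.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.do-ai.run/v1",  # assumed endpoint
    api_key="YOUR_MODEL_ACCESS_KEY",
)

response = client.chat.completions.create(
    # model="llama3.1-8b-instruct",  # deprecated (old slug illustrative)
    model="llama3.3-70b-instruct",   # replacement slug from this note
    messages=[{"role": "user", "content": "Hello"}],
)
```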
2 April
- The following client libraries for DigitalOcean AI Inference Hub are now available in the official DigitalOcean SDKs. You can use the SDKs to manage serverless and dedicated inference:
- The Python client library is now available in the official DigitalOcean Python client library, PyDo; a minimal initialization sketch follows this list. For more information, see the following reference documentation:
- The TypeScript client library is now available in the official DigitalOcean TypeScript library, DoTs.
- The official Go client library is available as the Gradient Go library.
The Gradient™ SDK will be deprecated in a future release.
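The sketch below initializes the PyDo client. The client setup matches the published pydo package, but the inference operation shown is hypothetical; consult the PyDo reference documentation for the exact method names for serverless and dedicated inference.

```python
# Sketch: initializing the official PyDo client (pip install pydo).
# The token is a DigitalOcean API token.
from pydo import Client

client = Client(token="YOUR_DO_API_TOKEN")

# Hypothetical operation name, for illustration only; see the PyDo
# reference docs for the actual inference management methods:
# models = client.genai.list_models()
```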
1 April
- The following Arcee model is now available on DigitalOcean AI Inference Hub for serverless inference:
- Trinity Large (Public Preview)
For more information, see the Available Models page.
March 2026
27 March
- The following OpenAI models are now available on Inference for serverless inference:
17 March
- The following NVIDIA model is now available on DigitalOcean AI Inference Hub for serverless inference:
- Nemotron-3-Super-120B (Public Preview)
For more information, see the Available Models page.
16 March
- DigitalOcean AI Inference Hub is now available in private preview and is enabled for all users. Inference Hub provides access to a catalog of foundation models with support for serverless inference and dedicated inference, along with a Model Playground for testing models before deployment.
During the private preview period, features and model availability may change.