Inference Release Notes
Validated on 28 May 2026 • Last edited on 8 May 2026
May 2026
28 May
-
The following Anthropic model is now available on DigitalOcean Inference for serverless inference, Agent Development Kit, and agents:
For more information, see the Available Models page.
27 May
-
The following DeepSeek model is now available on DigitalOcean Inference for serverless inference, dedicated inference, Agent Development Kit, and agents:
For more information, see the Available Models page.
5 May
-
The following Moonshot AI model is now available on DigitalOcean Inference for serverless inference, Agent Development Kit and agents:
For more information, see the Available Models page.
1 May
-
The following DeepSeek model is now available on DigitalOcean Inference for serverless inference, Agent Development Kit and agents:
For more information, see the Available Models page.
April 2026
28 April
-
The following Beijing Academy of Artificial Intelligence (BAAI) reranking model is now available on DigitalOcean Inference for DigitalOcean Knowledge Bases:
For more information, see the Available Models page.
-
The following embeddings model is now available on DigitalOcean Inference for DigitalOcean Knowledge Bases:
- E5 Large (multilingual) (IntFloat)
- E5 Large (v2) (IntFloat)
- BGE M3 (Beijing Academy of Artificial Intelligence (BAAI))
For more information, see the Available Models page.
-
Knowledge base enhancements are now generally available in DigitalOcean Inference, including the updated creation workflow, chunking controls, and data retrieval for testing knowledge base. For more information, see Create and Manage Agent Knowledge Bases.
-
RAG Playground is now available in DigitalOcean Inference for DigitalOcean Knowledge Bases. It lets you run queries against a knowledge base and test how a serverless inference model generates answers from retrieved content.
For more information, see the DigitalOcean AI Platform Features page.
-
As part of the DigitalOcean AI-Native Cloud, DigitalOcean Gradient™ AI Platform is now DigitalOcean AI Platform.
-
DigitalOcean Inference now supports reranking for knowledge bases to improve the relevance of retrieved results before they’re returned or used in generated responses. For more information, see Create and Manage Agent Knowledge Bases and Test Reranking.
-
DigitalOcean Inference now lets you retrieve data from knowledge bases using the Control Panel with semantic, keyword, or hybrid searches, apply filters, review retrieved chunks, and copy live code examples. For more information, see Create and Manage Knowledge Bases.
-
Dedicated Inference is now in General Availability.
- A remote MCP server is also available, allowing MCP clients to create, update, list, and delete Dedicated Inference endpoints. For more information, see Dedicated Inference MCP Tools.
-
You can now browse Model Catalog through a DigitalOcean MCP server.
-
Batch inference lets you submit text-only batch jobs for OpenAI and Anthropic models. Using batch inference significantly reduces cost compared to real-time inference. For more information, see Use Batch Inference.
-
Bring Your Own Models (BYOM) is now available in Model Catalog. You can import models from Hugging Face or Spaces buckets or folders. For details, see Import a Model.
-
Model Catalog is now in General Availability.
-
You can now evaluate models available for serverless inference, inference routers, and dedicated inference deployments using a judge model. Scoring includes metrics such as correctness, completeness, ground truth faithfulness, and safety metrics. This features is in public preview. You can opt in from the Feature Preview page. For more information, see Evaluate Models.
-
We now support multimodal models for serverless inference. Multimodal models process and generate content across multiple data types, including images, audio, video, and text, thus enabling a much broader range of real-world applications, including document intelligence, voice agents, content generation, and accessibility tools. For more information, see Use Multimodal Inference.
-
The Model Playground now supports the following features when testing and comparing models:
-
Uploading images from local storage
-
Generating multimodal artifacts, such as images, audio, and text-to-speech, from models that support it
Read Test and Compare Models for more information.
-
-
You can now use DigitalOcean personal access tokens for authenticating serverless inference requests. You can use a personal access token as an alternative to a model access key when sending requests to the serverless inference API. Model access keys remain recommended when you need per-application scoping, VPC restriction, or credentials dedicated to inference workloads. For more information, see Serverless Inference Overview.
-
The following models are now available on DigitalOcean Inference:
- Qwen3 Coder Flash (Alibaba)
- DeepSeek V3.2 (DeepSeek)
- Gemma 4 (Google)
- Llama 4 Maverick 17B 128E Instruct (Meta)
- Ministral 3 14B Instruct (Mistral AI)
- Nemotron Nano 12B v2 VL (NVIDIA)
- Nemotron Nano 3 Omni (NVIDIA)
- BGE M3 (BAAI)
- E5 Large (multilingual) (Intfloat)
- Qwen 3 TTS (1.7B) (text-to-speech)
- Wan2.2-T2V-A14B (text-to-video)
- Stable Diffusion 3.5 Large (image generation)
For more information, see the Models page.
-
As part of the DigitalOcean AI-Native Cloud, DigitalOcean AI Inference Hub is now DigitalOcean Inference.
-
Inference Router in now available in public preview and enabled for all users. Using this feature, you can use multiple models in a model pool to configure routing rules and selection policy for inference requests. We provide pre-built templates or you can define custom task-matching logic using natural language, with configurable fallback support for reliability. For more information, see Inference Router.
-
DigitalOcean Inference now supports scoped model access keys. When you create a key, you can limit it to specific foundation models and inference routers, enable batch inference, and restrict it to a VPC network so that only requests from that VPC network can authenticate. Team owners can also view and manage keys created by other team members. Previously created keys continue to authenticate without changes. For more information, see Model Access Keys.
-
DigitalOcean Knowledge Base retrieval is now available through a DigitalOcean MCP server.
27 April
-
The following OpenAI model is now available on DigitalOcean AI Platform for Agent Development Kit and agents:
For more information, see the Available Models page.
-
The following OpenAI model is now available on Inference for serverless inference:
For more information, see the Available Models page.
23 April
-
The following OpenAI model is now available on DigitalOcean Gradient™ AI Platform for serverless inference and Agent Development Kit:
For more information, see the Available Models page.
-
The following OpenAI model is now available on DigitalOcean AI Inference Hub for serverless inference:
For more information, see the Available Models page.
22 April
-
DigitalOcean Gradient™ AI Platform is now DigitalOcean AI Platform.
16 April
-
The following Anthropic model is now available on DigitalOcean Gradient™ AI Platform for serverless inference, Agent Development Kit, and agents:
For more information, see the Available Models page.
-
The following Anthropic model is now available on DigitalOcean AI Inference Hub for serverless inference:
For more information, see the Available Models page.
3 April
-
The following models are deprecated from DigitalOcean Gradient™ AI Platform:
- Meta Llama 3.1 8B-Instruct
- Mistral NeMo
Migrate to a supported active model to avoid service disruption. For information on our model deprecation policy and recommended replacement models, see Model Support Policy.
-
The following models are deprecated from the Model Catalog:
- Meta Llama 3.1 8B-Instruct
- Mistral NeMo
Migrate to Llama 3.3 70B-Instruct (
llama3.3-70b-instruct) and gpt-oss-20b (openai-gpt-oss-20b) models respectively, to avoid service disruption.
2 April
-
The following client libraries for DigitalOcean AI Inference Hub are now available in the official DigitalOcean SDKs. You can use the SDKs to manage serverless and dedicated inference:
-
The Python client library is now available in the official DigitalOcean Python client library PyDo. For more information, see the following reference documentation:
-
The TypeScript client library is now available in the official DigitalOcean TypeScript library DoTs.
-
The official Go client library is available at Gradient Go library.
The Gradient™ SDK will be deprecated in a future release.
-
-
The following client libraries for DigitalOcean Gradient™ AI Platform are now available in the official DigitalOcean SDKs. You can use the SDKs to manage DigitalOcean Gradient™ AI Platform resources, including knowledge bases and generative AI agents, and agent, serverless, and dedicated inference:
-
The Python client library is now available in the official DigitalOcean Python client library PyDo. For more information, see the following reference documentation:
-
The TypeScript client library is now available in the official DigitalOcean TypeScript library DoTs.
-
The official Go client library is available at Gradient Go library.
The Gradient™ SDK will be deprecated in a future release.
-
1 April
-
The following Arcee model is now available on DigitalOcean Gradient™ AI Platform for serverless inference and Agent Development Kit:
- Trinity Large (Public Preview)
For more information, see the Available Models page.
-
The following Arcee model is now available on DigitalOcean AI Inference Hub for serverless inference:
- Trinity Large (Public Preview)
For more information, see the Available Models page.
March 2026
27 March
-
The following OpenAI models are now available on DigitalOcean Gradient™ AI Platform for serverless inference and Agent Development Kit:
For more information, see the Available Models page.
-
The following OpenAI models are now available on Inference for serverless inference:
17 March
-
The following NVIDIA model is now available on DigitalOcean Gradient™ AI Platform for serverless inference, Agent Development Kit, and agents:
- Nemotron-3-Super-120B (Public Preview)
For more information, see the Available Models page.
-
The following NVIDIA model is now available on DigitalOcean AI Inference Hub for serverless inference:
- Nemotron-3-Super-120B (Public Preview)
For more information, see the Available Models page.
16 March
-
Gradient AI Dedicated Inference Service is a managed LLM hosting service for optimized inference on dedicated GPUs, now available in public preview and enabled for all users. For more information, see Use Dedicated Inference.
-
DigitalOcean AI Inference Hub is now available in private preview and is enabled for all users. Inference Hub provides access to a catalog of foundation models with support for serverless inference and dedicated inference, along with a Model Playground for testing models before deployment.
During the private preview period, features and model availability may change.
12 March
-
The following models are now available on DigitalOcean Gradient™ AI Platform for serverless inference, Agent Development Kit, and creating agents:
-
MiniMax M2.5 (Public Preview)
For more information, see the Available Models page.
6 March
-
The following OpenAI model is now available on DigitalOcean Gradient™ AI Platform for serverless inference and Agent Development Kit:
For more information, see the Available Models page.
February 2026
27 February
-
We now support prompt caching for the following OpenAI models in serverless inference chat completions and responses API:
- GPT-5.3-Codex
- GPT-5.2
- GPT-5.2 pro
- GPT-5.1-Codex-Max
- GPT-5
- GPT-5 mini
- GPT-5 nano
- GPT-4.1
- GPT-4o
- GPT-4o mini
- o1
- o3
- o3-mini
- GPT-image-1
25 February
-
The following Anthropic model is deprecated from DigitalOcean Gradient™ AI Platform:
Migrate to a supported active model to avoid service disruption. For information on our model deprecation policy and recommended replacement models, see Model Support Policy.
-
The following OpenAI models are now available on DigitalOcean Gradient™ AI Platform for serverless inference and Agent Development Kit:
For more information, see the Available Models page.
24 February
-
You can now use reasoning with serverless inference chat completion. For more information, see Use Reasoning.
19 February
-
The following Anthropic models are deprecated from DigitalOcean Gradient™ AI Platform:
Migrate to a supported active model to avoid service disruption. For information on our model deprecation policy and recommended replacement models, see Model Support Policy.
-
OpenAI and Anthropic commercial models now default to using DigitalOcean API keys when creating new agents. This allows you to have consolidated billing for all agent usage and no keys to manage on your own. If you want, you can bring your own keys when creating new agents or continue using your own keys for existing agents. For more information on pricing, see the pricing details.
17 February
-
The following Anthropic model is now available on DigitalOcean Gradient™ AI Platform for serverless inference, Agent Development Kit, and creating agents:
For more information, see the Available Models page.
9 February
-
Multimodal models for image and audio generation, provided by fal, are now in general availability. You can use these models for serverless inference. For examples of how to use these models, see Generate Image, Audio, or Text-to-Speech Using fal Models.
5 February
-
End users and agent developers can now provide feedback on the quality and helpfulness of agent responses. The feedback is collected through the chatbot interface, agent playground, and log stream traces and stored in the traces. For more information, see Provide Agent Feedback and View Conversation Logs, Traces, and Insights.
-
The following Anthropic and OpenAI models are now available on DigitalOcean Gradient™ AI Platform:
-
Haiku 4.5 for serverless inference, ADK, and agent creation
-
GPT-5.2 pro for serverless inference, ADK, and agent creation
-
Opus 4.6 for serverless inference and ADK
For more information, see the Available Models page.
-
-
We have enabled trace storage by default for both newly created and existing agents.
January 2026
30 January
-
We now support prompt caching for the following Anthropic models:
- Claude Sonnet 4.5
- Claude Sonnet 4
- Claude Opus 4.5
- Claude Opus 4.1
- Claude Opus 4
Using prompt caching with serverless inference chat completion significantly reduces the cost for inference.
-
The following OpenAI models are now available on DigitalOcean Gradient™ AI Platform for serverless inference, Agent Development Kit, and creating agents:
For more information, see the Available Models page.
December 2025
18 December
-
The following models are now available on DigitalOcean Gradient™ AI Platform for serverless inference and Agent Development Kit:
For more information, see the Available Models page.
17 December
-
Knowledge base chunking and retrieval are now in public preview. Chunking splits your documents into smaller units before indexing, and existing data sources default to section-based chunking.
You can update chunking settings in the control panel or via the API, and retrieve recent chunks using the retrieval endpoint. For more details, see the API reference and chunking best practices.
-
The knowledge base creation experience has been enhanced and is in public preview. You can opt in from knowledge base enhancements feature preview.
11 December
-
DigitalOcean Gradient™ AI Platform also now supports the Qwen3 Embedding 0.6B (Multilingual) model from Alibaba Qwen in public preview. Select this model for multilingual indexing workflows. Learn more on the embeddings models page.
9 December
-
The Agent Development Kit (ADK) is now available in public preview. For more information, see Use ADK to Build, Test, and Deploy Agents.
October 2025
29 October
-
We support specifying a sitemap URL as a data source for your knowledge base. For more information, see Select Your Data Sources.
20 October
-
New multimodal models for image and audio generation, provided by fal, are now available for serverless inference only. These models are in public preview.
1 October
-
You can now use auto-indexing for knowledge bases. Auto-indexing keeps your knowledge base up-to-date by automatically re-indexing new and updated files from connected sources.
-
The OpenAI GPT-image-1 model is now available on DigitalOcean Gradient™ AI Platform. See all available models.
September 2025
25 September
-
You can now view activity logs for your knowledge bases in the DigitalOcean Control Panel. The Activity tab shows the 15 most recent indexing jobs, and includes details such as status, number of files or URLs processed, skipped, or failed, token usage, and charges. You can also download CSV files for more detailed debugging.
9 September
-
GPT-5 mini model is now available on DigitalOcean Gradient™ AI Platform. For more information, see the Available Models page.
-
GPT-5 nano model is now available on DigitalOcean Gradient™ AI Platform. For more information, see the Available Models page.
August 2025
18 August
-
The OpenAI gpt-oss-120b and gpt-oss-20b models are available on DigitalOcean Gradient™ AI Platform. See all available models.
12 August
-
The OpenAI o1 model is now available on DigitalOcean Gradient™ AI Platform. See all available models.
11 August
-
The OpenAI GPT-5 model is now available on DigitalOcean Gradient™ AI Platform. See all available models.
5 August
-
GPT-4.1 model is now available on DigitalOcean Gradient™ AI Platform. For more information, see the Available Models page.
July 2025
27 July
-
Claude Opus 4 model is now available on DigitalOcean Gradient™ AI Platform. For more information, see the Available Models page.
-
Claude Sonnet 4 model is now available on DigitalOcean Gradient™ AI Platform. For more information, see the Available Models page.
25 July
-
You can now add a Dropbox folder as a data source to your knowledge bases. This allows you to index and use files stored in your Dropbox account within your knowledge base.
22 July
-
DigitalOcean Gradient™ AI Platform now offers log stream insights, which provide data-driven recommendations to help improve agent efficiency and accuracy by analyzing your agent’s historical trace data. For details, see View Traces, Conversation Logs, and Insights.
-
The official DigitalOcean Gradient™ AI Platform SDK is now in public preview. You can use the SDK to manage DigitalOcean Gradient™ AI Platform resources, including knowledge bases and generative AI agents, from Python applications.
11 July
-
The Alibaba Qwen3-32B model is now available for serverless inference on DigitalOcean Gradient™ AI Platform. See all available models.
9 July
-
As part of the DigitalOcean AI Agentic Cloud, GenAI Platform is now DigitalOcean Gradient™ AI Platform.
-
Support for Amazon S3 buckets as data sources for DigitalOcean Gradient™ AI Platform knowledge bases is now in public preview.
-
DigitalOcean Gradient™ AI Platform is now in general availability.
3 July
-
o3 model is now available on DigitalOcean Gradient™ AI Platform. For more information, see the Available Models page.
2 July
-
Agent tracing and conversation logs are now in public preview for DigitalOcean Gradient™ AI Platform. This allows you to review how your agents process prompts, including input and output content, tool calls, knowledge base retrievals, and processing times.
June 2025
5 June
-
Serverless inference is now available on DigitalOcean Gradient™ AI Platform. Serverless inference lets you to get direct responses from foundation models using a single API endpoint without creating an agent.
April 2025
29 April
-
You can now view token usage and performance metrics for DigitalOcean Gradient™ AI Platform agents.
28 April
-
You can now rollback to a previous version of DigitalOcean Gradient™ AI Platform agents.
23 April
-
You can now create DigitalOcean Gradient™ AI Platform agents from templates which have predefined agent instructions and foundation models.
22 April
-
You can now view the knowledge bases, functions, and guardrails that DigitalOcean Gradient™ AI Platform agents use to generate a response in the Agent Playground and the agent API.
-
You can now view the runtime logs for agents on DigitalOcean Gradient™ AI Platform. The logs display the events that occur during an agent’s execution, such as the knowledge bases and functions accessed to generate a response.
16 April
-
Claude Sonnet 3.7 model is now available on DigitalOcean Gradient™ AI Platform. For more information, see the Available Models page.
9 April
-
You can now use OpenAI models and test them in the Model Playground on DigitalOcean Gradient™ AI Platform.
March 2025
25 March
-
We have raised the number of URLs that DigitalOcean Gradient™ AI Platform’s web crawler data source can crawl from 1,000 to 5,500.
February 2025
21 February
-
You can now add a website as a data source for DigitalOcean Gradient™ AI Platform knowledge bases.
-
We have restored access to the Sensitive Data Detection guardrail for DigitalOcean Gradient™ AI Platform.
7 February
-
You can now use the DeepSeek-R1 model with agents on DigitalOcean Gradient™ AI Platform.
January 2025
31 January
-
You can now use Anthropic models with agents on DigitalOcean Gradient™ AI Platform.
28 January
-
You can add files from your local storage as data source for your knowledge base.
24 January
-
The Sensitive Data Detection guardrail and its custom versions have been temporarily removed. To identify and anonymize sensitive data, add the following to your agent instructions:
You must avoid providing responses containing sensitive or private information. Sensitive information includes but is not limited to: * Personal data (e.g., names, addresses, emails, phone numbers) * Financial details (e.g., credit card numbers, bank accounts) * Medical information * Private communications * Confidential business information If the user's query involves sensitive information, respond with: "I'm sorry, I can't answer that."
22 January
-
DigitalOcean Gradient™ AI Platform is in public preview.