Inference Release Notes

Validated on 28 May 2026 • Last edited on 8 May 2026

May 2026

28 May

27 May

5 May

1 May

April 2026

28 April

  • The following Beijing Academy of Artificial Intelligence (BAAI) reranking model is now available on DigitalOcean Inference for DigitalOcean Knowledge Bases:

    For more information, see the Available Models page.

  • The following embeddings model is now available on DigitalOcean Inference for DigitalOcean Knowledge Bases:

    For more information, see the Available Models page.

  • Knowledge base enhancements are now generally available in DigitalOcean Inference, including the updated creation workflow, chunking controls, and data retrieval for testing knowledge base. For more information, see Create and Manage Agent Knowledge Bases.

  • RAG Playground is now available in DigitalOcean Inference for DigitalOcean Knowledge Bases. It lets you run queries against a knowledge base and test how a serverless inference model generates answers from retrieved content.

    For more information, see the DigitalOcean AI Platform Features page.

  • As part of the DigitalOcean AI-Native Cloud, DigitalOcean Gradient™ AI Platform is now DigitalOcean AI Platform.

  • DigitalOcean Inference now supports reranking for knowledge bases to improve the relevance of retrieved results before they’re returned or used in generated responses. For more information, see Create and Manage Agent Knowledge Bases and Test Reranking.

  • DigitalOcean Inference now lets you retrieve data from knowledge bases using the Control Panel with semantic, keyword, or hybrid searches, apply filters, review retrieved chunks, and copy live code examples. For more information, see Create and Manage Knowledge Bases.

  • Dedicated Inference is now in General Availability.

  • You can now browse Model Catalog through a DigitalOcean MCP server.

  • Batch inference lets you submit text-only batch jobs for OpenAI and Anthropic models. Using batch inference significantly reduces cost compared to real-time inference. For more information, see Use Batch Inference.

  • Bring Your Own Models (BYOM) is now available in Model Catalog. You can import models from Hugging Face or Spaces buckets or folders. For details, see Import a Model.

  • Model Catalog is now in General Availability.

  • You can now evaluate models available for serverless inference, inference routers, and dedicated inference deployments using a judge model. Scoring includes metrics such as correctness, completeness, ground truth faithfulness, and safety metrics. This features is in public preview. You can opt in from the Feature Preview page. For more information, see Evaluate Models.

  • We now support multimodal models for serverless inference. Multimodal models process and generate content across multiple data types, including images, audio, video, and text, thus enabling a much broader range of real-world applications, including document intelligence, voice agents, content generation, and accessibility tools. For more information, see Use Multimodal Inference.

  • The Model Playground now supports the following features when testing and comparing models:

    • Uploading images from local storage

    • Generating multimodal artifacts, such as images, audio, and text-to-speech, from models that support it

    Read Test and Compare Models for more information.

  • You can now use DigitalOcean personal access tokens for authenticating serverless inference requests. You can use a personal access token as an alternative to a model access key when sending requests to the serverless inference API. Model access keys remain recommended when you need per-application scoping, VPC restriction, or credentials dedicated to inference workloads. For more information, see Serverless Inference Overview.

  • The following models are now available on DigitalOcean Inference:

    For more information, see the Models page.

  • As part of the DigitalOcean AI-Native Cloud, DigitalOcean AI Inference Hub is now DigitalOcean Inference.

  • Inference Router in now available in public preview and enabled for all users. Using this feature, you can use multiple models in a model pool to configure routing rules and selection policy for inference requests. We provide pre-built templates or you can define custom task-matching logic using natural language, with configurable fallback support for reliability. For more information, see Inference Router.

  • DigitalOcean Inference now supports scoped model access keys. When you create a key, you can limit it to specific foundation models and inference routers, enable batch inference, and restrict it to a VPC network so that only requests from that VPC network can authenticate. Team owners can also view and manage keys created by other team members. Previously created keys continue to authenticate without changes. For more information, see Model Access Keys.

  • DigitalOcean Knowledge Base retrieval is now available through a DigitalOcean MCP server.

27 April

23 April

22 April

  • DigitalOcean Gradient™ AI Platform is now DigitalOcean AI Platform.

16 April

3 April

  • The following models are deprecated from DigitalOcean Gradient™ AI Platform:

    • Meta Llama 3.1 8B-Instruct
    • Mistral NeMo

    Migrate to a supported active model to avoid service disruption. For information on our model deprecation policy and recommended replacement models, see Model Support Policy.

  • The following models are deprecated from the Model Catalog:

    • Meta Llama 3.1 8B-Instruct
    • Mistral NeMo

    Migrate to Llama 3.3 70B-Instruct (llama3.3-70b-instruct) and gpt-oss-20b (openai-gpt-oss-20b) models respectively, to avoid service disruption.

2 April

1 April

March 2026

27 March

17 March

16 March

12 March

6 March

February 2026

27 February

  • We now support prompt caching for the following OpenAI models in serverless inference chat completions and responses API:

    • GPT-5.3-Codex
    • GPT-5.2
    • GPT-5.2 pro
    • GPT-5.1-Codex-Max
    • GPT-5
    • GPT-5 mini
    • GPT-5 nano
    • GPT-4.1
    • GPT-4o
    • GPT-4o mini
    • o1
    • o3
    • o3-mini
    • GPT-image-1

25 February

24 February

  • You can now use reasoning with serverless inference chat completion. For more information, see Use Reasoning.

19 February

17 February

9 February

5 February

January 2026

30 January

December 2025

18 December

17 December

11 December

  • DigitalOcean Gradient™ AI Platform also now supports the Qwen3 Embedding 0.6B (Multilingual) model from Alibaba Qwen in public preview. Select this model for multilingual indexing workflows. Learn more on the embeddings models page.

9 December

October 2025

29 October

  • We support specifying a sitemap URL as a data source for your knowledge base. For more information, see Select Your Data Sources.

20 October

1 October

September 2025

25 September

  • You can now view activity logs for your knowledge bases in the DigitalOcean Control Panel. The Activity tab shows the 15 most recent indexing jobs, and includes details such as status, number of files or URLs processed, skipped, or failed, token usage, and charges. You can also download CSV files for more detailed debugging.

9 September

August 2025

18 August

12 August

11 August

5 August

July 2025

27 July

25 July

22 July

11 July

9 July

3 July

2 July

  • Agent tracing and conversation logs are now in public preview for DigitalOcean Gradient™ AI Platform. This allows you to review how your agents process prompts, including input and output content, tool calls, knowledge base retrievals, and processing times.

June 2025

5 June

  • Serverless inference is now available on DigitalOcean Gradient™ AI Platform. Serverless inference lets you to get direct responses from foundation models using a single API endpoint without creating an agent.

April 2025

29 April

28 April

23 April

22 April

  • You can now view the knowledge bases, functions, and guardrails that DigitalOcean Gradient™ AI Platform agents use to generate a response in the Agent Playground and the agent API.

  • You can now view the runtime logs for agents on DigitalOcean Gradient™ AI Platform. The logs display the events that occur during an agent’s execution, such as the knowledge bases and functions accessed to generate a response.

16 April

9 April

March 2025

25 March

February 2025

21 February

7 February

  • You can now use the DeepSeek-R1 model with agents on DigitalOcean Gradient™ AI Platform.

January 2025

31 January

  • You can now use Anthropic models with agents on DigitalOcean Gradient™ AI Platform.

28 January

24 January

  • The Sensitive Data Detection guardrail and its custom versions have been temporarily removed. To identify and anonymize sensitive data, add the following to your agent instructions:

    You must avoid providing responses containing sensitive or private information.
    
    Sensitive information includes but is not limited to:
    
    * Personal data (e.g., names, addresses, emails, phone numbers)
    * Financial details (e.g., credit card numbers, bank accounts)
    * Medical information
    * Private communications
    * Confidential business information
    
    If the user's query involves sensitive information, respond with: "I'm sorry, I can't answer that."

22 January

We can't find any results for your search.

Try using different keywords or simplifying your search terms.