# Available Foundation and Embedding Models for DigitalOcean Gradient™ AI Platform

DigitalOcean Gradient™ AI Platform lets you build fully managed AI agents with knowledge bases for retrieval-augmented generation, multi-agent routing, guardrails, and more, or use serverless inference to make direct requests to popular foundation models. The following foundation and embedding models are available for Gradient AI Platform. For pricing, see [Gradient AI Platform's pricing page](https://docs.digitalocean.com/products/gradient-ai-platform/details/pricing/index.html.md).

## Foundation Models

Gradient AI Platform supports both open source and commercial foundation models. You can use these models for:

- [Serverless inference](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/index.html.md)
- [Building agents using the control panel, CLI, or API](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/create-agents/index.html.md)
- [Building and deploying agents using the Agent Development Kit (ADK)](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/build-agents-using-adk/index.html.md)
- [Testing configurations in the Agent Playground](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/test-agents/index.html.md)

*Open source models* are generally published by research labs under open licenses. *Commercial models* are proprietary, such as those from OpenAI and Anthropic. All models are available using DigitalOcean API access keys, but you can also bring your own provider's API keys to access the commercial models.

We regularly update our model offerings to provide the most performant and efficient models, and we deprecate older models. For information on our model deprecation policy and recommended model replacements, see [Model Support Policy](https://docs.digitalocean.com/products/gradient-ai-platform/details/model-support-policy/index.html.md).
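For example, serverless inference accepts the model IDs listed in the tables below through an OpenAI-compatible API. The following is a minimal sketch in Python; the endpoint URL and the `GRADIENT_MODEL_ACCESS_KEY` environment variable name are assumptions for illustration, so check the serverless inference guide for the current endpoint and key setup:

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible serverless inference endpoint; verify against
# the serverless inference documentation before use.
API_URL = "https://inference.do-ai.run/v1/chat/completions"

def build_request(model_id: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completions request."""
    payload = {
        "model": model_id,  # a model ID from the tables below
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "Content-Type": "application/json",
        # Model access key created in the Gradient AI Platform control panel.
        "Authorization": f"Bearer {os.environ.get('GRADIENT_MODEL_ACCESS_KEY', '')}",
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(payload).encode(), headers=headers, method="POST"
    )

req = build_request("llama3.3-70b-instruct", "Summarize RAG in one sentence.")
# To send the request:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

Note that some models listed below accept prompts only through the Responses API rather than Chat Completions; see each model's usage notes.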
We offer the following foundation models, subject to the [AI Model Terms](https://www.digitalocean.com/legal/tos-service-specific-terms#4-ai-model-terms), our [Service Terms](https://www.digitalocean.com/legal/tos-service-specific-terms), and the [Terms of Service Agreement](https://www.digitalocean.com/legal/terms-of-service-agreement):

## Anthropic Models

Anthropic models available on the Gradient AI Platform support [tool (function) calling](https://docs.digitalocean.com/products/gradient-ai-platform/details/features/index.html.md#tool-function-calling), [prompt caching](https://docs.digitalocean.com/products/gradient-ai-platform/details/features/index.html.md#prompt-caching), and other features. See the usage notes in the following table for details. Refer to the provider documentation for other supported features.

| Model | Model ID | Max Output Tokens | Use for | Usage Notes | Tentative End-of-Support |
|---|---|---|---|---|---|
| [Claude Sonnet 4.6](https://www.anthropic.com/claude/sonnet) | `anthropic-claude-4.6-sonnet` | 64,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Input context window of up to 1M tokens, ✔️ Prompt caching, ✔️ Tool calling | No sooner than February 2027 |
| [Claude Sonnet 4.5](https://www.anthropic.com/claude/sonnet) | `anthropic-claude-4.5-sonnet` | 64,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Input context window of up to 1M tokens, ✔️ Prompt caching, ✔️ Tool calling | No sooner than September 2026 |
| [Claude Sonnet 4](https://www.anthropic.com/claude/sonnet) | `anthropic-claude-sonnet-4` | 64,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Input context window of up to 1M tokens, ✔️ Prompt caching, ✔️ Tool calling | No sooner than May 2026 |
| [Claude Haiku 4.5](https://www.anthropic.com/claude/haiku) | `anthropic-claude-4.5-haiku` | 64,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Prompt caching, ✔️ Tool calling | No sooner than October 2026 |
| [Claude Opus 4.6](https://www.anthropic.com/claude/opus) | `anthropic-claude-opus-4.6` | 128,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Input context window of up to 1M tokens, ✔️ Prompt caching, ✔️ Tool calling | No sooner than February 2027 |
| [Claude Opus 4.5](https://www.anthropic.com/claude/opus) | `anthropic-claude-opus-4.5` | 64,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Prompt caching, ✔️ Tool calling | No sooner than November 2026 |
| [Claude Opus 4.1](https://www.anthropic.com/claude/opus) | `anthropic-claude-4.1-opus` | 32,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Prompt caching, ✔️ Tool calling | No sooner than August 2026 |
| [Claude Opus 4](https://www.anthropic.com/claude/opus) | `anthropic-claude-opus-4` | 32,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Prompt caching, ✔️ Tool calling | No sooner than May 2026 |

## fal Models

| Model | Model ID | Type | Use for | Usage Notes |
|---|---|---|---|---|
| Fast SDXL | `fal-ai/fast-sdxl` | Image generation | ✔️ Serverless inference, ✔️ ADK | ℹ️ Multimodal and generative model |
| Flux Schnell | `fal-ai/flux/schnell` | Image generation | ✔️ Serverless inference, ✔️ ADK | ℹ️ Multimodal and generative model |
| Stable Audio 2.5 (Text-to-Audio) | `fal-ai/stable-audio-25/text-to-audio` | Text-to-audio | ✔️ Serverless inference, ✔️ ADK | ℹ️ Multimodal and generative model |
| Multilingual TTS v2 | `fal-ai/elevenlabs/tts/multilingual-v2` | Text-to-speech | ✔️ Serverless inference, ✔️ ADK | ℹ️ Multimodal and generative model |

## OpenAI Models

OpenAI models available on the Gradient AI Platform support [tool (function) calling](https://docs.digitalocean.com/products/gradient-ai-platform/details/features/index.html.md#tool-function-calling), [prompt caching](https://docs.digitalocean.com/products/gradient-ai-platform/details/features/index.html.md#prompt-caching), and other features. See the usage notes in the following table for details.
Refer to the provider documentation for other supported features.

| Model | Model ID | Max Output Tokens | Use for | Usage Notes |
|---|---|---|---|---|
| [GPT-5.4](https://developers.openai.com/api/docs/models/gpt-5.4) | `openai-gpt-5.4` | 128,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Input context window of up to 1M tokens, ✔️ Supports only the Responses API for serverless inference, ✔️ Prompt caching, ✔️ Tool calling |
| [GPT-5.4 mini](https://developers.openai.com/api/docs/models/gpt-5.4-mini) | `openai-gpt-5.4-mini` | 128,000 | ✔️ Serverless inference, ✔️ ADK | ✔️ Supports only the Responses API for serverless inference, ✔️ Prompt caching, ✔️ Tool calling |
| [GPT-5.4 nano](https://developers.openai.com/api/docs/models/gpt-5.4-nano) | `openai-gpt-5.4-nano` | 128,000 | ✔️ Serverless inference, ✔️ ADK | ✔️ Supports only the Responses API for serverless inference, ✔️ Prompt caching, ✔️ Tool calling |
| [GPT-5.4 pro](https://developers.openai.com/api/docs/models/gpt-5.4-pro) | `openai-gpt-5.4-pro` | 128,000 | ✔️ Serverless inference, ✔️ ADK | ✔️ Supports only the Responses API for serverless inference, ✔️ Tool calling |
| [GPT-5.3-Codex](https://developers.openai.com/api/docs/models/gpt-5.3-codex) | `openai-gpt-5.3-codex` | 128,000 | ✔️ Serverless inference, ✔️ ADK | ✔️ Input context window of up to 400,000 tokens, ✔️ Prompt caching, ✔️ Tool calling |
| [GPT-5.2](https://platform.openai.com/docs/models/gpt-5.2) | `openai-gpt-5.2` | 128,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Prompt caching, ✔️ Tool calling |
| [GPT-5.2 pro](https://platform.openai.com/docs/models/gpt-5.2-pro) | `openai-gpt-5-2-pro` | 128,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Prompt caching, ✔️ Tool calling |
| [GPT-5.1-Codex-Max](https://platform.openai.com/docs/models/gpt-5.1-codex-max) | `openai-gpt-5.1-codex-max` | 128,000 | ✔️ Serverless inference, ✔️ ADK | ✔️ Prompt caching, ✔️ Tool calling |
| [GPT-5](https://platform.openai.com/docs/models/gpt-5) | `openai-gpt-5` | 128,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Prompt caching, ✔️ Tool calling |
| [GPT-5 mini](https://platform.openai.com/docs/models/gpt-5-mini) | `openai-gpt-5-mini` | 128,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Prompt caching, ✔️ Tool calling |
| [GPT-5 nano](https://platform.openai.com/docs/models/gpt-5-nano) | `openai-gpt-5-nano` | 128,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Prompt caching, ✔️ Tool calling |
| [GPT-4.1](https://platform.openai.com/docs/models/gpt-4.1) | `openai-gpt-4.1` | 32,768 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Prompt caching, ✔️ Tool calling |
| [GPT-4o](https://platform.openai.com/docs/models/gpt-4o) | `openai-gpt-4o` | 16,384 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Prompt caching, ✔️ Tool calling |
| [GPT-4o mini](https://platform.openai.com/docs/models/gpt-4o-mini) | `openai-gpt-4o-mini` | 16,384 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Prompt caching, ✔️ Tool calling |
| [o1](https://platform.openai.com/docs/models/o1) | `openai-o1` | Not published | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Prompt caching, ✔️ Tool calling |
| [o3](https://platform.openai.com/docs/models/o3) | `openai-o3` | Not published | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Prompt caching, ✔️ Tool calling |
| [o3-mini](https://platform.openai.com/docs/models/o3-mini) | `openai-o3-mini` | Not published | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Prompt caching, ✔️ Tool calling |
| [GPT Image 1](https://platform.openai.com/docs/models/gpt-image-1) | `openai-gpt-image-1` | Not published | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Prompt caching, ✔️ Tool calling |
| [GPT Image 1.5](https://developers.openai.com/api/docs/models/gpt-image-1.5) | `openai-gpt-image-1.5` | Not published | ✔️ Serverless inference, ✔️ ADK | |

## DigitalOcean-Hosted Models

| Provider | Model | Model ID | Parameters | Max Output Tokens | Use for | Usage Notes |
|---|---|---|---|---|---|---|
| Alibaba | [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) | `alibaba-qwen3-32b` | 32 billion | 40,960 | ✔️ Serverless inference, ✔️ ADK | |
| DeepSeek | [DeepSeek R1 Distill Llama 70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | `deepseek-r1-distill-llama-70b` | 70 billion | 32,768 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ℹ️ When using in a user-facing agent, we strongly recommend adding all available [guardrails](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/manage-agent-guardrails/index.html.md#attach) for a safer conversational experience. |
| MiniMax | [M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) (Public Preview) | `minimax-m2.5` | 230 billion | 128,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Chat Completions and Responses APIs for serverless inference. ℹ️ Use is subject to [Public Preview Terms](https://www.digitalocean.com/legal/minimax-inference-offering) including the [MiniMax Model License](https://github.com/MiniMax-AI/MiniMax-M2.5/blob/main/LICENSE-MODEL). |
| Moonshot AI | [Kimi K2.5](https://www.kimi.com/ai-models/kimi-k2-5) | `kimi-k2.5` | 1 trillion | 32,768 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Chat Completions and Responses APIs for serverless inference. ℹ️ Use is subject to a [Modified MIT License](https://huggingface.co/moonshotai/Kimi-K2.5). |
| Meta | [Llama 3.3 Instruct-70B](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | `llama3.3-70b-instruct` | 70 billion | 128,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | |
| Meta | [Llama 3.1 Instruct-8B](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | `llama3-8b-instruct` | 8 billion | 128,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | |
| Mistral | [NeMo](https://mistral.ai/news/mistral-nemo/) | `mistral-nemo-instruct-2407` | 12 billion | 128,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | |
| NVIDIA | [Nemotron-3-Super-120B](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4) (Public Preview) | `nvidia-nemotron-3-super-120b` | 120 billion | Not published | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Chat Completions and Responses APIs for serverless inference. ℹ️ Use is subject to [Public Preview Terms](https://www.digitalocean.com/legal/nvidia-nemotron-super-120b-inference-offering-public-preview) including the [NVIDIA Model License](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4). |
| OpenAI | [gpt-oss-120b](https://platform.openai.com/docs/models/gpt-oss-120b) | `openai-gpt-oss-120b` | 117 billion | 131,072 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | |
| OpenAI | [gpt-oss-20b](https://platform.openai.com/docs/models/gpt-oss-20b) | `openai-gpt-oss-20b` | 21 billion | 131,072 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | |
| Z.ai | [GLM 5](https://z.ai/blog/glm-5) | `glm-5` | 744 billion | 128,000 | ✔️ Serverless inference, ✔️ ADK, ✔️ Agents | ✔️ Chat Completions and Responses APIs for serverless inference. ℹ️ Use is subject to the [MIT License](https://huggingface.co/zai-org/GLM-5). |

## Embedding Models

An embedding model converts data into vector embeddings.
Gradient AI Platform stores vector embeddings in an OpenSearch database cluster for use with [agent knowledge bases](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/create-manage-agent-knowledge-bases/index.html.md). The following embedding models are available on the platform, along with their token windows and recommended chunking ranges.

## Alibaba Models

| Model | Parameters | Token Window | Chunk Size Range | Parent Chunk Range | Child Chunk Range |
|---|---|---|---|---|---|
| [GTE Large (v1.5)](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) | Not available | 8192 tokens | 0-750 | 500-1000 | 300-500 |
| [Qwen3 Embedding 0.6B (Multilingual)](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) (Public Preview) | 600 million | 8000 tokens | 0-750 | 500-1000 | 300-500 |

## UKP Lab (Technical University of Darmstadt) Models

| Model | Parameters | Token Window | Chunk Size Range | Parent Chunk Range | Child Chunk Range |
|---|---|---|---|---|---|
| [All-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | 22 million | 256 tokens | 0-256 | 100-256 | 100-200 |
| [Multi-QA-mpnet-base-dot-v1](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1) | 109 million | 512 tokens | 0-512 | 100-512 | 100-500 |
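The parent and child chunk ranges above describe a two-level split: each document is divided into larger parent chunks for context, and each parent is subdivided into smaller child chunks for embedding. The following is an illustrative sketch only; the function names are hypothetical, and word counts stand in for tokens (a real pipeline counts tokens with the embedding model's tokenizer):

```python
def chunk(words: list[str], size: int) -> list[list[str]]:
    """Split a word list into consecutive chunks of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]

def parent_child_chunks(text: str, parent_size: int = 256, child_size: int = 128):
    """Return (parent_text, [child_texts]) pairs for a document.

    Pick sizes inside the model's recommended ranges, e.g. 100-256 for
    parent chunks and 100-200 for child chunks with All-MiniLM-L6-v2.
    """
    pairs = []
    for parent in chunk(text.split(), parent_size):
        children = [" ".join(c) for c in chunk(parent, child_size)]
        pairs.append((" ".join(parent), children))
    return pairs

pairs = parent_child_chunks("token " * 600)  # 600-word toy document
# Parents of 256, 256, and 88 words; the first two parents each yield
# two 128-word children.
```

Chunks that exceed a model's token window are truncated at embedding time, so keeping the chunk size comfortably inside the windows listed above avoids silently losing content.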