Give Feedback

Supported Models on DigitalOcean Inference

Last verified 27 Jul 2026

Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare model capabilities and pricing, use routing to match inference requests to the best-fit model, and run inference using serverless or dedicated deployments.

Copy page as Markdown View page as Markdown

DigitalOcean Inference supports more than 70 foundation, embeddings, and reranking models, including OpenAI (GPT-5.x and open-weight gpt-oss), Anthropic Claude, Meta Llama, DeepSeek, Alibaba Qwen, Moonshot AI Kimi, NVIDIA Nemotron, and Z.ai GLM. All text models are served through OpenAI-compatible endpoints, so you can use an existing OpenAI SDK or client by changing the base URL to https://inference.do-ai.run and authenticating with a DigitalOcean API key. For endpoint details, see Serverless Inference Endpoints.

Note

For pricing information, see the pricing page.

We regularly update our model offerings to provide the most performant and efficient models, and deprecate older models. For information on our model deprecation policy and recommended model replacements, see Model Support Policy.

Models by Provider

The following table summarizes the model providers and model family. For a model’s ID, context window, max output tokens, and supported features, see Foundation Models.

Provider	Models
Alibaba	Includes Qwen 3.5 397B A17B, Qwen3-32B, Qwen3 Coder Flash, and Qwen 2.5 14B Instruct, plus the Qwen 3 text-to-speech and Wan text-to-video models.
Anthropic	Includes Claude Fable 5, Claude Sonnet 5.x and 4.x, Claude Opus 4.x, and Claude Haiku 4.x with prompt caching, tool calling, and input context windows of up to 1M tokens on supported models.
DeepSeek	Includes DeepSeek V4 Pro and V4 Flash (with input context windows of up to 1M tokens), DeepSeek V3.2, DeepSeek V3, and DeepSeek R1 Distill Llama 70B.
Meta	Includes Llama 4 Maverick and Llama 3.3 Instruct 70B for serverless or dedicated inference, and Llama 3.1 Instruct 8B for dedicated inference only.
Moonshot AI	Includes Kimi K3, K2.6, and K2.5 for serverless and dedicated inference, with prompt caching. All are multimodal with vision support.
NVIDIA	Includes Nemotron 3 Ultra, Nemotron 3 Super 120B (Public Preview), and the Nemotron Nano models, including the Nano Omni and Nano 12B v2 VL multimodal variants.
OpenAI	Includes GPT-5.6 family (Sol, Terra, and Luna), GPT-5.5, the GPT-5.4 family, GPT-5.3-Codex, GPT-5.2, GPT-5, GPT-4.1, GPT-4o, the o-series reasoning models, and the GPT Image models. The open-weight gpt-oss-120b and gpt-oss-20b models run on DigitalOcean-hosted infrastructure for both serverless and dedicated inference.
Additional providers (open and multimodal models)	Includes Z.ai GLM 5.x, Xiaomi MiMo V2.5, MiniMax M2.5, Google Gemma 4, Mistral and Ministral models, Arcee Trinity Large, and media generation models from Stability AI, fal, and ElevenLabs for image, audio, and video workloads.

Foundation Models

Inference supports both open source and commercial foundation models. Open source models are generally published by research labs, available under open licenses. Commercial models are proprietary such as OpenAI and Anthropic models. All models are offered using DigitalOcean API access keys, but you can also bring your own provider’s API keys to access the commercial models.

We offer the following foundation models, subject to the AI Model Terms, our Service Terms, and the Terms of Service Agreement.

You can use these models in serverless inference, dedicated inference, inference routers, batch inference, agents, or Agent Development Kit (ADK). See the model-specific usage information below.

Anthropic Models

Anthropic models available on DigitalOcean Inference support tool (function) calling, prompt caching, adaptive thinking, fast mode, dynamic workflows, mid-conversation system messages, and other features. See the usage notes in the following table for details. Refer to the provider documentation for other supported features.

Model	Model ID	Context Window	Max Output Tokens	Serverless Inference	ADK	Agents	Usage Notes	Tentative End-of-Support
Claude Fable 5	`anthropic-claude-fable-5`	1,000,000	128,000	✔️	✔️	✔️	✔️ Input context window of up to 1M tokens ✔️ Prompt caching ✔️ Tool calling ✔️ Adaptive thinking ℹ️ Requires a mandatory 30-day data retention of prompts and completions for trust and safety reviews.
Claude Haiku 4.5	`anthropic-claude-haiku-4.5`	200,000	8,192	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling	No sooner than October 2026
Claude Opus 5	`anthropic-claude-opus-5`	1,000,000	128,000	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling ✔️ Fast mode ✔️ Adaptive thinking ✔️ Cyber safeguards enabled by default with safety classifiers ℹ️ Thinking is on by default in the API, a breaking change from previous Opus models. To keep it off, set `thinking: off`.
Claude Opus 4.8	`anthropic-claude-opus-4.8`	1,000,000	128,000	✔️	✔️	✔️	✔️ Evaluations judge model ✔️ Input context window of up to 1M tokens ✔️ Prompt caching ✔️ Tool calling ✔️ Fast mode ✔️ Adaptive thinking ✔️ Dynamic workflows ✔️ Mid-conversation system messages	No sooner than May 2027
Claude Opus 4.7	`anthropic-claude-opus-4.7`	200,000	8,192	✔️	✔️	✔️	✔️ Input context window of up to 1M tokens (beta) ✔️ Prompt caching ✔️ Tool calling	No sooner than April 2027
Claude Opus 4.6	`anthropic-claude-opus-4.6`	200,000	8,192	✔️	✔️	✔️	✔️ Input context window of up to 1M tokens (beta) ✔️ Prompt caching ✔️ Tool calling	No sooner than February 2027
Claude Opus 4.5	`anthropic-claude-opus-4.5`	200,000	8,192	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling	No sooner than November 2026
Claude Opus 4.1	`anthropic-claude-4.1-opus`	200,000	8,192	✔️	✔️		✔️ Prompt caching ✔️ Tool calling	No sooner than August 2026
Claude Sonnet 5	`anthropic-claude-5-sonnet`	1,000,000	128,000	✔️	✔️	✔️	✔️ Input context window of up to 1M tokens ✔️ Prompt caching ✔️ Tool (function) calling ✔️ Adaptive thinking (API default: on, effort high) ✔️ Effort levels: low, medium, high, max, x-high
Claude Sonnet 4.6	`anthropic-claude-4.6-sonnet`	200,000	8,192	✔️	✔️	✔️	✔️ Evaluations judge model ✔️ Input context window of up to 1M tokens (beta) ✔️ Prompt caching ✔️ Tool (function) calling	No sooner than February 2027
Claude Sonnet 4.5	`anthropic-claude-4.5-sonnet`	200,000	Not published	✔️	✔️	✔️	✔️ Input context window of up to 1M tokens (beta) ✔️ Prompt caching ✔️ Tool calling	No sooner than September 2026

Arcee Models

Model	Model ID	Context Window	Max Output Tokens	Serverless Inference	ADK	Usage Notes
Trinity Large (Public Preview)	`arcee-trinity-large-thinking`	128,000	32,000	✔️	✔️	✔️ Chat Completions API for sending prompts. ✔️ Prompt caching. ℹ️ Use is subject to Public Preview Terms including Arcee Terms & Conditions.

fal Models

Model	Model ID	Type	Use for	Usage Notes
Fast SDXL	`fal-ai/fast-sdxl`	Image generation	✔️ Serverless inference ✔️ ADK	ℹ️ Multimodal and generative model
Flux Schnell	`fal-ai/flux/schnell`	Image generation	✔️ Serverless inference ✔️ ADK	ℹ️ Multimodal and generative model
Stable Audio 2.5	`fal-ai/stable-audio-25/text-to-audio`	Text-to-audio	✔️ Serverless inference ✔️ ADK	ℹ️ Multimodal and generative model
Multilingual TTS v2	`fal-ai/elevenlabs/tts/multilingual-v2`	Text-to-speech	✔️ Serverless inference ✔️ ADK	ℹ️ Multimodal and generative model

OpenAI Models

OpenAI models available on DigitalOcean Inference support tool (function) calling, prompt caching, and other features. See the usage notes in the following table for details. Refer to the provider documentation for other supported features.

Model	Model ID	Context Window	Max Output Tokens	Serverless Inference	ADK	Agents	Usage Notes
GPT-5.6 Sol	`openai-gpt-5.6-sol`	1,050,000	128,000	✔️	✔️	✔️	✔️ Input context window of up to 1.05M tokens ✔️ Only the Chat Completions API for sending prompts for serverless inference ✔️ Prompt caching with explicit breakpoints and 30-minute minimum cache life; cache writes billed at 1.25× uncached input rate; cache reads at 90% discount ✔️ Tool calling ✔️ `max` reasoning effort ✔️ Cyber safeguards enabled by default with real-time classifiers
GPT-5.6 Terra	`openai-gpt-5.6-terra`	1,050,000	128,000	✔️	✔️	✔️	✔️ Input context window of up to 1.05M tokens ✔️ Only the Chat Completions API for sending prompts for serverless inference ✔️ Prompt caching with explicit breakpoints and 30-minute minimum cache life; cache writes billed at 1.25× uncached input rate; cache reads at 90% discount ✔️ Tool calling
GPT-5.6 Luna	`openai-gpt-5.6-luna`	1,050,000	128,000	✔️	✔️	✔️	✔️ Input context window of up to 1.05M tokens ✔️ Only the Chat Completions API for sending prompts for serverless inference ✔️ Prompt caching with explicit breakpoints and 30-minute minimum cache life; cache writes billed at 1.25× uncached input rate; cache reads at 90% discount ✔️ Tool calling ✔️ `ultra` mode for subagents
GPT-5.5	`openai-gpt-5.5`	1,000,000	128,000	✔️	✔️	✔️	✔️ Evaluations judge model ✔️ Input context window of up to 1M tokens ✔️ Only the Responses API for sending prompts for serverless inference ✔️ Prompt caching ✔️ Tool calling
GPT-5.4	`openai-gpt-5.4`	400,000	128,000	✔️	✔️		✔️ Evaluations judge model ✔️ Input context window of up to 1M tokens (beta) ✔️ Only the Responses API for sending prompts for serverless inference ✔️ Prompt caching ✔️ Tool calling
GPT-5.4 mini	`openai-gpt-5.4-mini`	400,000	128,000	✔️	✔️		✔️ Only the Responses API for sending prompts for serverless inference ✔️ Prompt caching ✔️ Tool calling
GPT-5.4 nano	`openai-gpt-5.4-nano`	400,000	128,000	✔️	✔️		✔️ Only the Responses API for sending prompts for serverless inference ✔️ Prompt caching ✔️ Tool calling
GPT-5.4 pro	`openai-gpt-5.4-pro`	1,050,000	128,000	✔️	✔️		✔️ Evaluations judge model ✔️ Only the Responses API for sending prompts for serverless inference ✔️ Tool calling
GPT-5.3-Codex	`openai-gpt-5.3-codex`	400,000	128,000	✔️	✔️		✔️ Prompt caching ✔️ Tool calling
GPT-5.2	`openai-gpt-5.2`	400,000	128,000	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling
GPT-5.2 pro	`openai-gpt-5.2-pro`	400,000	128,000	✔️	✔️		✔️ Prompt caching ✔️ Tool calling
GPT-5	`openai-gpt-5`	400,000	128,000	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling
GPT-5 mini	`openai-gpt-5-mini`	400,000	128,000	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling
GPT-5 nano	`openai-gpt-5-nano`	400,000	128,000	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling
GPT-4.1	`openai-gpt-4.1`	1,047,576	32,768	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling
GPT-4o	`openai-gpt-4o`	128,000	16,384	✔️	✔️	✔️	✔️ Evaluations judge model ✔️ Prompt caching ✔️ Tool calling
GPT-4o mini	`openai-gpt-4o-mini`	128,000	16,384	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling
o1	`openai-o1`	200,000	100,000	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling
o3	`openai-o3`	200,000	100,000	✔️	✔️	✔️	✔️ Evaluations judge model ✔️ Prompt caching ✔️ Tool calling
o3-mini	`openai-o3-mini`	200,000	100,000	✔️	✔️	✔️	✔️ Prompt caching ✔️ Tool calling
GPT Image 1	`openai-gpt-image-1`	Not published	16,384	✔️	✔️		✔️ Prompt caching ✔️ Tool calling
GPT Image 1.5	`openai-gpt-image-1.5`	Not published	16,384	✔️	✔️
GPT Image 2	`openai-gpt-image-2`	Not published	16,384	✔️	✔️

DigitalOcean-Hosted Models

Provider	Model	Model ID	Parameters	Context Window	Max Output Tokens	Serverless Inference	Dedicated Inference	ADK	Agents	Usage Notes
Alibaba	Qwen 2.5 14B Instruct	`qwen-2.5-14b-instruct`	14 billion	32,768	8,192		✔️
Alibaba	Qwen3-32B	`alibaba-qwen3-32b`	32.8 billion	32,768	6,554	✔️	✔️	✔️	✔️	✔️ Evaluations judge model
Alibaba	Qwen3 Coder Flash	`qwen3-coder-flash`	30 billion	262,144	52,429	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching
Alibaba	Qwen 3.5 397B A17B	`qwen3.5-397b-a17b`	397 billion	131,072	26,214	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching. ✔️ Evaluations judge model
Alibaba	Qwen 3 TTS (1.7B)	`qwen3-tts-voicedesign`	1.7 billion	32,768	Not Applicable	✔️		✔️		ℹ️ Text-to-speech. Multimodal and generative model.
Alibaba	Wan2.2-T2V-A14B	`wan2-2-t2v-a14b`	14 billion	100	Not Applicable	✔️		✔️		ℹ️ Text-to-video. Multimodal and generative model.
DeepSeek	DeepSeek R1 Distill Llama 70B	`deepseek-r1-distill-llama-70b`	70 billion	32,678	8,192	✔️	✔️	✔️	✔️	ℹ️ When using in a user-facing agent, we strongly recommend adding all available guardrails for a safer conversational experience.
DeepSeek	DeepSeek V4 Pro	`deepseek-v4-pro`	1.6 trillion	262,144	52,429	✔️		✔️	✔️	✔️ Input context window of up to 1M tokens ✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching
DeepSeek	DeepSeek V4 Flash	`deepseek-4-flash`	284 billion	262,144	52,429	✔️		✔️	✔️	✔️ Input context window of up to 1M tokens ✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching
DeepSeek	DeepSeek V3.2	`deepseek-3.2`	680 billion	163,840	32,768	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching
DeepSeek	DeepSeek V3	`deepseek-v3`	671 billion	163,840	Not published		✔️
Google	Gemma 4	`gemma-4-31B-it`	31 billion	256,000	8,192	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching
MiniMax	M2.5 (Public Preview)	`minimax-m2.5`	230 billion	65,536	13,107	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching ℹ️ Use is subject to Public Preview Terms including MiniMax Model License.
Moonshot AI	Kimi K3	`kimi-k3`	Not published	Not published	Not published	✔️		✔️	✔️	✔️ Chat Completions API for sending prompts for serverless inference. ✔️ Prompt caching ✔️ Native vision (text, images) ℹ️ Use is subject to the model license.
Moonshot AI	Kimi K2.6	`kimi-k2.6`	1 trillion	262,144	52,429	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching ℹ️ Use is subject to a Modified MIT license.
Moonshot AI	Kimi K2.5	`kimi-k2.5`	1 trillion	262,144	52,429	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching ℹ️ Use is subject to a Modified MIT license.
Meta	Llama 3.1 Instruct (8B)	`llama3-8b-instruct`	80 billion	131,072	Not published		✔️
Meta	Llama 3.3 Instruct-70B	`llama3.3-70b-instruct`	70 billion	128,000	4,096	✔️	✔️	✔️	✔️	✔️ Evaluations judge model
Meta	Llama 4 Maverick 17B 128E Instruct	`llama-4-maverick`	400 billion	128,000	16,384	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
Mistral AI	Ministral 3 8B Instruct	`ministral-3-8b-instruct-2512`	8.92 billion	262,144	Not published		✔️
Mistral AI	Ministral 3 14B Instruct	`mistral-3-14B`	14 billion	262,144	Not published	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
Mistral AI	Mistral 7B Instruct v0.3	`mistral-7b-instruct-v0.3`	7 billion	32,768	Not published		✔️
NVIDIA	Nemotron 3 Ultra	`nemotron-3-ultra-550b`	550 billion	131,072	26,214	✔️	✔️	✔️	✔️	✔️ Evaluations judge model ✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
NVIDIA	Nemotron-3-Super-120B (Public Preview)	`nvidia-nemotron-3-super-120b`	120 billion	1,000,000	32,768	✔️		✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ℹ️ Use is subject to Public Preview Terms including NVIDIA Model License.
NVIDIA	Nemotron 3 Nano 30B A3B	`nemotron-3-nano-30b`	30 billion	262,144	Not published		✔️
NVIDIA	Nemotron 3 Nano Omni	`nemotron-3-nano-omni`	30 billion	65,536	13,107	✔️		✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ℹ️ Context window 65,536 tokens.
NVIDIA	Nemotron Nano 12B v2 VL	`nemotron-nano-12b-v2-vl`	12 billion	128,000	16,384	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
OpenAI	gpt-oss-120b	`openai-gpt-oss-120b`	Not published	128,000	4,096	✔️	✔️	✔️	✔️	✔️ Prompt caching
OpenAI	gpt-oss-20b	`openai-gpt-oss-20b`	Not published	128,000	4,096	✔️	✔️	✔️	✔️
Stability AI	Stable Diffusion 3.5 Large	`stable-diffusion-3.5-large`	8 billion	256	Not Applicable	✔️		✔️		ℹ️ Image generation. Multimodal and generative model.
Xiaomi	MiMo V2.5	`mimo-v2.5`	Not published	262,144	52,429	✔️		✔️	✔️	✔️ Input context window of up to 1M tokens ✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching ✔️ Tool calling ✔️ Structured outputs ✔️ Reasoning ✔️ Multilingual ℹ️ Use is subject to the MIT License.
Xiaomi	MiMo V2.5 Pro	`mimo-v2.5-pro`	1 trillion	262,144	52,429	✔️	✔️			✔️ Input context window of up to 1M tokens ✔️ Text only ✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching ✔️ Tool calling ✔️ Structured outputs ✔️ Reasoning ✔️ Multilingual ℹ️ Use is subject to the MIT License.
Z.ai	GLM-5.2	`glm-5.2`	Not published	262,144	52,429	✔️	✔️	✔️	✔️	✔️ Input context window of up to 1M tokens ✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching ✔️ Text only ✔️ Tool calling ✔️ Structured outputs ✔️ Reasoning ✔️ Multilingual ℹ️ Use is subject to the MIT License.
Z.ai	GLM-5.1	`glm-5.1`	754 billion	163,840	32,768	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching ✔️ Text only ✔️ Tool calling ✔️ Structured outputs ✔️ Reasoning ✔️ Multilingual ℹ️ Use is subject to the MIT License.
Z.ai	GLM 5	`glm-5`	744 billion	64,000	12,800	✔️	✔️	✔️	✔️	✔️ Chat Completions and Responses APIs for sending prompts for serverless inference. ✔️ Prompt caching ℹ️ Use is subject to the MIT License.

Embeddings Models

An embedding model converts data into vector embeddings. DigitalOcean stores vector embeddings in an OpenSearch database cluster for use with agent knowledge bases. The following embeddings models are available on the platform, along with their token windows and recommended chunking ranges.

Alibaba Models

Model	Parameters	Token Window	Chunk Size Range	Parent Chunk Range	Child Chunk Range
GTE Large (v1.5)	Not available	8192 tokens	0-750	500-1000	300-500
Qwen3 Embedding 0.6B (Multilingual) (in Public Preview)	600 million	8000 tokens	0-750	500-1000	300-500

BAAI Models

Model	Parameters	Token Window	Chunk Size Range	Parent Chunk Range	Child Chunk Range
BGE M3	568M	8192 tokens	0-8192	Not Specified	Not Specified

Intfloat Models

Model	Parameters	Token Window	Chunk Size Range	Parent Chunk Range	Child Chunk Range
E5 Large (multilingual)	560 million	514 tokens	0-512	100-512	100-500
E5 Large (v2)	Not available	512 tokens	0-512	Not Specified	Not Specified

UKP Lab (Technical University of Darmstadt) Models

Model	Parameters	Token Window	Chunk Size Range	Parent Chunk Range	Child Chunk Range
All-MiniLM-L6-v2	22 million	256 tokens	0-256	100-256	100-200
Multi-QA-mpnet-base-dot-v1	109 million	512 tokens	0-512	100-512	100-500

Reranking Models

Reranking models reorder retrieved results to improve relevance after the initial retrieval step, and can also be used with vector databases. DigitalOcean supports the following reranking model for knowledge base retrieval:

BAAI Models

Model	Parameters	Usage Notes
BGE Reranker (v2) M3	Not available	Can be enabled at knowledge base creation, updated after creation.

Supported Models on DigitalOcean Inference

Models by Provider

Foundation Models

Embeddings Models

Reranking Models

We can't find any results for your search.