Available Models for Inference

Validated on 12 Jun 2026 • Last edited on 15 Jun 2026

Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare model capabilities and pricing, use routing to match inference requests to the best-fit model, and run inference using serverless or dedicated deployments.

The following foundation, embeddings, and reranking models are available.

We regularly update our model offerings to provide the most performant and efficient models, and deprecate older models. For information on our model deprecation policy and recommended model replacements, see Model Support Policy.

Foundation Models

Inference supports both open source and commercial foundation models. Open source models are generally published by research labs, available under open licenses. Commercial models are proprietary such as OpenAI and Anthropic models. All models are offered using DigitalOcean API access keys, but you can also bring your own provider’s API keys to access the commercial models.

We offer the following foundation models, subject to the AI Model Terms, our Service Terms, and the Terms of Service Agreement.

You can use these models in serverless inference, dedicated inference, inference routers, batch inference, agents, or Agent Development Kit (ADK). See the model-specific usage information below.

Anthropic Models

Anthropic models available on DigitalOcean Inference support tool (function) calling, prompt caching, adaptive thinking, fast mode, dynamic workflows, mid-conversation system messages, and other features. See the usage notes in the following table for details. Refer to the provider documentation for other supported features.

Model Model ID Context Window Max Output Tokens Serverless Inference ADK Agents Usage Notes Tentative End-of-Support
Claude Haiku 4.5 anthropic-claude-haiku-4.5 200,000 64,000
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
No sooner than October 2026
Claude Opus 4.8 anthropic-claude-opus-4.8 1,000,000 128,000
✔️
✔️
✔️
✔️ Input context window of up to 1M tokens
✔️ Prompt caching
✔️ Tool calling
✔️ Fast mode
✔️ Adaptive thinking
✔️ Dynamic workflows
✔️ Mid-conversation system messages
No sooner than May 2027
Claude Opus 4.7 anthropic-claude-opus-4.7 200,000 128,000
✔️
✔️
✔️
✔️ Input context window of up to 1M tokens (beta)
✔️ Prompt caching
✔️ Tool calling
No sooner than April 2027
Claude Opus 4.6 anthropic-claude-opus-4.6 200,000 128,000
✔️
✔️
✔️
✔️ Input context window of up to 1M tokens (beta)
✔️ Prompt caching
✔️ Tool calling
No sooner than February 2027
Claude Opus 4.5 anthropic-claude-opus-4.5 200,000 64,000
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
No sooner than November 2026
Claude Opus 4.1 anthropic-claude-4.1-opus 200,000 32,000
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
No sooner than August 2026
Claude Opus 4 anthropic-claude-opus-4 200,000 32,000
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
No sooner than May 2026
Claude Sonnet 4.6 anthropic-claude-4.6-sonnet 200,000 64,000
✔️
✔️
✔️
✔️ Input context window of up to 1M tokens (beta)
✔️ Prompt caching
✔️ Tool (function) calling
No sooner than February 2027
Claude Sonnet 4.5 anthropic-claude-4.5-sonnet 200,000 64,000
✔️
✔️
✔️
✔️ Input context window of up to 1M tokens (beta)
✔️ Prompt caching
✔️ Tool calling
No sooner than September 2026
Claude Sonnet 4 anthropic-claude-sonnet-4 200,000 64,000
✔️
✔️
✔️
✔️ Input context window of up to 1M tokens (beta)
✔️ Prompt caching
✔️ Tool calling
No sooner than May 2026
Arcee Models
Model Model ID Context Window Max Output Tokens Serverless Inference ADK Usage Notes
Trinity Large (Public Preview) arcee-trinity-large-thinking 128,000 128,000
✔️
✔️
✔️ Chat Completions API for sending prompts.
✔️ Prompt caching.
ℹ️ Use is subject to Public Preview Terms including Arcee Terms & Conditions.
fal Models
Model Model ID Type Use for Usage Notes
Fast SDXL fal-ai/fast-sdxl Image generation ✔️ Serverless inference
✔️ ADK
ℹ️ Multimodal and generative model
Flux Schnell fal-ai/flux/schnell Image generation ✔️ Serverless inference
✔️ ADK
ℹ️ Multimodal and generative model
Stable Audio 2.5 fal-ai/stable-audio-25/text-to-audio Text-to-audio ✔️ Serverless inference
✔️ ADK
ℹ️ Multimodal and generative model
Multilingual TTS v2 fal-ai/elevenlabs/tts/multilingual-v2 Text-to-speech ✔️ Serverless inference
✔️ ADK
ℹ️ Multimodal and generative model
OpenAI Models

OpenAI models available on DigitalOcean Inference support tool (function) calling, prompt caching, and other features. See the usage notes in the following table for details. Refer to the provider documentation for other supported features.

Model Model ID Context Window Max Output Tokens Serverless Inference ADK Agents Usage Notes
GPT-5.5 openai-gpt-5.5 1,000,000 128,000
✔️
✔️
✔️
✔️ Input context window of up to 1M tokens
✔️ Only the Responses API for sending prompts for serverless inference
✔️ Prompt caching
✔️ Tool calling
GPT-5.4 openai-gpt-5.4 400,000 128,000
✔️
✔️
✔️ Input context window of up to 1M tokens (beta)
✔️ Only the Responses API for sending prompts for serverless inference
✔️ Prompt caching
✔️ Tool calling
GPT-5.4 mini openai-gpt-5.4-mini 400,000 128,000
✔️
✔️
✔️ Only the Responses API for sending prompts for serverless inference
✔️ Prompt caching
✔️ Tool calling
GPT-5.4 nano openai-gpt-5.4-nano 400,000 128,000
✔️
✔️
✔️ Only the Responses API for sending prompts for serverless inference
✔️ Prompt caching
✔️ Tool calling
GPT-5.4 pro openai-gpt-5.4-pro 1,050,000 128,000
✔️
✔️
✔️ Only the Responses API for sending prompts for serverless inference
✔️ Tool calling
GPT-5.3-Codex openai-gpt-5.3-codex 400,000 128,000
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-5.2 openai-gpt-5.2 400,000 128,000
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-5.2 pro openai-gpt-5.2-pro 400,000 128,000
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-5.1-Codex-Max openai-gpt-5.1-codex-max 400,000 128,000
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-5 openai-gpt-5 400,000 128,000
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-5 mini openai-gpt-5-mini 400,000 128,000
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-5 nano openai-gpt-5-nano 400,000 128,000
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-4.1 openai-gpt-4.1 1,047,576 32,768
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-4o openai-gpt-4o 128,000 16,384
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT-4o mini openai-gpt-4o-mini 128,000 16,384
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
o1 openai-o1 200,000 Not published
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
o3 openai-o3 200,000 Not published
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
o3-mini openai-o3-mini 200,000 Not published
✔️
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT Image 1 openai-gpt-image-1 Not published Not published
✔️
✔️
✔️ Prompt caching
✔️ Tool calling
GPT Image 1.5 openai-gpt-image-1.5 Not published Not published
✔️
✔️
GPT Image 2 openai-gpt-image-2 Not published Not published
✔️
✔️
DigitalOcean-Hosted Models
Provider Model Model ID Parameters Context Window Max Output Tokens Serverless Inference Dedicated Inference ADK Agents Usage Notes
Alibaba Qwen3-32B alibaba-qwen3-32b 32.8 billion 32,768 40,960
✔️
✔️
✔️
✔️
Alibaba Qwen3 Coder Flash qwen3-coder-flash 30 billion 262,144 65,536
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
Alibaba Qwen 3.5 397B A17B qwen3.5-397b-a17b 397 billion 131,072 81,920
✔️
✔️
✔️
✔️
Alibaba Qwen 3 TTS (1.7B) qwen3-tts-voicedesign 1.7 billion 32,768 Not published
✔️
✔️
ℹ️ Text-to-speech. Multimodal and generative model.
Alibaba Wan2.2-T2V-A14B wan2-2-t2v-a14b 14 billion 100 Not published
✔️
✔️
ℹ️ Text-to-video. Multimodal and generative model.
DeepSeek DeepSeek R1 Distill Llama 70B deepseek-r1-distill-llama-70b 70 billion 32,678 32,768
✔️
✔️
✔️
✔️
ℹ️ When using in a user-facing agent, we strongly recommend adding all available guardrails for a safer conversational experience.
DeepSeek DeepSeek V4 Pro deepseek-v4-pro 1.6 trillion 1,048,576 1,048,576
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
DeepSeek DeepSeek V4 Flash deepseek-4-flash 284 billion 262,144 262,144
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
DeepSeek DeepSeek V3.2 deepseek-3.2 680 billion 128,000 64,000
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
Google Gemma 4 gemma-4-31B-it 31 billion 256,000 256,000
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
MiniMax M2.5 (Public Preview) minimax-m2.5 230 billion 200,000 128,000
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
ℹ️ Use is subject to Public Preview Terms including MiniMax Model License.
Moonshot AI Kimi K2.5 kimi-k2.5 1 trillion 256,000 32,768
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
ℹ️ Use is subject to a Modified MIT license.
Moonshot AI Kimi K2.6 kimi-k2.6 1 trillion 262,144 262,144
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
ℹ️ Use is subject to a Modified MIT license.
Meta Llama 3.3 Instruct-70B llama3.3-70b-instruct 70 billion 128,000 128,000
✔️
✔️
✔️
✔️
Meta Llama 4 Maverick 17B 128E Instruct llama-4-maverick 400 billion 128,000 16,384
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
Mistral AI Ministral 3 14B Instruct mistral-3-14B 14 billion 262,144 128,000
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
NVIDIA Nemotron 3 Ultra nemotron-3-ultra-550b 550 billion 131,072 131,072
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
NVIDIA Nemotron-3-Super-120B (Public Preview) nvidia-nemotron-3-super-120b 120 billion 1,000,000 Not published
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
ℹ️ Use is subject to Public Preview Terms including NVIDIA Model License.
NVIDIA Nemotron 3 Nano Omni nemotron-3-nano-omni 30 billion 65,536 65,536
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
ℹ️ Context window 65,536 tokens.
NVIDIA Nemotron Nano 12B v2 VL nemotron-nano-12b-v2-vl 12 billion 128,000 16,384
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
OpenAI gpt-oss-120b openai-gpt-oss-120b Not published 128,000 131,072
✔️
✔️
✔️
✔️
OpenAI gpt-oss-20b openai-gpt-oss-20b Not published 128,000 131,072
✔️
✔️
✔️
✔️
Stability AI Stable Diffusion 3.5 Large stable-diffusion-3.5-large 8 billion 256 Not published
✔️
✔️
ℹ️ Image generation. Multimodal and generative model.
Xiaomi MiMo V2.5 mimo-v2.5 Not published 262,144 131,072
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
✔️ Tool calling
✔️ Structured outputs
✔️ Reasoning
✔️ Multilingual
ℹ️ Use is subject to the MIT License.
Z.ai GLM 5 glm-5 744 billion 128,000 128,000
✔️
✔️
✔️
✔️
✔️ Chat Completions and Responses APIs for sending prompts for serverless inference.
ℹ️ Use is subject to the MIT License.

Embeddings Models

An embedding model converts data into vector embeddings. DigitalOcean stores vector embeddings in an OpenSearch database cluster for use with agent knowledge bases. The following embeddings models are available on the platform, along with their token windows and recommended chunking ranges.

Alibaba Models
Model Parameters Token Window Chunk Size Range Parent Chunk Range Child Chunk Range
GTE Large (v1.5) Not available 8192 tokens 0-750 500-1000 300-500
Qwen3 Embedding 0.6B (Multilingual)
(in Public Preview)
600 million 8000 tokens 0-750 500-1000 300-500
BAAI Models
Model Parameters Token Window Chunk Size Range Parent Chunk Range Child Chunk Range
BGE M3 568M 8192 tokens 0-8192 Not Specified Not Specified
Intfloat Models
Model Parameters Token Window Chunk Size Range Parent Chunk Range Child Chunk Range
E5 Large (multilingual) 560 million 514 tokens 0-512 100-512 100-500
E5 Large (v2) Not available 512 tokens 0-512 Not Specified Not Specified
UKP Lab (Technical University of Darmstadt) Models
Model Parameters Token Window Chunk Size Range Parent Chunk Range Child Chunk Range
All-MiniLM-L6-v2 22 million 256 tokens 0-256 100-256 100-200
Multi-QA-mpnet-base-dot-v1 109 million 512 tokens 0-512 100-512 100-500

Reranking Models

Reranking models reorder retrieved results to improve relevance after the initial retrieval step, and can also be used with vector databases. DigitalOcean supports the following reranking model for knowledge base retrieval:

BAAI Models
Model Parameters Usage Notes
BGE Reranker (v2) M3 Not available Can be enabled at knowledge base creation, updated after creation.

We can't find any results for your search.

Try using different keywords or simplifying your search terms.