For AI agents: The documentation index is at https://docs.digitalocean.com/llms.txt. To get the Markdown version of any page, append index.html.md to the page's directory path instead of requesting the HTML document.
Inference Hub itself has no cost. The Model Catalog API is free to use for browsing supported models and reviewing pricing and capabilities. Costs are incurred only when you run inference or deploy models.
Serverless Inference
Serverless inference is billed by DigitalOcean for both open-source and commercial models. Prices align with each provider’s published rates for transparency.
The following tables show pricing for the foundation models available through serverless inference in Inference Hub.
Anthropic Models
When using Anthropic commercial models with your own model API keys, billing is handled directly by Anthropic at the provider’s rates.
Claude Sonnet 4.6, Sonnet 4.5, and Sonnet 4 support an input context window of up to 1M tokens.
| Model | Serverless Inference |
|-------|----------------------|
| Claude Sonnet 4.6 | Prompts ≤200K tokens: $3.00 per 1M input tokens, $15.00 per 1M output tokens<br>Prompts >200K tokens: $6.00 per 1M input tokens, $22.50 per 1M output tokens<br>Prompt caching: $3.75 per 1M cache creation (5m) input tokens, $6.00 per 1M cache creation (1h) input tokens, $0.30 per 1M cache read input tokens |
| Claude Sonnet 4.5 | Prompts ≤200K tokens: $3.00 per 1M input tokens, $15.00 per 1M output tokens<br>Prompts >200K tokens: $6.00 per 1M input tokens, $22.50 per 1M output tokens<br>Prompt caching: $3.75 per 1M cache creation (5m) input tokens, $6.00 per 1M cache creation (1h) input tokens, $0.30 per 1M cache read input tokens |
| Claude Sonnet 4 | Prompts ≤200K tokens: $3.00 per 1M input tokens, $15.00 per 1M output tokens<br>Prompts >200K tokens: $6.00 per 1M input tokens, $22.50 per 1M output tokens<br>Prompt caching: $3.75 per 1M cache creation (5m) input tokens, $6.00 per 1M cache creation (1h) input tokens, $0.30 per 1M cache read input tokens |
| Claude Haiku 4.5 | Input/output tokens: $1.00 per 1M input tokens, $5.00 per 1M output tokens<br>Prompt caching: $1.25 per 1M cache creation (5m) input tokens, $2.00 per 1M cache creation (1h) input tokens, $1.00 per 1M cache read input tokens |
| Claude Opus 4.7 | Input/output tokens: $5.00 per 1M input tokens, $25.00 per 1M output tokens<br>Prompt caching: $6.25 per 1M cache creation (5m) input tokens, $10.00 per 1M cache creation (1h) input tokens, $0.50 per 1M cache read input tokens |
| Claude Opus 4.6 | Prompts ≤200K tokens: $5.00 per 1M input tokens, $25.00 per 1M output tokens<br>Prompts >200K tokens: $10.00 per 1M input tokens, $37.50 per 1M output tokens<br>Prompt caching: $6.25 per 1M cache creation (5m) input tokens, $10.00 per 1M cache creation (1h) input tokens, $0.50 per 1M cache read input tokens |
| Claude Opus 4.5 | Input/output tokens: $5.00 per 1M input tokens, $25.00 per 1M output tokens<br>Prompt caching: $6.25 per 1M cache creation (5m) input tokens, $10.00 per 1M cache creation (1h) input tokens, $0.50 per 1M cache read input tokens |
| Claude Opus 4.1 | Input/output tokens: $15.00 per 1M input tokens, $75.00 per 1M output tokens<br>Prompt caching: $18.75 per 1M cache creation (5m) input tokens, $30.00 per 1M cache creation (1h) input tokens, $1.50 per 1M cache read input tokens |
| Claude Opus 4 | Input/output tokens: $15.00 per 1M input tokens, $75.00 per 1M output tokens<br>Prompt caching: $18.75 per 1M cache creation (5m) input tokens, $30.00 per 1M cache creation (1h) input tokens, $1.50 per 1M cache read input tokens |
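The tiered Sonnet rates and prompt-caching rates above combine per request. As a minimal sketch (this is not an official SDK helper, and the assumption that only uncached input tokens decide the 200K pricing tier is ours), a per-request cost estimate in Python:

```python
def estimate_sonnet_cost(input_tokens: int, output_tokens: int,
                         cache_read_tokens: int = 0) -> float:
    """Estimate USD cost of one Claude Sonnet 4.x serverless request.

    Rates come from the pricing table above (USD per 1M tokens).
    Assumption: only uncached input tokens decide the 200K tier.
    """
    if input_tokens <= 200_000:   # Prompts ≤200K tokens
        in_rate, out_rate = 3.00, 15.00
    else:                         # Prompts >200K tokens
        in_rate, out_rate = 6.00, 22.50
    cache_read_rate = 0.30        # $0.30 per 1M cache read input tokens

    return (input_tokens / 1e6 * in_rate
            + output_tokens / 1e6 * out_rate
            + cache_read_tokens / 1e6 * cache_read_rate)

# Example: a 120K-token prompt with a 2K-token reply costs $0.36 + $0.03
print(f"${estimate_sonnet_cost(120_000, 2_000):.2f}")  # $0.39
```

Crossing the 200K threshold doubles the input rate and raises the output rate by half, so trimming a prompt just below 200K tokens can matter more than any other optimization.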
Arcee Models
| Model | Serverless Inference |
|-------|----------------------|
| Trinity Large | Input/output tokens: $0.25 per 1M input tokens, $0.90 per 1M output tokens<br>Prompt caching: $0.06 per 1M cache read input tokens |
fal Models
| Model | Serverless Inference |
|-------|----------------------|
| Fast SDXL | $0.0011 per compute second |
| Flux Schnell | $0.0030 per megapixel |
| Stable Audio 2.5 (Text-to-Audio) | $0.00058 per compute second |
| Multilingual TTS v2 | $0.10 per 1000 characters |
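The fal models are billed per unit of work rather than per token, so cost scales with output size or runtime. A rough worked example (the job sizes are hypothetical, and treating one megapixel as 1,000,000 output pixels is our assumption):

```python
# Hypothetical job sizes; unit rates from the fal pricing table above.
flux_cost = (1024 * 1024) / 1_000_000 * 0.0030  # one 1024x1024 image at $0.0030/megapixel
sdxl_cost = 4 * 0.0011                          # 4 compute seconds of Fast SDXL
tts_cost  = 2_500 / 1000 * 0.10                 # 2,500 characters of Multilingual TTS v2

print(f"Flux Schnell: ${flux_cost:.6f}")  # $0.003146
print(f"Fast SDXL:    ${sdxl_cost:.4f}")  # $0.0044
print(f"TTS v2:       ${tts_cost:.2f}")   # $0.25
```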
OpenAI Models
When using OpenAI commercial models with your own model API keys, billing is handled directly by OpenAI at the provider’s rates.
| Model | Serverless Inference |
|-------|----------------------|
| gpt-oss-120b | Input/output tokens: $0.10 per 1M input tokens, $0.70 per 1M output tokens |
| gpt-oss-20b | Input/output tokens: $0.05 per 1M input tokens, $0.45 per 1M output tokens |
| GPT-5.4 | Input/output tokens: $2.50 per 1M input tokens, $15.00 per 1M output tokens<br>Prompt caching: $0.25 per 1M cache read input tokens |
| GPT-5.4 mini | Input/output tokens: $0.75 per 1M input tokens, $4.50 per 1M output tokens<br>Prompt caching: $0.075 per 1M cache read input tokens |
| GPT-5.4 nano | Input/output tokens: $0.20 per 1M input tokens, $1.25 per 1M output tokens<br>Prompt caching: $0.02 per 1M cache read input tokens |
| GPT-5.4 pro | Input/output tokens: $30.00 per 1M input tokens, $180.00 per 1M output tokens |
| GPT-5.3-Codex | Input/output tokens: $1.75 per 1M input tokens, $14.00 per 1M output tokens<br>Prompt caching: $0.175 per 1M cache read input tokens |
| GPT-5.2 | Input/output tokens: $1.75 per 1M input tokens, $14.00 per 1M output tokens<br>Prompt caching: $0.175 per 1M cache read input tokens |
| GPT-5.2 pro | Input/output tokens: $21.00 per 1M input tokens, $168.00 per 1M output tokens |
| GPT-5.1-Codex-Max | Input/output tokens: $1.25 per 1M input tokens, $10.00 per 1M output tokens<br>Prompt caching: $0.125 per 1M cache read input tokens |
| GPT-5 | Input/output tokens: $1.25 per 1M input tokens, $10.00 per 1M output tokens<br>Prompt caching: $0.125 per 1M cache read input tokens |
| GPT-5 mini | Input/output tokens: $0.25 per 1M input tokens, $2.00 per 1M output tokens<br>Prompt caching: $0.025 per 1M cache read input tokens |
| GPT-5 nano | Input/output tokens: $0.05 per 1M input tokens, $0.40 per 1M output tokens<br>Prompt caching: $0.005 per 1M cache read input tokens |
| GPT-4.1 | Input/output tokens: $2.00 per 1M input tokens, $8.00 per 1M output tokens<br>Prompt caching: $0.50 per 1M cache read input tokens |
| GPT-4o | Input/output tokens: $2.50 per 1M input tokens, $10.00 per 1M output tokens<br>Prompt caching: $1.25 per 1M cache read input tokens |
| GPT-4o mini | Input/output tokens: $0.15 per 1M input tokens, $0.60 per 1M output tokens<br>Prompt caching: $0.075 per 1M cache read input tokens |
| o1 | Input/output tokens: $15.00 per 1M input tokens, $60.00 per 1M output tokens<br>Prompt caching: $7.50 per 1M cache read input tokens |
| o3 | Input/output tokens: $2.00 per 1M input tokens, $8.00 per 1M output tokens<br>Prompt caching: $0.50 per 1M cache read input tokens |
| o3-mini | Input/output tokens: $1.10 per 1M input tokens, $4.40 per 1M output tokens<br>Prompt caching: $0.55 per 1M cache read input tokens |
| GPT-image-1 | Input/output tokens: $5.00 per 1M input tokens, $40.00 per 1M output tokens<br>Prompt caching: $1.25 per 1M cache read input tokens |
| GPT Image 1.5 | Input/output tokens: $5.00 per 1M input tokens, $10.00 per 1M output tokens<br>Prompt caching: $1.00 per 1M cache read input tokens |
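For the models above, cache reads are billed at a fraction of the normal input rate, which matters most when many requests share a long prompt prefix. A sketch using the GPT-5 rates (the workload is hypothetical, and it assumes the shared prefix is fully served from cache on every request after the first; actual cache hit behavior depends on the provider):

```python
PREFIX_TOKENS = 50_000            # shared prompt prefix (hypothetical workload)
REQUESTS = 100
IN_RATE, CACHE_RATE = 1.25, 0.125  # GPT-5: USD per 1M input / cache read tokens

# Every request pays the full input rate on the prefix.
without_cache = REQUESTS * PREFIX_TOKENS / 1e6 * IN_RATE

# First request pays the input rate; the rest pay the cache read rate.
with_cache = (PREFIX_TOKENS / 1e6 * IN_RATE
              + (REQUESTS - 1) * PREFIX_TOKENS / 1e6 * CACHE_RATE)

print(f"without caching: ${without_cache:.2f}")  # $6.25
print(f"with caching:    ${with_cache:.5f}")     # $0.68125
```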
DigitalOcean-Hosted Models
| Provider | Model | Serverless Inference |
|----------|-------|----------------------|
| Alibaba | Qwen3-32B | $0.25 per 1M input tokens, $0.55 per 1M output tokens |
| DeepSeek | DeepSeek R1 Distill Llama 70B | $0.99 per 1M input tokens, $0.99 per 1M output tokens |
| MiniMax | MiniMax M2.5 (Public Preview) | $0.30 per 1M input tokens, $1.20 per 1M output tokens |
| Moonshot AI | Kimi K2.5 | $0.50 per 1M input tokens, $2.70 per 1M output tokens |
| Meta | Llama 3.3 Instruct-70B | $0.65 per 1M input tokens, $0.65 per 1M output tokens |
| NVIDIA | Nemotron-3-Super-120B (Public Preview) | $0.30 per 1M input tokens, $0.65 per 1M output tokens |
| Z.ai | GLM 5 | $1.00 per 1M input tokens, $3.20 per 1M output tokens |
Dedicated Inference
Dedicated Inference is available in public preview and is enabled for all users. Contact support with questions or for assistance.
Dedicated Inference is billed per GPU-hour of uptime for the GPU you run your models on.
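Because dedicated inference bills on GPU uptime rather than tokens, estimating cost is simple multiplication: GPUs × hours of uptime × hourly rate. A sketch with an entirely hypothetical rate (check the control panel for actual per-GPU pricing):

```python
gpus = 2
hours_up = 730             # roughly one month of continuous uptime
rate_per_gpu_hour = 2.50   # hypothetical USD rate; actual rates vary by GPU type

monthly_cost = gpus * hours_up * rate_per_gpu_hour
print(f"${monthly_cost:.2f}")  # $3650.00
```

Unlike serverless pricing, this accrues whether or not the model is serving traffic, so idle deployments still bill for uptime.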