For AI agents: The documentation index is at https://docs.digitalocean.com/llms.txt. For the Markdown version of a page, append index.html.md to the page's directory path instead of opening the HTML document.
Inference has no upfront or idle cost; you are charged only when you run inference or deploy models.
Model Playground
Usage is charged at the same rate as serverless inference.
Serverless Inference
Serverless inference is billed by DigitalOcean for both open-source and commercial models. Prices align with each provider’s published rates for transparency.
Based on your tier, you can accrue an allocated amount of usage before we charge you (for example, $25 for tier 1). Once you hit that limit, we bill you for the accrued usage, and additional inference usage is capped until you pay that bill.
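As a rough illustration of how the usage allocation behaves, the sketch below uses the $25 tier 1 example above. It is illustrative only, not a billing API, and the helper name and logic are assumptions made for the example.

```python
# Illustrative only: sketch of the usage allocation described above.
# The $25 figure is the tier 1 example from this page; this is not a billing API.

TIER_1_ALLOCATION_USD = 25.00  # usage you can accrue before a charge is issued

def billing_status(accrued_usage_usd: float) -> str:
    """Describe what happens at a given amount of accrued serverless inference usage."""
    if accrued_usage_usd < TIER_1_ALLOCATION_USD:
        return f"${accrued_usage_usd:.2f} accrued; no charge issued yet."
    return (
        f"${accrued_usage_usd:.2f} accrued; a charge is issued for the accrued usage, "
        "and additional usage is capped until the bill is paid."
    )

print(billing_status(12.40))
print(billing_status(25.00))
```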
The following tables show pricing for foundation models available through serverless inference.
Anthropic Models
When using Anthropic commercial models with your own model API keys, billing is handled directly by Anthropic at the provider’s rates.
Claude Sonnet 4.6, Sonnet 4.5, and Sonnet 4 support an input context window of up to 1M tokens.
| Model | Serverless Inference |
|-------|----------------------|
| Claude Sonnet 4.6 | Prompts ≤200K tokens: $3.00 per 1M input tokens, $15.00 per 1M output tokens. Prompts >200K tokens: $6.00 per 1M input tokens, $22.50 per 1M output tokens. Prompt caching: $3.75 per 1M cache creation (5-minute) input tokens, $6.00 per 1M cache creation (1-hour) input tokens, $0.30 per 1M cache read input tokens. |
| Claude Sonnet 4.5 | Prompts ≤200K tokens: $3.00 per 1M input tokens, $15.00 per 1M output tokens. Prompts >200K tokens: $6.00 per 1M input tokens, $22.50 per 1M output tokens. Prompt caching: $3.75 per 1M cache creation (5-minute) input tokens, $6.00 per 1M cache creation (1-hour) input tokens, $0.30 per 1M cache read input tokens. |
| Claude Sonnet 4 | Prompts ≤200K tokens: $3.00 per 1M input tokens, $15.00 per 1M output tokens. Prompts >200K tokens: $6.00 per 1M input tokens, $22.50 per 1M output tokens. Prompt caching: $3.75 per 1M cache creation (5-minute) input tokens, $6.00 per 1M cache creation (1-hour) input tokens, $0.30 per 1M cache read input tokens. |
| Claude Haiku 4.5 | Input/output tokens: $1.00 per 1M input tokens, $5.00 per 1M output tokens. Prompt caching: $1.25 per 1M cache creation (5-minute) input tokens, $2.00 per 1M cache creation (1-hour) input tokens, $1.00 per 1M cache read input tokens. |
| Claude Opus 4.7 | Input/output tokens: $5.00 per 1M input tokens, $25.00 per 1M output tokens. Prompt caching: $6.25 per 1M cache creation (5-minute) input tokens, $10.00 per 1M cache creation (1-hour) input tokens, $0.50 per 1M cache read input tokens. |
| Claude Opus 4.6 | Prompts ≤200K tokens: $5.00 per 1M input tokens, $25.00 per 1M output tokens. Prompts >200K tokens: $10.00 per 1M input tokens, $37.50 per 1M output tokens. Prompt caching: $6.25 per 1M cache creation (5-minute) input tokens, $10.00 per 1M cache creation (1-hour) input tokens, $0.50 per 1M cache read input tokens. |
| Claude Opus 4.5 | Input/output tokens: $5.00 per 1M input tokens, $25.00 per 1M output tokens. Prompt caching: $6.25 per 1M cache creation (5-minute) input tokens, $10.00 per 1M cache creation (1-hour) input tokens, $0.50 per 1M cache read input tokens. |
| Claude Opus 4.1 | Input/output tokens: $15.00 per 1M input tokens, $75.00 per 1M output tokens. Prompt caching: $18.75 per 1M cache creation (5-minute) input tokens, $30.00 per 1M cache creation (1-hour) input tokens, $1.50 per 1M cache read input tokens. |
| Claude Opus 4 | Input/output tokens: $15.00 per 1M input tokens, $75.00 per 1M output tokens. Prompt caching: $18.75 per 1M cache creation (5-minute) input tokens, $30.00 per 1M cache creation (1-hour) input tokens, $1.50 per 1M cache read input tokens. |
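The long-context rates above are tiered: a higher per-token rate applies when the prompt exceeds 200K tokens. The sketch below estimates the cost of a Claude Sonnet 4.5 request from the published rates. It is illustrative, not a billing API; in particular, whether the >200K rate applies to the entire prompt or only the excess is not specified here, and this sketch assumes the entire prompt.

```python
# Illustrative sketch: estimate the cost of one Claude Sonnet 4.5 serverless inference
# request from the per-1M-token rates in the table above. Assumes the >200K rates apply
# to the whole request once the prompt exceeds 200K tokens; confirm against the
# provider's billing rules before relying on it.

def sonnet_45_cost(input_tokens: int, output_tokens: int, cached_read_tokens: int = 0) -> float:
    """cached_read_tokens is the portion of input_tokens served from the prompt cache."""
    if input_tokens <= 200_000:
        input_rate, output_rate = 3.00, 15.00   # $ per 1M tokens
    else:
        input_rate, output_rate = 6.00, 22.50   # $ per 1M tokens
    cache_read_rate = 0.30                      # $ per 1M cache read input tokens

    uncached_input = input_tokens - cached_read_tokens
    return (
        uncached_input * input_rate / 1_000_000
        + cached_read_tokens * cache_read_rate / 1_000_000
        + output_tokens * output_rate / 1_000_000
    )

# A 50K-token prompt with 10K tokens read from cache and 2K output tokens
print(f"${sonnet_45_cost(50_000, 2_000, cached_read_tokens=10_000):.4f}")
```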
Arcee Models
| Model | Serverless Inference |
|-------|----------------------|
| Trinity Large | Input/output tokens: $0.25 per 1M input tokens, $0.90 per 1M output tokens. Prompt caching: $0.06 per 1M cache read input tokens. |
fal Models
| Model | Serverless Inference |
|-------|----------------------|
| Fast SDXL | $0.0011 per compute second |
| Flux Schnell | $0.0030 per megapixel |
| Stable Audio 2.5 (Text-to-Audio) | $0.00058 per compute second |
| Multilingual TTS v2 | $0.10 per 1,000 characters |
OpenAI Models
When using OpenAI commercial models with your own model API keys, billing is handled directly by OpenAI at the provider’s rates.
| Model | Serverless Inference |
|-------|----------------------|
| gpt-oss-120b | Input/output tokens: $0.10 per 1M input tokens, $0.70 per 1M output tokens. |
| gpt-oss-20b | Input/output tokens: $0.05 per 1M input tokens, $0.45 per 1M output tokens. |
| GPT-5.4 | Input/output tokens: $2.50 per 1M input tokens, $15.00 per 1M output tokens. Prompt caching: $0.25 per 1M cache read input tokens. |
| GPT-5.4 mini | Input/output tokens: $0.75 per 1M input tokens, $4.50 per 1M output tokens. Prompt caching: $0.075 per 1M cache read input tokens. |
| GPT-5.4 nano | Input/output tokens: $0.20 per 1M input tokens, $1.25 per 1M output tokens. Prompt caching: $0.02 per 1M cache read input tokens. |
| GPT-5.4 pro | Input/output tokens: $30.00 per 1M input tokens, $180.00 per 1M output tokens. |
| GPT-5.3-Codex | Input/output tokens: $1.75 per 1M input tokens, $14.00 per 1M output tokens. Prompt caching: $0.175 per 1M cache read input tokens. |
| GPT-5.2 | Input/output tokens: $1.75 per 1M input tokens, $14.00 per 1M output tokens. Prompt caching: $0.175 per 1M cache read input tokens. |
| GPT-5.2 pro | Input/output tokens: $21.00 per 1M input tokens, $168.00 per 1M output tokens. |
| GPT-5.1-Codex-Max | Input/output tokens: $1.25 per 1M input tokens, $10.00 per 1M output tokens. Prompt caching: $0.125 per 1M cache read input tokens. |
| GPT-5 | Input/output tokens: $1.25 per 1M input tokens, $10.00 per 1M output tokens. Prompt caching: $0.125 per 1M cache read input tokens. |
| GPT-5 mini | Input/output tokens: $0.25 per 1M input tokens, $2.00 per 1M output tokens. Prompt caching: $0.025 per 1M cache read input tokens. |
| GPT-5 nano | Input/output tokens: $0.05 per 1M input tokens, $0.40 per 1M output tokens. Prompt caching: $0.005 per 1M cache read input tokens. |
| GPT-4.1 | Input/output tokens: $2.00 per 1M input tokens, $8.00 per 1M output tokens. Prompt caching: $0.50 per 1M cache read input tokens. |
| GPT-4o | Input/output tokens: $2.50 per 1M input tokens, $10.00 per 1M output tokens. Prompt caching: $1.25 per 1M cache read input tokens. |
| GPT-4o mini | Input/output tokens: $0.15 per 1M input tokens, $0.60 per 1M output tokens. Prompt caching: $0.075 per 1M cache read input tokens. |
| o1 | Input/output tokens: $15.00 per 1M input tokens, $60.00 per 1M output tokens. Prompt caching: $7.50 per 1M cache read input tokens. |
| o3 | Input/output tokens: $2.00 per 1M input tokens, $8.00 per 1M output tokens. Prompt caching: $0.50 per 1M cache read input tokens. |
| o3-mini | Input/output tokens: $1.10 per 1M input tokens, $4.40 per 1M output tokens. Prompt caching: $0.55 per 1M cache read input tokens. |
| GPT-image-1 | Input/output tokens: $5.00 per 1M input tokens, $40.00 per 1M output tokens. Prompt caching: $1.25 per 1M cache read input tokens. |
| GPT Image 1.5 | Input/output tokens: $5.00 per 1M input tokens, $10.00 per 1M output tokens. Prompt caching: $1.00 per 1M cache read input tokens. |
| GPT Image 2 | Text: $5.00 per 1M input tokens, $0.00 per 1M output tokens, $1.25 per 1M cache read tokens. Image: $8.00 per 1M input tokens, $30.00 per 1M output tokens, $2.00 per 1M cache read tokens. |
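Prompt caching in the table above is priced as a discounted rate on cache read input tokens. The sketch below compares input cost with and without cache reads using the GPT-5 rates listed above; the helper is illustrative only, not a billing API.

```python
# Illustrative sketch: compare GPT-5 input cost with and without prompt cache reads,
# using the per-1M-token rates from the table above.

GPT5_INPUT_RATE = 1.25        # $ per 1M input tokens
GPT5_CACHE_READ_RATE = 0.125  # $ per 1M cache read input tokens

def gpt5_input_cost(input_tokens: int, cached_read_tokens: int) -> float:
    """cached_read_tokens is the portion of input_tokens served from the prompt cache."""
    uncached = input_tokens - cached_read_tokens
    return (uncached * GPT5_INPUT_RATE + cached_read_tokens * GPT5_CACHE_READ_RATE) / 1_000_000

# A 100K-token prompt where an 80K-token prefix is read from the cache
print(f"uncached:         ${gpt5_input_cost(100_000, 0):.4f}")
print(f"with cache reads: ${gpt5_input_cost(100_000, 80_000):.4f}")
```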
DigitalOcean-Hosted Models
Some of the following models are discounted 30% during off-peak hours, 05:00 to 11:00 UTC each day. See the (off-peak) rates in the following table, and the short sketch after the table for how the discount applies.
| Provider | Model | Serverless Inference |
|----------|-------|----------------------|
| Alibaba | Qwen3-32B | Input/output tokens: $0.25 per 1M input tokens, $0.55 per 1M output tokens. |
| DeepSeek | DeepSeek R1 Distill Llama 70B | Input/output tokens: $0.99 per 1M input tokens, $0.99 per 1M output tokens. |
| MiniMax | MiniMax M2.5 (Public Preview) | Input/output tokens: $0.30 per 1M input tokens, $1.20 per 1M output tokens. Input/output tokens (off-peak): $0.21 per 1M input tokens, $0.84 per 1M output tokens. |
| Moonshot AI | Kimi K2.5 | Input/output tokens: $0.50 per 1M input tokens, $2.70 per 1M output tokens. Input/output tokens (off-peak): $0.35 per 1M input tokens, $1.89 per 1M output tokens. |
| Meta | Llama 3.3 Instruct-70B | Input/output tokens: $0.65 per 1M input tokens, $0.65 per 1M output tokens. |
| NVIDIA | Nemotron-3-Super-120B (Public Preview) | Input/output tokens: $0.30 per 1M input tokens, $0.65 per 1M output tokens. |
| Z.ai | GLM 5 | Input/output tokens: $1.00 per 1M input tokens, $3.20 per 1M output tokens. |
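The off-peak rates in the table are the standard rates reduced by 30% during the 05:00 to 11:00 UTC window. The sketch below checks the window and applies the discount using Kimi K2.5's rates; it is illustrative only, and actual billing uses the rates in the table.

```python
# Illustrative sketch: apply the 30% off-peak discount for requests made between
# 05:00 and 11:00 UTC, using Kimi K2.5's standard rates from the table above.

from datetime import datetime, timezone

STANDARD_RATES = {"input": 0.50, "output": 2.70}  # Kimi K2.5, $ per 1M tokens
OFF_PEAK_DISCOUNT = 0.30

def effective_rates(now=None):
    """Return the per-1M-token rates in effect at the given (or current) UTC time."""
    now = now or datetime.now(timezone.utc)
    off_peak = 5 <= now.hour < 11  # 05:00-11:00 UTC
    factor = 1 - OFF_PEAK_DISCOUNT if off_peak else 1.0
    return {name: round(rate * factor, 4) for name, rate in STANDARD_RATES.items()}

# During the off-peak window this prints {'input': 0.35, 'output': 1.89},
# matching the (off-peak) rates in the table.
print(effective_rates())
```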
Dedicated Inference
Dedicated Inference is billed per GPU-hour based on the GPU you use.
| GPU | Price |
|-----|-------|
| AMD MI300X | $2.59 per hour |
| AMD MI300X (8x) | $20.70 per hour |
| AMD MI325X | $2.98 per hour |
| AMD MI325X (8x) | $23.82 per hour |
| AMD MI350X | $6.89 per hour |
| NVIDIA B300 | $10.39 per hour |
| NVIDIA B300 (8x) | $83.10 per hour |
| NVIDIA H100 | $4.41 per hour |
| NVIDIA H100 (8x) | $30.32 per hour |
| NVIDIA H200 | $4.47 per hour |
| NVIDIA H200 (8x) | $35.78 per hour |
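Because dedicated inference is billed per GPU-hour, an estimate is simply hours multiplied by the hourly rate. The sketch below uses the rates from the table above; the helper is illustrative, and actual invoices come from metered usage.

```python
# Illustrative sketch: estimate a dedicated inference bill from GPU-hours,
# using the per-hour rates listed in the table above (subset shown).

GPU_HOURLY_RATES = {
    "AMD MI300X": 2.59,
    "AMD MI325X": 2.98,
    "NVIDIA H100": 4.41,
    "NVIDIA H100 (8x)": 30.32,
    "NVIDIA H200 (8x)": 35.78,
}

def dedicated_cost(gpu: str, hours: float) -> float:
    return GPU_HOURLY_RATES[gpu] * hours

# One 8x H100 deployment running around the clock for a 30-day month
print(f"${dedicated_cost('NVIDIA H100 (8x)', 24 * 30):,.2f}")  # $21,830.40
```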
Batch Inference
Batch inference is charged at up to a 50% discount on OpenAI and Anthropic models.
You are only charged for completed requests. If a batch job fails, is blocked by guardrails, or expires partway through, requests that were not processed are not charged.
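The discount and the completed-requests rule combine straightforwardly: only processed requests are billed, at the discounted per-token rate. The sketch below assumes a flat 50% batch discount on GPT-5's serverless rates; the actual discount varies by model (up to 50%), and the helper is illustrative only.

```python
# Illustrative sketch: batch inference billing for completed requests only,
# assuming a flat 50% batch discount on GPT-5's serverless rates from the table above
# (actual discounts vary by model, up to 50%).

BATCH_DISCOUNT = 0.50
GPT5_RATES = {"input": 1.25, "output": 10.00}  # $ per 1M tokens

def batch_cost(completed_requests: int, input_tokens: int, output_tokens: int) -> float:
    """Bill only completed requests, each with the given average token counts."""
    per_request = (
        input_tokens * GPT5_RATES["input"] + output_tokens * GPT5_RATES["output"]
    ) / 1_000_000
    return completed_requests * per_request * (1 - BATCH_DISCOUNT)

# A 10,000-request batch where 9,500 requests completed before the job expired;
# the 500 unprocessed requests are not charged.
print(f"${batch_cost(9_500, 2_000, 500):.2f}")
```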
Web Search Requests
You are charged $10 per 1000 requests for using web search with serverless inference.
Model Evaluations
Model evaluations are charged at the same token rates as serverless inference, both for candidate models deployed on serverless inference and for judge models.