Inference Pricing

Validated on 27 Apr 2026 • Last edited on 27 Apr 2026

Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can browse available foundation models (both DigitalOcean-hosted and third-party commercial models), compare model capabilities and pricing, route inference requests to the best-fit model, and run inference on serverless or dedicated deployments.

Inference itself has no cost. Costs are incurred only when you run inference or deploy models.

Model Playground

Usage is charged at the same rate as serverless inference.

Serverless Inference

Serverless inference for both open-source and commercial models is billed by DigitalOcean. For transparency, prices match each provider's published rates.

Warning
Your account tier includes an allotment of inference usage before billing begins; for example, $25 on tier 1. Once you reach that limit, we bill you for that usage, and additional inference usage is capped until you pay the bill.

The following shows pricing for foundation models available through serverless inference.

Anthropic Models
Note
When using Anthropic commercial models with your own model API keys, billing is handled directly by Anthropic at the provider’s rates.

Claude Sonnet 4.6, Sonnet 4.5, and Sonnet 4 support an input context window of up to 1M tokens.

Claude Sonnet 4.6
  • Prompts ≤200K tokens: $3.00 per 1M input tokens; $15.00 per 1M output tokens
  • Prompts >200K tokens: $6.00 per 1M input tokens; $22.50 per 1M output tokens
  • Prompt caching: $3.75 per 1M cache creation (5m) input tokens; $6.00 per 1M cache creation (1h) input tokens; $0.30 per 1M cache read input tokens

Claude Sonnet 4.5
  • Prompts ≤200K tokens: $3.00 per 1M input tokens; $15.00 per 1M output tokens
  • Prompts >200K tokens: $6.00 per 1M input tokens; $22.50 per 1M output tokens
  • Prompt caching: $3.75 per 1M cache creation (5m) input tokens; $6.00 per 1M cache creation (1h) input tokens; $0.30 per 1M cache read input tokens

Claude Sonnet 4
  • Prompts ≤200K tokens: $3.00 per 1M input tokens; $15.00 per 1M output tokens
  • Prompts >200K tokens: $6.00 per 1M input tokens; $22.50 per 1M output tokens
  • Prompt caching: $3.75 per 1M cache creation (5m) input tokens; $6.00 per 1M cache creation (1h) input tokens; $0.30 per 1M cache read input tokens

Claude Haiku 4.5
  • Input/output tokens: $1.00 per 1M input tokens; $5.00 per 1M output tokens
  • Prompt caching: $1.25 per 1M cache creation (5m) input tokens; $2.00 per 1M cache creation (1h) input tokens; $1.00 per 1M cache read input tokens

Claude Opus 4.7
  • Input/output tokens: $5.00 per 1M input tokens; $25.00 per 1M output tokens
  • Prompt caching: $6.25 per 1M cache creation (5m) input tokens; $10.00 per 1M cache creation (1h) input tokens; $0.50 per 1M cache read input tokens

Claude Opus 4.6
  • Prompts ≤200K tokens: $5.00 per 1M input tokens; $25.00 per 1M output tokens
  • Prompts >200K tokens: $10.00 per 1M input tokens; $37.50 per 1M output tokens
  • Prompt caching: $6.25 per 1M cache creation (5m) input tokens; $10.00 per 1M cache creation (1h) input tokens; $0.50 per 1M cache read input tokens

Claude Opus 4.5
  • Input/output tokens: $5.00 per 1M input tokens; $25.00 per 1M output tokens
  • Prompt caching: $6.25 per 1M cache creation (5m) input tokens; $10.00 per 1M cache creation (1h) input tokens; $0.50 per 1M cache read input tokens

Claude Opus 4.1
  • Input/output tokens: $15.00 per 1M input tokens; $75.00 per 1M output tokens
  • Prompt caching: $18.75 per 1M cache creation (5m) input tokens; $30.00 per 1M cache creation (1h) input tokens; $1.50 per 1M cache read input tokens

Claude Opus 4
  • Input/output tokens: $15.00 per 1M input tokens; $75.00 per 1M output tokens
  • Prompt caching: $18.75 per 1M cache creation (5m) input tokens; $30.00 per 1M cache creation (1h) input tokens; $1.50 per 1M cache read input tokens

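To estimate costs from these rates, you can combine the per-token prices directly. The following Python sketch uses the Claude Sonnet 4.6 rows above; the function and its parameters are illustrative (not part of any SDK), and it assumes the >200K-token rate applies to the entire request once the prompt exceeds 200K tokens:

```python
# Illustrative cost estimate for one request against tiered per-token rates.
# Rates mirror the Claude Sonnet 4.6 rows above, in USD per 1M tokens.
RATES = {
    "input_small": 3.00,     # prompts <= 200K tokens
    "output_small": 15.00,
    "input_large": 6.00,     # prompts > 200K tokens
    "output_large": 22.50,
    "cache_write_5m": 3.75,
    "cache_write_1h": 6.00,
    "cache_read": 0.30,
}

def request_cost(input_tokens, output_tokens,
                 cache_write_5m=0, cache_write_1h=0, cache_read=0):
    """Estimate the USD cost of one request. Assumes the >200K tier,
    when triggered, applies to all tokens in the request."""
    per_m = 1_000_000
    large = input_tokens > 200_000
    cost = input_tokens / per_m * (RATES["input_large"] if large else RATES["input_small"])
    cost += output_tokens / per_m * (RATES["output_large"] if large else RATES["output_small"])
    cost += cache_write_5m / per_m * RATES["cache_write_5m"]
    cost += cache_write_1h / per_m * RATES["cache_write_1h"]
    cost += cache_read / per_m * RATES["cache_read"]
    return cost

# 50K-token prompt, 2K-token reply: 0.05 * $3.00 + 0.002 * $15.00 = $0.18
print(round(request_cost(50_000, 2_000), 4))
```

The same pattern applies to any model in this section; only the rate table changes.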
Arcee Models
Trinity Large
  • Input/output tokens: $0.25 per 1M input tokens; $0.90 per 1M output tokens
  • Prompt caching: $0.06 per 1M cache read input tokens

fal Models
  • Fast SDXL: $0.0011 per compute second
  • Flux Schnell: $0.0030 per megapixel
  • Stable Audio 2.5 (Text-to-Audio): $0.00058 per compute second
  • Multilingual TTS v2: $0.10 per 1,000 characters

OpenAI Models
Note
When using OpenAI commercial models with your own model API keys, billing is handled directly by OpenAI at the provider’s rates.
gpt-oss-120b
  • Input/output tokens: $0.10 per 1M input tokens; $0.70 per 1M output tokens

gpt-oss-20b
  • Input/output tokens: $0.05 per 1M input tokens; $0.45 per 1M output tokens

GPT-5.4
  • Input/output tokens: $2.50 per 1M input tokens; $15.00 per 1M output tokens
  • Prompt caching: $0.25 per 1M cache read input tokens

GPT-5.4 mini
  • Input/output tokens: $0.75 per 1M input tokens; $4.50 per 1M output tokens
  • Prompt caching: $0.075 per 1M cache read input tokens

GPT-5.4 nano
  • Input/output tokens: $0.20 per 1M input tokens; $1.25 per 1M output tokens
  • Prompt caching: $0.02 per 1M cache read input tokens

GPT-5.4 pro
  • Input/output tokens: $30.00 per 1M input tokens; $180.00 per 1M output tokens

GPT-5.3-Codex
  • Input/output tokens: $1.75 per 1M input tokens; $14.00 per 1M output tokens
  • Prompt caching: $0.175 per 1M cache read input tokens

GPT-5.2
  • Input/output tokens: $1.75 per 1M input tokens; $14.00 per 1M output tokens
  • Prompt caching: $0.175 per 1M cache read input tokens

GPT-5.2 pro
  • Input/output tokens: $21.00 per 1M input tokens; $168.00 per 1M output tokens

GPT-5.1-Codex-Max
  • Input/output tokens: $1.25 per 1M input tokens; $10.00 per 1M output tokens
  • Prompt caching: $0.125 per 1M cache read input tokens

GPT-5
  • Input/output tokens: $1.25 per 1M input tokens; $10.00 per 1M output tokens
  • Prompt caching: $0.125 per 1M cache read input tokens

GPT-5 mini
  • Input/output tokens: $0.25 per 1M input tokens; $2.00 per 1M output tokens
  • Prompt caching: $0.025 per 1M cache read input tokens

GPT-5 nano
  • Input/output tokens: $0.05 per 1M input tokens; $0.40 per 1M output tokens
  • Prompt caching: $0.005 per 1M cache read input tokens

GPT-4.1
  • Input/output tokens: $2.00 per 1M input tokens; $8.00 per 1M output tokens
  • Prompt caching: $0.50 per 1M cache read input tokens

GPT-4o
  • Input/output tokens: $2.50 per 1M input tokens; $10.00 per 1M output tokens
  • Prompt caching: $1.25 per 1M cache read input tokens

GPT-4o mini
  • Input/output tokens: $0.15 per 1M input tokens; $0.60 per 1M output tokens
  • Prompt caching: $0.075 per 1M cache read input tokens

o1
  • Input/output tokens: $15.00 per 1M input tokens; $60.00 per 1M output tokens
  • Prompt caching: $7.50 per 1M cache read input tokens

o3
  • Input/output tokens: $2.00 per 1M input tokens; $8.00 per 1M output tokens
  • Prompt caching: $0.50 per 1M cache read input tokens

o3-mini
  • Input/output tokens: $1.10 per 1M input tokens; $4.40 per 1M output tokens
  • Prompt caching: $0.55 per 1M cache read input tokens

GPT-image-1
  • Input/output tokens: $5.00 per 1M input tokens; $40.00 per 1M output tokens
  • Prompt caching: $1.25 per 1M cache read input tokens

GPT Image 1.5
  • Input/output tokens: $5.00 per 1M input tokens; $10.00 per 1M output tokens
  • Prompt caching: $1.00 per 1M cache read input tokens

GPT Image 2
  • Text tokens: $5.00 per 1M input tokens; $0.00 per 1M output tokens; $1.25 per 1M cache read tokens
  • Image tokens: $8.00 per 1M input tokens; $30.00 per 1M output tokens; $2.00 per 1M cache read tokens

DigitalOcean-Hosted Models

The following models are discounted 30% during off-peak hours, 05:00 to 11:00 UTC each day:

  • Kimi K2.5
  • MiniMax M2.5

See the (off-peak) rows in the following table for off-peak rates.

Qwen3-32B (Alibaba)
  • Input/output tokens: $0.25 per 1M input tokens; $0.55 per 1M output tokens

DeepSeek R1 Distill Llama 70B (DeepSeek)
  • Input/output tokens: $0.99 per 1M input tokens; $0.99 per 1M output tokens

MiniMax M2.5 (MiniMax, Public Preview)
  • Input/output tokens: $0.30 per 1M input tokens; $1.20 per 1M output tokens
  • Input/output tokens (off-peak): $0.21 per 1M input tokens; $0.84 per 1M output tokens

Kimi K2.5 (Moonshot AI)
  • Input/output tokens: $0.50 per 1M input tokens; $2.70 per 1M output tokens
  • Input/output tokens (off-peak): $0.35 per 1M input tokens; $1.89 per 1M output tokens

Llama 3.3 Instruct-70B (Meta)
  • Input/output tokens: $0.65 per 1M input tokens; $0.65 per 1M output tokens

Nemotron-3-Super-120B (NVIDIA, Public Preview)
  • Input/output tokens: $0.30 per 1M input tokens; $0.65 per 1M output tokens

GLM 5 (Z.ai)
  • Input/output tokens: $1.00 per 1M input tokens; $3.20 per 1M output tokens
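As an illustration of how the off-peak discount combines with the token rates, the following Python sketch applies the 30% reduction during the 05:00 to 11:00 UTC window, using the Kimi K2.5 rates above. The helper names are illustrative, and the window is assumed to be start-inclusive and end-exclusive:

```python
from datetime import datetime, timezone

# Kimi K2.5 peak rates from the table above, USD per 1M tokens.
PEAK_INPUT, PEAK_OUTPUT = 0.50, 2.70
OFF_PEAK_DISCOUNT = 0.30  # 30% off during the off-peak window

def is_off_peak(ts: datetime) -> bool:
    """True when ts falls in the 05:00-11:00 UTC off-peak window
    (assumed start-inclusive, end-exclusive)."""
    hour = ts.astimezone(timezone.utc).hour
    return 5 <= hour < 11

def token_cost(input_tokens: int, output_tokens: int, ts: datetime) -> float:
    """USD cost of a request, discounted 30% if it runs off-peak."""
    factor = 1 - OFF_PEAK_DISCOUNT if is_off_peak(ts) else 1.0
    return factor * (input_tokens * PEAK_INPUT
                     + output_tokens * PEAK_OUTPUT) / 1_000_000

peak = datetime(2026, 4, 27, 14, 0, tzinfo=timezone.utc)   # 14:00 UTC: peak
off = datetime(2026, 4, 27, 6, 0, tzinfo=timezone.utc)     # 06:00 UTC: off-peak
print(round(token_cost(1_000_000, 100_000, peak), 4))  # $0.50 + $0.27 = 0.77
print(round(token_cost(1_000_000, 100_000, off), 4))   # 0.77 * 0.7 = 0.539
```

Whether the discount is applied at request time or at billing aggregation is not specified here; the sketch assumes request time.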

Dedicated Inference

Dedicated Inference is billed per GPU-hour based on the GPU you use.

GPU Price
AMD MI300X $2.59 per hour
AMD MI300X (8x) $20.70 per hour
AMD MI325X $2.98 per hour
AMD MI325X (8x) $23.82 per hour
AMD MI350X $6.89 per hour
NVIDIA B300 $10.39 per hour
NVIDIA B300 (8x) $83.10 per hour
NVIDIA H100 $4.41 per hour
NVIDIA H100 (8x) $30.32 per hour
NVIDIA H200 $4.47 per hour
NVIDIA H200 (8x) $35.78 per hour
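Since billing is per GPU-hour, a deployment's monthly cost is its hourly rate times the hours it stays up. A minimal Python sketch, with rates copied from the table above (the helper name is illustrative):

```python
# Hourly rates (USD) for a few entries from the dedicated-inference table.
GPU_HOURLY = {
    "AMD MI300X": 2.59,
    "NVIDIA H100 (8x)": 30.32,
    "NVIDIA H200": 4.47,
}

def monthly_cost(gpu: str, hours: float = 24 * 30) -> float:
    """Cost of keeping one instance of `gpu` running for `hours`
    (default: a 30-day month, 720 hours)."""
    return GPU_HOURLY[gpu] * hours

# A single 8x H100 node running all month: 30.32 * 720 hours
print(round(monthly_cost("NVIDIA H100 (8x)"), 2))
```

Scaling down the deployment when idle reduces the billed hours proportionally.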

Batch Inference

Batch inference is charged at up to a 50% discount on the serverless rates for OpenAI and Anthropic models.

You are only charged for completed requests. If a batch job fails, is blocked by guardrails, or expires partway through, requests that were not processed are not charged.
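A sketch of how this billing rule composes: only completed requests accrue token charges, and the discount applies to the serverless-rate total. The 50% figure is the best case, and the request structure below is illustrative, not an API shape:

```python
BATCH_DISCOUNT = 0.50  # best case; the actual discount may be lower

def batch_cost(requests, input_rate, output_rate):
    """Sum serverless-rate cost over completed requests only, then apply
    the batch discount. `requests` is a list of dicts with 'status',
    'input_tokens', and 'output_tokens'; rates are USD per 1M tokens."""
    total = 0.0
    for r in requests:
        if r["status"] != "completed":
            continue  # failed, guardrail-blocked, or expired: not charged
        total += (r["input_tokens"] * input_rate
                  + r["output_tokens"] * output_rate) / 1_000_000
    return total * (1 - BATCH_DISCOUNT)

jobs = [
    {"status": "completed", "input_tokens": 10_000, "output_tokens": 2_000},
    {"status": "expired", "input_tokens": 10_000, "output_tokens": 0},
]
# Only the completed request is charged: (0.03 + 0.03) * 0.5 = $0.03
print(batch_cost(jobs, input_rate=3.00, output_rate=15.00))
```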

Web Search Requests

You are charged $10 per 1,000 requests for using web search with serverless inference.

Model Evaluations

Model evaluations for candidate models deployed on Serverless Inference, and for judge models, are charged at the same token rates as serverless inference.
