DigitalOcean Gradient™ AI Inference Hub Pricing

Validated on 16 Apr 2026 • Last edited on 16 Apr 2026

DigitalOcean Gradient™ AI Inference Hub provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare capabilities and pricing, and run inference using serverless or dedicated deployments. DigitalOcean Gradient AI Inference Hub is in private preview. You can contact support for questions or assistance.

Inference Hub itself has no cost. The Model Catalog API is free to use for browsing supported models and reviewing pricing and capabilities. Costs are incurred only when you run inference or deploy models.
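Serverless pricing on this page is quoted per 1M tokens, so a request's cost is just token counts scaled by the listed rates. A minimal sketch of that arithmetic (the rates passed in below use GPT-5's listed pricing purely as an example):

```python
# Estimate serverless inference cost from token counts.
# Rates are USD per 1M tokens, taken from the pricing tables;
# the example values are GPT-5's listed rates ($1.25 in / $10.00 out).

def serverless_cost(input_tokens: int, output_tokens: int,
                    input_rate: float, output_rate: float) -> float:
    """Return cost in USD for a request or batch of requests."""
    return (input_tokens / 1_000_000) * input_rate + \
           (output_tokens / 1_000_000) * output_rate

# e.g. 50K input tokens and 2K output tokens on GPT-5:
cost = serverless_cost(50_000, 2_000, input_rate=1.25, output_rate=10.00)
print(f"${cost:.4f}")  # $0.0825  ($0.0625 input + $0.0200 output)
```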

Serverless Inference

Serverless inference is billed by DigitalOcean for both open-source and commercial models, at prices that match each provider's published rates.

The following tables list pricing for the foundation models available through serverless inference in Inference Hub.

Anthropic Models
Note: When using Anthropic commercial models with your own model API keys, billing is handled directly by Anthropic at the provider's rates.

Claude Sonnet 4.6, Sonnet 4.5, and Sonnet 4 support an input context window of up to 1M tokens.

| Model | Usage | Serverless Inference |
|-------|-------|----------------------|
| Claude Sonnet 4.6 | Prompts ≤200K tokens | $3.00 per 1M input tokens; $15.00 per 1M output tokens |
| | Prompts >200K tokens | $6.00 per 1M input tokens; $22.50 per 1M output tokens |
| | Prompt caching | $3.75 per 1M cache creation (5m) input tokens; $6.00 per 1M cache creation (1h) input tokens; $0.30 per 1M cache read input tokens |
| Claude Sonnet 4.5 | Prompts ≤200K tokens | $3.00 per 1M input tokens; $15.00 per 1M output tokens |
| | Prompts >200K tokens | $6.00 per 1M input tokens; $22.50 per 1M output tokens |
| | Prompt caching | $3.75 per 1M cache creation (5m) input tokens; $6.00 per 1M cache creation (1h) input tokens; $0.30 per 1M cache read input tokens |
| Claude Sonnet 4 | Prompts ≤200K tokens | $3.00 per 1M input tokens; $15.00 per 1M output tokens |
| | Prompts >200K tokens | $6.00 per 1M input tokens; $22.50 per 1M output tokens |
| | Prompt caching | $3.75 per 1M cache creation (5m) input tokens; $6.00 per 1M cache creation (1h) input tokens; $0.30 per 1M cache read input tokens |
| Claude Haiku 4.5 | Input/output tokens | $1.00 per 1M input tokens; $5.00 per 1M output tokens |
| | Prompt caching | $1.25 per 1M cache creation (5m) input tokens; $2.00 per 1M cache creation (1h) input tokens; $1.00 per 1M cache read input tokens |
| Claude Opus 4.7 | Input/output tokens | $5.00 per 1M input tokens; $25.00 per 1M output tokens |
| | Prompt caching | $6.25 per 1M cache creation (5m) input tokens; $10.00 per 1M cache creation (1h) input tokens; $0.50 per 1M cache read input tokens |
| Claude Opus 4.6 | Prompts ≤200K tokens | $5.00 per 1M input tokens; $25.00 per 1M output tokens |
| | Prompts >200K tokens | $10.00 per 1M input tokens; $37.50 per 1M output tokens |
| | Prompt caching | $6.25 per 1M cache creation (5m) input tokens; $10.00 per 1M cache creation (1h) input tokens; $0.50 per 1M cache read input tokens |
| Claude Opus 4.5 | Input/output tokens | $5.00 per 1M input tokens; $25.00 per 1M output tokens |
| | Prompt caching | $6.25 per 1M cache creation (5m) input tokens; $10.00 per 1M cache creation (1h) input tokens; $0.50 per 1M cache read input tokens |
| Claude Opus 4.1 | Input/output tokens | $15.00 per 1M input tokens; $75.00 per 1M output tokens |
| | Prompt caching | $18.75 per 1M cache creation (5m) input tokens; $30.00 per 1M cache creation (1h) input tokens; $1.50 per 1M cache read input tokens |
| Claude Opus 4 | Input/output tokens | $15.00 per 1M input tokens; $75.00 per 1M output tokens |
| | Prompt caching | $18.75 per 1M cache creation (5m) input tokens; $30.00 per 1M cache creation (1h) input tokens; $1.50 per 1M cache read input tokens |
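The Claude Sonnet models price input above the 200K-token threshold at a higher tier. A sketch of how that tiering applies, assuming (as in Anthropic's published long-context pricing) that a prompt exceeding 200K tokens is billed entirely at the higher tier:

```python
# Sketch: tiered input pricing for Claude Sonnet 4.6.
# Assumption: a prompt over 200K tokens is billed entirely at the
# >200K rate, matching Anthropic's published long-context pricing.

def sonnet_input_cost(prompt_tokens: int) -> float:
    """Input cost in USD for a single Claude Sonnet 4.6 prompt."""
    rate = 3.00 if prompt_tokens <= 200_000 else 6.00  # USD per 1M input tokens
    return (prompt_tokens / 1_000_000) * rate

print(f"${sonnet_input_cost(150_000):.2f}")  # $0.45  (≤200K tier at $3.00/1M)
print(f"${sonnet_input_cost(300_000):.2f}")  # $1.80  (>200K tier at $6.00/1M)
```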
Arcee Models
| Model | Usage | Serverless Inference |
|-------|-------|----------------------|
| Trinity Large | Input/output tokens | $0.25 per 1M input tokens; $0.90 per 1M output tokens |
| | Prompt caching | $0.06 per 1M cache read input tokens |
fal Models
| Model | Serverless Inference |
|-------|----------------------|
| Fast SDXL | $0.0011 per compute second |
| Flux Schnell | $0.0030 per megapixel |
| Stable Audio 2.5 (Text-to-Audio) | $0.00058 per compute second |
| Multilingual TTS v2 | $0.10 per 1,000 characters |
OpenAI Models
Note: When using OpenAI commercial models with your own model API keys, billing is handled directly by OpenAI at the provider's rates.
| Model | Usage | Serverless Inference |
|-------|-------|----------------------|
| gpt-oss-120b | Input/output tokens | $0.10 per 1M input tokens; $0.70 per 1M output tokens |
| gpt-oss-20b | Input/output tokens | $0.05 per 1M input tokens; $0.45 per 1M output tokens |
| GPT-5.4 | Input/output tokens | $2.50 per 1M input tokens; $15.00 per 1M output tokens |
| | Prompt caching | $0.25 per 1M cache read input tokens |
| GPT-5.4 mini | Input/output tokens | $0.75 per 1M input tokens; $4.50 per 1M output tokens |
| | Prompt caching | $0.075 per 1M cache read input tokens |
| GPT-5.4 nano | Input/output tokens | $0.20 per 1M input tokens; $1.25 per 1M output tokens |
| | Prompt caching | $0.02 per 1M cache read input tokens |
| GPT-5.4 pro | Input/output tokens | $30.00 per 1M input tokens; $180.00 per 1M output tokens |
| GPT-5.3-Codex | Input/output tokens | $1.75 per 1M input tokens; $14.00 per 1M output tokens |
| | Prompt caching | $0.175 per 1M cache read input tokens |
| GPT-5.2 | Input/output tokens | $1.75 per 1M input tokens; $14.00 per 1M output tokens |
| | Prompt caching | $0.175 per 1M cache read input tokens |
| GPT-5.2 pro | Input/output tokens | $21.00 per 1M input tokens; $168.00 per 1M output tokens |
| GPT-5.1-Codex-Max | Input/output tokens | $1.25 per 1M input tokens; $10.00 per 1M output tokens |
| | Prompt caching | $0.125 per 1M cache read input tokens |
| GPT-5 | Input/output tokens | $1.25 per 1M input tokens; $10.00 per 1M output tokens |
| | Prompt caching | $0.125 per 1M cache read input tokens |
| GPT-5 mini | Input/output tokens | $0.25 per 1M input tokens; $2.00 per 1M output tokens |
| | Prompt caching | $0.025 per 1M cache read input tokens |
| GPT-5 nano | Input/output tokens | $0.05 per 1M input tokens; $0.40 per 1M output tokens |
| | Prompt caching | $0.005 per 1M cache read input tokens |
| GPT-4.1 | Input/output tokens | $2.00 per 1M input tokens; $8.00 per 1M output tokens |
| | Prompt caching | $0.50 per 1M cache read input tokens |
| GPT-4o | Input/output tokens | $2.50 per 1M input tokens; $10.00 per 1M output tokens |
| | Prompt caching | $1.25 per 1M cache read input tokens |
| GPT-4o mini | Input/output tokens | $0.15 per 1M input tokens; $0.60 per 1M output tokens |
| | Prompt caching | $0.075 per 1M cache read input tokens |
| o1 | Input/output tokens | $15.00 per 1M input tokens; $60.00 per 1M output tokens |
| | Prompt caching | $7.50 per 1M cache read input tokens |
| o3 | Input/output tokens | $2.00 per 1M input tokens; $8.00 per 1M output tokens |
| | Prompt caching | $0.50 per 1M cache read input tokens |
| o3-mini | Input/output tokens | $1.10 per 1M input tokens; $4.40 per 1M output tokens |
| | Prompt caching | $0.55 per 1M cache read input tokens |
| GPT-image-1 | Input/output tokens | $5.00 per 1M input tokens; $40.00 per 1M output tokens |
| | Prompt caching | $1.25 per 1M cache read input tokens |
| GPT Image 1.5 | Input/output tokens | $5.00 per 1M input tokens; $10.00 per 1M output tokens |
| | Prompt caching | $1.00 per 1M cache read input tokens |
DigitalOcean-Hosted Models
| Provider | Model | Serverless Inference |
|----------|-------|----------------------|
| Alibaba | Qwen3-32B | $0.25 per 1M input tokens; $0.55 per 1M output tokens |
| DeepSeek | DeepSeek R1 Distill Llama 70B | $0.99 per 1M input tokens; $0.99 per 1M output tokens |
| MiniMax | MiniMax M2.5 (Public Preview) | $0.30 per 1M input tokens; $1.20 per 1M output tokens |
| Moonshot AI | Kimi K2.5 | $0.50 per 1M input tokens; $2.70 per 1M output tokens |
| Meta | Llama 3.3 Instruct-70B | $0.65 per 1M input tokens; $0.65 per 1M output tokens |
| NVIDIA | Nemotron-3-Super-120B (Public Preview) | $0.30 per 1M input tokens; $0.65 per 1M output tokens |
| Z.ai | GLM 5 | $1.00 per 1M input tokens; $3.20 per 1M output tokens |

Dedicated Inference

Dedicated Inference is available in public preview and enabled for all users. You can contact support for questions or assistance.

Dedicated Inference is billed per GPU-hour of uptime for the GPU your models run on.
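Because billing is per GPU-hour of uptime rather than per token, dedicated cost depends only on how long the deployment runs. A minimal sketch; the $2.50 per GPU-hour rate below is hypothetical (actual rates depend on the GPU you select):

```python
# Sketch: dedicated inference cost = GPUs x uptime hours x per-GPU-hour rate.
# The $2.50/GPU-hour rate is hypothetical, for illustration only;
# real rates depend on the GPU the deployment runs on.

def dedicated_cost(gpu_count: int, uptime_hours: float,
                   rate_per_gpu_hour: float) -> float:
    """Return cost in USD for a dedicated deployment's uptime."""
    return gpu_count * uptime_hours * rate_per_gpu_hour

# e.g. 2 GPUs running 24/7 for a 30-day month at a hypothetical $2.50/GPU-hour:
monthly = dedicated_cost(gpu_count=2, uptime_hours=24 * 30, rate_per_gpu_hour=2.50)
print(f"${monthly:.2f}")  # $3600.00
```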
