DigitalOcean Gradient™ AI Inference Hub Pricing

Validated on 26 Jun 2018 • Last edited on 16 Mar 2026

DigitalOcean Gradient™ AI Inference Hub provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare capabilities and pricing, and run inference using serverless or dedicated deployments. DigitalOcean Gradient AI Inference Hub is in public preview and enabled for all users. You can contact support for questions or assistance.

Inference Hub itself has no cost. The Model Catalog API is free to use for browsing supported models and reviewing pricing and capabilities. Costs are incurred only when you run inference or deploy models.

Serverless Inference

Serverless inference is billed by DigitalOcean for both open-source and commercial models. Prices align with each provider’s published rates for transparency.

The following tables show pricing for the foundation models available through serverless inference in Inference Hub.

| Model | Serverless Inference |
|---|---|
| Qwen3-32B | $0.25 per 1M input tokens / $0.55 per 1M output tokens |
Note: When using Anthropic commercial models with your own model API keys, billing is handled directly by Anthropic at the provider's rates.

Claude Sonnet 4.6 (in Beta), Sonnet 4.5, and Sonnet 4 support an input context window of up to 1M tokens.

| Model | Serverless Inference |
|---|---|
| Claude 3.5 Sonnet (deprecated) | $3.00 per 1M input tokens / $15.00 per 1M output tokens |
| Claude Haiku 4.5 | $1.00 per 1M input tokens / $5.00 per 1M output tokens |
| Claude Opus 4.6 | Prompts ≤ 200K tokens: $5.00 per 1M input tokens / $25.00 per 1M output tokens. Prompts > 200K tokens: $10.00 per 1M input tokens / $37.50 per 1M output tokens |
| Claude Opus 4.5 | $5.00 per 1M input tokens / $25.00 per 1M output tokens |
| Claude Opus 4.1 | $15.00 per 1M input tokens / $75.00 per 1M output tokens |
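Claude Opus 4.6's tiered rates mean the effective price of a request depends on its prompt length. A minimal sketch of that tier logic in Python, using the rates and 200K threshold from the table above (the function name is illustrative, and it assumes the entire request is billed at the tier its prompt length selects):

```python
def opus_4_6_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate Claude Opus 4.6 serverless cost in USD for one request.

    Prompts of 200K tokens or fewer use the lower tier; longer prompts
    are assumed to bill entirely at the higher tier.
    """
    if input_tokens <= 200_000:
        in_rate, out_rate = 5.00, 25.00    # $ per 1M tokens
    else:
        in_rate, out_rate = 10.00, 37.50   # $ per 1M tokens
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 150K-token prompt with a 2K-token reply:
# 150_000 * 5.00/1M + 2_000 * 25.00/1M = $0.75 + $0.05 = $0.80
```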
| Model | Serverless Inference |
|---|---|
| DeepSeek R1 Distill Llama 70B | $0.99 per 1M input tokens / $0.99 per 1M output tokens |
| Model | Serverless Inference |
|---|---|
| Fast SDXL | $0.0011 per compute second |
| Flux Schnell | $0.0030 per megapixel |
| Stable Audio 2.5 (Text-to-Audio) | $0.00058 per compute second |
| Multilingual TTS v2 | $0.10 per 1,000 characters |
| Model | Serverless Inference |
|---|---|
| Llama 3.3 Instruct-70B | $0.65 per 1M input tokens / $0.65 per 1M output tokens |
| Llama 3.1 Instruct-8B | $0.198 per 1M input tokens / $0.198 per 1M output tokens |
| Model | Serverless Inference |
|---|---|
| NeMo | $0.30 per 1M input tokens / $0.30 per 1M output tokens |
Note: When using OpenAI commercial models with your own model API keys, billing is handled directly by OpenAI at the provider's rates.
| Model | Serverless Inference |
|---|---|
| gpt-oss-120b | $0.10 per 1M input tokens / $0.70 per 1M output tokens |
| gpt-oss-20b | $0.05 per 1M input tokens / $0.45 per 1M output tokens |
| GPT-5.2 | $1.75 per 1M input tokens / $14.00 per 1M output tokens |
| GPT-5.2 pro | $21.00 per 1M input tokens / $168.00 per 1M output tokens |
| GPT-5.1-Codex-Max | $1.25 per 1M input tokens / $10.00 per 1M output tokens |
| GPT-5 | $1.25 per 1M input tokens / $10.00 per 1M output tokens |
| GPT-5 mini | $0.25 per 1M input tokens / $2.00 per 1M output tokens |
| GPT-5 nano | $0.05 per 1M input tokens / $0.40 per 1M output tokens |
| GPT-4.1 | $2.00 per 1M input tokens / $8.00 per 1M output tokens |
| GPT-4o | $2.50 per 1M input tokens / $10.00 per 1M output tokens |
| GPT-4o mini | $0.15 per 1M input tokens / $0.60 per 1M output tokens |
| o1 | $15.00 per 1M input tokens / $60.00 per 1M output tokens |
| o3 | $2.00 per 1M input tokens / $8.00 per 1M output tokens |
| o3-mini | $1.10 per 1M input tokens / $4.40 per 1M output tokens |
| GPT-image-1 | $5.00 per 1M input tokens / $40.00 per 1M output tokens |
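All of the token-priced models above follow the same formula: cost = input tokens × input rate / 1M + output tokens × output rate / 1M. A small estimator sketch (the rate dictionary is a hand-copied subset of the tables above, not a live API lookup, and the model keys are illustrative):

```python
# $ per 1M tokens (input, output), copied from the pricing tables above
RATES = {
    "gpt-5": (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
    "llama-3.3-instruct-70b": (0.65, 0.65),
    "qwen3-32b": (0.25, 0.55),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated serverless cost in USD for a single request."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# 10K prompt tokens plus a 1K-token completion on GPT-5:
# 10_000 * 1.25/1M + 1_000 * 10.00/1M = $0.0125 + $0.01 = $0.0225
```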

Dedicated Inference

Dedicated Inference is available in public preview and enabled for all users. You can contact support for questions or assistance.

Dedicated Inference is billed per GPU-hour of uptime for the GPU you run your model(s) on.
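Unlike serverless inference, GPU-hour billing accrues for the entire time the deployment is up, regardless of how many requests it serves. A rough sketch of that calculation (the $2.50/GPU-hour rate below is a hypothetical placeholder, not a published price; see the GPU pricing for your chosen hardware for actual rates):

```python
def dedicated_cost(gpu_hourly_rate: float, gpu_count: int,
                   uptime_hours: float) -> float:
    """Estimated Dedicated Inference cost: rate x GPUs x hours of uptime."""
    return gpu_hourly_rate * gpu_count * uptime_hours

# One GPU at a hypothetical $2.50/GPU-hour, up for a 730-hour month:
# 2.50 * 1 * 730 = $1,825.00
```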
