# DigitalOcean Gradient™ AI Inference Hub Pricing

DigitalOcean Gradient™ AI Inference Hub provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare capabilities and pricing, and run inference using serverless or dedicated deployments.

DigitalOcean Gradient AI Inference Hub is in [public preview](https://docs.digitalocean.com/platform/product-lifecycle/index.html.md#public-preview) and enabled for all users. You can [contact support](https://cloudsupport.digitalocean.com) for questions or assistance.

Inference Hub itself has no cost. The Model Catalog API is free to use for browsing supported models and reviewing pricing and capabilities. Costs are incurred only when you run inference or deploy models.

## Serverless Inference

Serverless inference is billed by DigitalOcean for both open-source and commercial models. Prices align with each provider’s published rates for transparency.

The following tables show pricing for foundation models available through serverless inference in Inference Hub.

## Alibaba

| Model | Serverless Inference |
|---|---|
| [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) | $0.25 per 1M input tokens, $0.55 per 1M output tokens |

## Anthropic

**Note**: When using Anthropic commercial models with your own model API keys, billing is handled directly by Anthropic at the provider’s rates.

Claude Sonnet 4.6 (in Beta), Sonnet 4.5, and Sonnet 4 support an input context window of up to 1M tokens.
| Model | Serverless Inference |
|---|---|
| [Claude 3.5 Sonnet](https://www.anthropic.com/news/claude-3-5-sonnet) (**DEPRECATED**) | $3.00 per 1M input tokens, $15.00 per 1M output tokens |
| [Claude Haiku 4.5](https://www.anthropic.com/claude/haiku) | $1.00 per 1M input tokens, $5.00 per 1M output tokens |
| [Claude Opus 4.6](https://www.anthropic.com/claude/opus) | For prompts less than or equal to 200K tokens: $5.00 per 1M input tokens, $25.00 per 1M output tokens.<br>For prompts greater than 200K tokens: $10.00 per 1M input tokens, $37.50 per 1M output tokens. |
| [Claude Opus 4.5](https://www.anthropic.com/claude/opus) | $5.00 per 1M input tokens, $25.00 per 1M output tokens |
| [Claude Opus 4.1](https://www.anthropic.com/claude/opus) | $15.00 per 1M input tokens, $75.00 per 1M output tokens |

## DeepSeek

| Model | Serverless Inference |
|---|---|
| [DeepSeek R1 Distill Llama 70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | $0.99 per 1M input tokens, $0.99 per 1M output tokens |

## fal

| Model | Serverless Inference |
|---|---|
| Fast SDXL | $0.0011 per compute second |
| Flux Schnell | $0.0030 per megapixel |
| Stable Audio 2.5 (Text-to-Audio) | $0.00058 per compute second |
| Multilingual TTS v2 | $0.10 per 1000 characters |

## Meta

| Model | Serverless Inference |
|---|---|
| [Llama 3.3 Instruct-70B](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | $0.65 per 1M input tokens, $0.65 per 1M output tokens |
| [Llama 3.1 Instruct-8B](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | $0.198 per 1M input tokens, $0.198 per 1M output tokens |

## Mistral

| Model | Serverless Inference |
|---|---|
| [NeMo](https://mistral.ai/news/mistral-nemo/) | $0.30 per 1M input tokens, $0.30 per 1M output tokens |

## OpenAI

**Note**: When using OpenAI commercial models with your own model API keys, billing is handled directly by OpenAI at the provider’s rates.
| Model | Serverless Inference |
|---|---|
| [gpt-oss-120b](https://platform.openai.com/docs/models/gpt-oss-120b) | $0.10 per 1M input tokens, $0.70 per 1M output tokens |
| [gpt-oss-20b](https://platform.openai.com/docs/models/gpt-oss-20b) | $0.05 per 1M input tokens, $0.45 per 1M output tokens |
| [GPT-5.2](https://platform.openai.com/docs/models/gpt-5.2) | $1.75 per 1M input tokens, $14.00 per 1M output tokens |
| [GPT-5.2 pro](https://platform.openai.com/docs/models/gpt-5.2-pro) | $21.00 per 1M input tokens, $168.00 per 1M output tokens |
| [GPT-5.1-Codex-Max](https://platform.openai.com/docs/models/gpt-5.1-codex-max) | $1.25 per 1M input tokens, $10.00 per 1M output tokens |
| [GPT-5](https://platform.openai.com/docs/models/gpt-5) | $1.25 per 1M input tokens, $10.00 per 1M output tokens |
| [GPT-5 mini](https://platform.openai.com/docs/models/gpt-5-mini) | $0.25 per 1M input tokens, $2.00 per 1M output tokens |
| [GPT-5 nano](https://platform.openai.com/docs/models/gpt-5-nano) | $0.05 per 1M input tokens, $0.40 per 1M output tokens |
| [GPT-4.1](https://platform.openai.com/docs/models/gpt-4.1) | $2.00 per 1M input tokens, $8.00 per 1M output tokens |
| [GPT-4o](https://platform.openai.com/docs/models/gpt-4o) | $2.50 per 1M input tokens, $10.00 per 1M output tokens |
| [GPT-4o mini](https://platform.openai.com/docs/models/gpt-4o-mini) | $0.15 per 1M input tokens, $0.60 per 1M output tokens |
| [o1](https://platform.openai.com/docs/models/o1) | $15.00 per 1M input tokens, $60.00 per 1M output tokens |
| [o3](https://platform.openai.com/docs/models/o3) | $2.00 per 1M input tokens, $8.00 per 1M output tokens |
| [o3-mini](https://platform.openai.com/docs/models/o3-mini) | $1.10 per 1M input tokens, $4.40 per 1M output tokens |
| [GPT-image-1](https://platform.openai.com/docs/models/gpt-image-1) | $5.00 per 1M input tokens, $40.00 per 1M output tokens |

## Dedicated Inference

Dedicated Inference is available in [public preview](https://docs.digitalocean.com/platform/product-lifecycle/index.html.md#public-preview) and enabled for all users. You can [contact support](https://cloudsupport.digitalocean.com) for questions or assistance.

Dedicated Inference is [billed per GPU-hour of uptime](https://docs.digitalocean.com/products/droplets/details/pricing/index.html.md#gpu-droplet-pricing) for the GPU you run your model(s) on.
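The per-token rates above translate into a simple cost formula: cost = (input tokens × input rate + output tokens × output rate) ÷ 1,000,000. The sketch below shows that arithmetic, including the tiered Claude Opus 4.6 rates keyed on prompt size. It is an illustrative estimate only, with rates hard-coded from the tables above; actual charges are determined by DigitalOcean's billing.

```python
def serverless_cost(input_tokens: int, output_tokens: int,
                    input_rate: float, output_rate: float) -> float:
    """Estimate serverless inference cost in USD.

    Rates are USD per 1M tokens, taken from the pricing tables above.
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

def opus_4_6_cost(input_tokens: int, output_tokens: int) -> float:
    """Claude Opus 4.6 is tiered on prompt size: prompts up to and
    including 200K tokens bill at $5.00/$25.00 per 1M input/output
    tokens; larger prompts bill at $10.00/$37.50 per 1M."""
    if input_tokens <= 200_000:
        return serverless_cost(input_tokens, output_tokens, 5.00, 25.00)
    return serverless_cost(input_tokens, output_tokens, 10.00, 37.50)

# 2M input + 1M output tokens on Llama 3.3 Instruct-70B
# ($0.65 per 1M tokens for both input and output):
print(f"${serverless_cost(2_000_000, 1_000_000, 0.65, 0.65):.2f}")  # → $1.95

# A 300K-token prompt on Claude Opus 4.6 falls in the >200K tier:
print(f"${opus_4_6_cost(300_000, 20_000):.2f}")  # → $3.75
```

Dedicated Inference billing is simpler: GPU-hour rate for your chosen GPU Droplet multiplied by hours of uptime, independent of token volume.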