# DigitalOcean Gradient™ AI Platform Pricing

DigitalOcean Gradient™ AI Platform lets you build fully managed AI agents with knowledge bases for retrieval-augmented generation, multi-agent routing, guardrails, and more, or use serverless inference to make direct requests to popular foundation models.

Gradient AI Platform has a usage-based pricing model, so costs scale with your actual usage. We charge for model usage for [serverless inference](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/index.html.md), the [Agent Development Kit (ADK), in public preview](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/build-agents-using-adk/index.html.md), and [agents created using the DigitalOcean Control Panel, CLI, or API](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-agents/index.html.md), and for additional features like [knowledge bases](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/create-manage-agent-knowledge-bases/index.html.md), [guardrails](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/manage-agent-guardrails/index.html.md), and [log stream insights](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/view-agent-observability/index.html.md). We display prices per million tokens and bill per thousand tokens for accuracy.

Serverless inference is billed by DigitalOcean for both open-source and commercial models. Prices align with each provider’s published rates for transparency. If you use a DigitalOcean-hosted model through serverless inference in an agent deployment built with the ADK, you are [charged for those model keys](#foundation-model-usage).

[Agent creation](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/create-agents/index.html.md) is free. Agent usage is billed by DigitalOcean for open-source models. You are charged for all input and output tokens processed by the agent.
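As a rough sketch of that arithmetic: prices are displayed per 1M tokens but metered per 1,000 tokens. In the Python sketch below, the round-up-to-the-nearest-1K metering behavior is an assumption (not documented billing behavior), and the $0.65 rate is simply the Llama 3.3 Instruct-70B example rate from the tables below:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Estimate model usage cost in USD. Rates are per 1M tokens;
    billing granularity is per 1,000 tokens (rounding up to the next
    thousand is an assumption, not documented metering behavior)."""
    def billable(tokens: int) -> int:
        return -(-tokens // 1000) * 1000  # round up to the nearest 1,000

    return (billable(input_tokens) / 1_000_000 * input_rate_per_m
            + billable(output_tokens) / 1_000_000 * output_rate_per_m)

# 45,300 input and 2,100 output tokens at $0.65 per 1M tokens each way
# (the Llama 3.3 Instruct-70B rate) comes to roughly $0.032.
cost = estimate_cost(45_300, 2_100, 0.65, 0.65)
```

Actual invoices reflect DigitalOcean's metering, so treat this only as an order-of-magnitude estimate.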
Token usage depends on factors such as input length, agent instructions, attached knowledge bases, and configuration settings. To optimize usage, [test your agents](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/test-agents/index.html.md) and adjust their parameters.

Usage for commercial models in agents or evaluations with your own provider API keys (for example, an OpenAI or Anthropic key) is billed directly by the provider. DigitalOcean does not charge you for that model usage.

## Foundation Model Usage

The following tables show pricing for open-source and commercial models for serverless inference, ADK, and agent usage.

## Anthropic Models

**Note**: When using Anthropic commercial models with your own model API keys, billing is handled directly by Anthropic at the provider’s rates.

Claude Sonnet 4.6, Sonnet 4.5, and Sonnet 4 support an input context window of up to 1M tokens.

| Model | Serverless Inference and ADK | Agent Usage |
|---|---|---|
| [Claude Sonnet 4.6](https://www.anthropic.com/claude/sonnet) | Prompt ≤200K tokens:<br>- $3.00 per 1M input tokens<br>- $15.00 per 1M output tokens<br><br>Prompt >200K tokens:<br>- $6.00 per 1M input tokens<br>- $22.50 per 1M output tokens<br><br>Prompt caching:<br>$3.75 per 1M cache creation (5m) input tokens<br>$6.00 per 1M cache creation (1h) input tokens<br>$0.30 per 1M cache read input tokens | Same as serverless inference. |
| [Claude Sonnet 4.5](https://www.anthropic.com/claude/sonnet) | Prompt ≤200K tokens:<br>- $3.00 per 1M input tokens<br>- $15.00 per 1M output tokens<br><br>Prompt >200K tokens:<br>- $6.00 per 1M input tokens<br>- $22.50 per 1M output tokens<br><br>Prompt caching:<br>$3.75 per 1M cache creation (5m) input tokens<br>$6.00 per 1M cache creation (1h) input tokens<br>$0.30 per 1M cache read input tokens | Same as serverless inference. |
| [Claude Sonnet 4](https://www.anthropic.com/claude/sonnet) | Prompt ≤200K tokens:<br>- $3.00 per 1M input tokens<br>- $15.00 per 1M output tokens<br><br>Prompt >200K tokens:<br>- $6.00 per 1M input tokens<br>- $22.50 per 1M output tokens<br><br>Prompt caching:<br>$3.75 per 1M cache creation (5m) input tokens<br>$6.00 per 1M cache creation (1h) input tokens<br>$0.30 per 1M cache read input tokens | Same as serverless inference. |
| [Claude Haiku 4.5](https://www.anthropic.com/claude/haiku) | $1.00 per 1M input tokens<br>$5.00 per 1M output tokens<br><br>Prompt caching:<br>$1.25 per 1M cache creation (5m) input tokens<br>$2.00 per 1M cache creation (1h) input tokens<br>$1.00 per 1M cache read input tokens | Same as serverless inference. |
| [Claude Opus 4.6](https://www.anthropic.com/claude/opus) | Prompt ≤200K tokens:<br>- $5.00 per 1M input tokens<br>- $25.00 per 1M output tokens<br><br>Prompt >200K tokens:<br>- $10.00 per 1M input tokens<br>- $37.50 per 1M output tokens<br><br>Prompt caching:<br>$6.25 per 1M cache creation (5m) input tokens<br>$10.00 per 1M cache creation (1h) input tokens<br>$0.50 per 1M cache read input tokens | Same as serverless inference. |
| [Claude Opus 4.5](https://www.anthropic.com/claude/opus) | $5.00 per 1M input tokens<br>$25.00 per 1M output tokens<br><br>Prompt caching:<br>$6.25 per 1M cache creation (5m) input tokens<br>$10.00 per 1M cache creation (1h) input tokens<br>$0.50 per 1M cache read input tokens | Same as serverless inference. |
| [Claude Opus 4.1](https://www.anthropic.com/claude/opus) | $15.00 per 1M input tokens<br>$75.00 per 1M output tokens<br><br>Prompt caching:<br>$18.75 per 1M cache creation (5m) input tokens<br>$30.00 per 1M cache creation (1h) input tokens<br>$1.50 per 1M cache read input tokens | Same as serverless inference. |
| [Claude Opus 4](https://www.anthropic.com/claude/opus) | $15.00 per 1M input tokens<br>$75.00 per 1M output tokens<br><br>Prompt caching:<br>$18.75 per 1M cache creation (5m) input tokens<br>$30.00 per 1M cache creation (1h) input tokens<br>$1.50 per 1M cache read input tokens | Same as serverless inference. |

## fal Models

| Model | Serverless Inference | Agent Usage |
|---|---|---|
| Fast SDXL | $0.0011 per compute second | Not supported |
| Flux Schnell | $0.0030 per megapixel | Not supported |
| Stable Audio 2.5 (Text-to-Audio) | $0.00058 per compute second | Not supported |
| Multilingual TTS v2 | $0.10 per 1,000 characters | Not supported |

## OpenAI Models

**Note**: When using OpenAI commercial models with your own model API keys, billing is handled directly by OpenAI at the provider’s rates.

| Model | Serverless Inference and ADK | Agent Usage |
|---|---|---|
| [gpt-oss-120b](https://platform.openai.com/docs/models/gpt-oss-120b) | $0.10 per 1M input tokens<br>$0.70 per 1M output tokens | Same as serverless inference |
| [gpt-oss-20b](https://platform.openai.com/docs/models/gpt-oss-20b) | $0.05 per 1M input tokens<br>$0.45 per 1M output tokens | Same as serverless inference |
| [GPT-5.4](https://developers.openai.com/api/docs/models/gpt-5.4) | $2.50 per 1M input tokens<br>$15.00 per 1M output tokens<br><br>Prompt caching:<br>$0.25 per 1M cache read input tokens | Same as serverless inference |
| [GPT-5.3-Codex](https://developers.openai.com/api/docs/models/gpt-5.3-codex) | $1.75 per 1M input tokens<br>$14.00 per 1M output tokens<br><br>Prompt caching:<br>$0.175 per 1M cache read input tokens | Not supported |
| [GPT-5.2](https://platform.openai.com/docs/models/gpt-5.2) | $1.75 per 1M input tokens<br>$14.00 per 1M output tokens<br><br>Prompt caching:<br>$0.175 per 1M cache read input tokens | Same as serverless inference |
| [GPT-5.2 pro](https://platform.openai.com/docs/models/gpt-5.2-pro) | $21.00 per 1M input tokens<br>$168.00 per 1M output tokens | Same as serverless inference |
| [GPT-5.1-Codex-Max](https://platform.openai.com/docs/models/gpt-5.1-codex-max) | $1.25 per 1M input tokens<br>$10.00 per 1M output tokens<br><br>Prompt caching:<br>$0.125 per 1M cache read input tokens | Same as serverless inference |
| [GPT-5](https://platform.openai.com/docs/models/gpt-5) | $1.25 per 1M input tokens<br>$10.00 per 1M output tokens<br><br>Prompt caching:<br>$0.125 per 1M cache read input tokens | Same as serverless inference |
| [GPT-5 mini](https://platform.openai.com/docs/models/gpt-5-mini) | $0.25 per 1M input tokens<br>$2.00 per 1M output tokens<br><br>Prompt caching:<br>$0.025 per 1M cache read input tokens | Same as serverless inference |
| [GPT-5 nano](https://platform.openai.com/docs/models/gpt-5-nano) | $0.05 per 1M input tokens<br>$0.40 per 1M output tokens<br><br>Prompt caching:<br>$0.005 per 1M cache read input tokens | Same as serverless inference |
| [GPT-4.1](https://platform.openai.com/docs/models/gpt-4.1) | $2.00 per 1M input tokens<br>$8.00 per 1M output tokens<br><br>Prompt caching:<br>$0.50 per 1M cache read input tokens | Same as serverless inference |
| [GPT-4o](https://platform.openai.com/docs/models/gpt-4o) | $2.50 per 1M input tokens<br>$10.00 per 1M output tokens<br><br>Prompt caching:<br>$1.25 per 1M cache read input tokens | Same as serverless inference |
| [GPT-4o mini](https://platform.openai.com/docs/models/gpt-4o-mini) | $0.15 per 1M input tokens<br>$0.60 per 1M output tokens<br><br>Prompt caching:<br>$0.075 per 1M cache read input tokens | Same as serverless inference |
| [o1](https://platform.openai.com/docs/models/o1) | $15.00 per 1M input tokens<br>$60.00 per 1M output tokens<br><br>Prompt caching:<br>$7.50 per 1M cache read input tokens | Same as serverless inference |
| [o3](https://platform.openai.com/docs/models/o3) | $2.00 per 1M input tokens<br>$8.00 per 1M output tokens<br><br>Prompt caching:<br>$0.50 per 1M cache read input tokens | Same as serverless inference |
| [o3-mini](https://platform.openai.com/docs/models/o3-mini) | $1.10 per 1M input tokens<br>$4.40 per 1M output tokens<br><br>Prompt caching:<br>$0.55 per 1M cache read input tokens | Same as serverless inference |
| [GPT-image-1](https://platform.openai.com/docs/models/gpt-image-1) | $5.00 per 1M input tokens<br>$40.00 per 1M output tokens<br><br>Prompt caching:<br>$1.25 per 1M cache read input tokens | Not supported |

## DigitalOcean-Hosted Models

| Provider | Model | Serverless Inference and ADK | Agent Usage |
|---|---|---|---|
| Alibaba | [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) | $0.25 per 1M input tokens<br>$0.55 per 1M output tokens | Not supported |
| DeepSeek | [DeepSeek R1 Distill Llama 70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | $0.99 per 1M input tokens<br>$0.99 per 1M output tokens | Same as serverless inference |
| MiniMax | [M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) (Public Preview) | $0.30 per 1M input tokens<br>$1.20 per 1M output tokens | Same as serverless inference |
| Moonshot AI | [Kimi K2.5](https://www.kimi.com/ai-models/kimi-k2-5) | $0.50 per 1M input tokens<br>$2.70 per 1M output tokens | Same as serverless inference |
| Meta | [Llama 3.3 Instruct-70B](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | $0.65 per 1M input tokens<br>$0.65 per 1M output tokens | Same as serverless inference |
| Meta | [Llama 3.1 Instruct-8B](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | $0.198 per 1M input tokens<br>$0.198 per 1M output tokens | Same as serverless inference |
| Mistral | [NeMo](https://mistral.ai/news/mistral-nemo/) | $0.30 per 1M input tokens<br>$0.30 per 1M output tokens | Same as serverless inference |
| NVIDIA | [Nemotron-3-Super-120B](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4) (Public Preview) | $0.30 per 1M input tokens<br>$0.65 per 1M output tokens | Same as serverless inference |
| Z.ai | [GLM 5](https://z.ai/blog/glm-5) | $1.00 per 1M input tokens<br>$3.20 per 1M output tokens | Same as serverless inference |

## Knowledge Bases

Knowledge bases are billed for both
indexing and storage:

- **Indexing tokens**: We charge for the tokens required to generate embeddings. Pricing is the same for manual and auto-indexing. Charges apply only when changes are detected (new, updated, or deleted files or URLs). If auto-indexing is paused or no changes are found, there are no charges. For example, a 10 MB dataset is about 3 million tokens, and a 1 GB dataset is about 250 million tokens. Actual costs depend on the embedding model:

  | Model | Price |
  |---|---|
  | `all-mini-lm-l6-v2` | $0.009 per 1M input tokens |
  | `multi-qa-mpnet-base-dot-v1` | $0.009 per 1M input tokens |
  | `gte-large-en-v1.5` | $0.09 per 1M input tokens |
  | `Qwen3 Embedding 0.6B` | $0.04 per 1M input tokens |

  *One token is roughly four characters (approximately 75 words per 100 tokens). Non-Latin scripts, emojis, or binary data may increase token counts.*

- **Storage**: Embeddings are stored in OpenSearch. See [OpenSearch pricing](https://docs.digitalocean.com/products/databases/opensearch/details/pricing/index.html.md).

- **Chunking**: The chunking method you choose affects indexing cost because each algorithm embeds a different number of tokens. All indexing and re-indexing jobs are billed based on the total tokens embedded.

  - Section-based and fixed-length chunking are the most cost-efficient. They rely on simple splitting and do not perform semantic analysis, resulting in minimal and predictable token usage.
  - Semantic chunking is more expensive because it uses the embedding model twice: once to detect semantic boundaries and once to embed the final chunks. This typically results in 1.5 to 3 times more indexing tokens, more total chunks, and a higher re-indexing cost when settings change.
  - Hierarchical chunking produces both parent and child embeddings, slightly increasing indexing cost.
The main cost impact is during retrieval: agents receive both the child and its parent chunk, increasing the number of tokens sent to the model for each lookup.

Any change to chunking settings requires re-indexing the affected data source, which always consumes additional tokens. Chunking does not incur a separate charge; costs depend on embedding token usage and OpenSearch storage, and vary by embedding model. For detailed behavior and tuning guidance, see the [chunking reference](https://docs.digitalocean.com/products/gradient-ai-platform/reference/chunking-strategies/index.html.md) and [chunking best practices](https://docs.digitalocean.com/products/gradient-ai-platform/concepts/chunking-strategies/index.html.md).

## Guardrails

Charges apply for all tokens processed through guardrails:

| Guardrail | Price |
|---|---|
| Content Moderation | $0.20 per 1M tokens |
| Jailbreak Detection | $0.20 per 1M tokens |
| Sensitive Data Detection | $0.34 per 1M tokens |

Costs are based on tokens processed; creating, editing, or duplicating guardrails has no additional cost.

## Functions

If you attach [DigitalOcean Functions](https://docs.digitalocean.com/products/functions/index.html.md) to your agent, you are billed at [Functions pricing](https://docs.digitalocean.com/products/functions/details/pricing/index.html.md).

## Agent Evaluations

Agent evaluations are charged by token usage at the same rates as [model usage](#foundation-model-usage).

## Log Stream Insights

Log Stream Insights uses a third-party model to analyze agent trace data. You are charged per token:

| Tokens | Price |
|---|---|
| Input | $1.10 per 1M tokens |
| Output | $4.40 per 1M tokens |

## Agent Development Kit (Public Preview)

You are not charged for using the Agent Development Kit during [public preview](https://docs.digitalocean.com/platform/product-lifecycle/index.html.md#public-preview).
However, you are billed for other Gradient AI Platform features you use with your agent deployment:

- If you use a DigitalOcean-hosted model through serverless inference in your agent deployment, you are [charged for those model keys](#foundation-model-usage).
- For agent evaluations, token usage is charged to the agent’s model keys. For example, if your agent uses a serverless inference endpoint key, any token usage is charged to that key. If the agent uses a third-party model key, or a key to a model not hosted on DigitalOcean, you are charged by the hosting provider.
- If you enable [Log Stream Insights](#log-stream-insights) for your agent deployment, you are charged for tokens when new insights are generated.

**Note**: At General Availability, we will charge for agent deployment hosting, measured in GiB-sec. We will also charge for judge input and output tokens, which are the tokens used to judge agent inputs and outputs against a test case’s chosen metrics. These costs are waived during public preview.

## Dedicated Inference (Public Preview)

Dedicated Inference is available in [public preview](https://docs.digitalocean.com/platform/product-lifecycle/index.html.md#public-preview) and enabled for all users. You can [contact support](https://cloudsupport.digitalocean.com) for questions or assistance.

Dedicated Inference is [billed per GPU-hour of uptime](https://docs.digitalocean.com/products/droplets/details/pricing/index.html.md#gpu-droplet-pricing) for the GPUs you run your models on.
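As a sketch of that GPU-hour billing model, the cost is simply uptime multiplied by the GPU's hourly rate. The $2.50/hour figure below is a placeholder, not an actual GPU Droplet price; see the linked pricing page for real rates:

```python
def dedicated_inference_cost(uptime_hours: float, gpu_hourly_rate: float,
                             gpu_count: int = 1) -> float:
    """Dedicated Inference is billed per GPU-hour of uptime for the
    GPUs the model runs on. The hourly rate depends on the GPU type;
    the rate used in the example below is hypothetical."""
    return uptime_hours * gpu_hourly_rate * gpu_count

# Hypothetical: two GPUs at a placeholder $2.50/GPU-hour, up for 720 hours.
monthly = dedicated_inference_cost(720, 2.50, gpu_count=2)  # 3600.0
```

Note that billing is tied to uptime, not request volume, so an idle but running deployment still accrues GPU-hours.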