DigitalOcean Gradient™ AI Platform Pricing

Validated on 18 Dec 2025 • Last edited on 30 Dec 2025

DigitalOcean Gradient™ AI Platform lets you build fully managed AI agents with knowledge bases for retrieval-augmented generation, multi-agent routing, guardrails, and more, or use serverless inference to make direct requests to popular foundation models.

Gradient AI Platform has a usage-based pricing model, so costs scale with your actual usage. We charge for model usage across serverless inference, the Agent Development Kit (ADK), and agents created using the DigitalOcean Control Panel, CLI, or API, as well as for additional features like knowledge bases, guardrails, and log stream insights. Agent creation itself is free. We display prices per million tokens and bill per thousand tokens for accuracy.
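As a rough sketch of how displayed rates translate into charges, the helper below converts a per-1M-token price into a cost for a given token count, metered in the per-1K units described above. The exact rounding behavior on partial thousands is an assumption for illustration, not a billing guarantee.

```python
# Sketch: converting a displayed per-1M-token price into an estimated charge.
# Assumption: usage is metered in 1,000-token units; how partial thousands
# round is not specified here, so this treats them fractionally.

def billed_cost(tokens: int, price_per_million: float) -> float:
    """Estimate the charge in USD for `tokens` at `price_per_million`."""
    price_per_thousand = price_per_million / 1000  # per-1K billing unit
    units = tokens / 1000                          # number of 1K-token units
    return units * price_per_thousand

# Example: 250,000 input tokens on a $0.65-per-1M-token model
print(round(billed_cost(250_000, 0.65), 4))  # 0.1625
```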

Serverless inference is billed by DigitalOcean for both open-source and commercial models. Prices align with each provider’s published rates for transparency. If your ADK agent deployment uses a DigitalOcean-hosted model through serverless inference, you are charged for usage on those model keys.

Agent usage is billed by DigitalOcean for open-source models. You are charged for all input and output tokens processed by the agent. Token usage depends on factors such as input length, agent instructions, attached knowledge bases, and configuration settings. To optimize usage, test your agents and adjust their parameters.

Usage for commercial models in agents or evaluations with your own provider API keys (for example, OpenAI key or Anthropic key) is billed directly by the provider. DigitalOcean does not charge you for that model usage.

Foundation Model Usage

The following shows pricing for open-source and commercial models for serverless inference, ADK, and agent usage.

| Model | Serverless Inference and ADK | Agent Usage |
|-------|------------------------------|-------------|
| Qwen3-32B | $0.25 per 1M input tokens; $0.55 per 1M output tokens | Not available |

Claude Sonnet 4.5 and Claude Sonnet 4 support an input context window of up to 1M tokens.

| Model | Serverless Inference and ADK | Agent Usage |
|-------|------------------------------|-------------|
| Claude Sonnet 4.5 | Prompts ≤ 200K tokens: $3.00 per 1M input tokens; $15.00 per 1M output tokens. Prompts > 200K tokens: $6.00 per 1M input tokens; $22.50 per 1M output tokens | Billed directly by Anthropic when using your own API key. |
| Claude Sonnet 4 | Prompts ≤ 200K tokens: $3.00 per 1M input tokens; $15.00 per 1M output tokens. Prompts > 200K tokens: $6.00 per 1M input tokens; $22.50 per 1M output tokens | Billed directly by Anthropic when using your own API key. |
| Claude 3.7 Sonnet | $3.00 per 1M input tokens; $15.00 per 1M output tokens | Billed directly by Anthropic when using your own API key. |
| Claude 3.5 Sonnet | $3.00 per 1M input tokens; $15.00 per 1M output tokens | Billed directly by Anthropic when using your own API key. |
| Claude 3.5 Haiku | $0.80 per 1M input tokens; $4.00 per 1M output tokens | Billed directly by Anthropic when using your own API key. |
| Claude Opus 4.5 | $5.00 per 1M input tokens; $25.00 per 1M output tokens | Billed directly by Anthropic when using your own API key. |
| Claude Opus 4.1 | $15.00 per 1M input tokens; $75.00 per 1M output tokens | Billed directly by Anthropic when using your own API key. |
| Claude Opus 4 | $15.00 per 1M input tokens; $75.00 per 1M output tokens | Billed directly by Anthropic when using your own API key. |
| Claude 3 Opus | $15.00 per 1M input tokens; $75.00 per 1M output tokens | Billed directly by Anthropic when using your own API key. |
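The tiered Claude Sonnet rates can be sketched as a small estimator. Assumptions for illustration: the tier is selected by the prompt (input) size, and the selected tier's rates apply to both the input and output tokens of that request.

```python
# Sketch: estimated cost of one Claude Sonnet 4.5 / Sonnet 4 request under
# the two-tier rates above. Assumption: tier is chosen by prompt size, and
# that tier's rates apply to the whole request.

def sonnet_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD for the Sonnet tiered pricing."""
    if input_tokens <= 200_000:
        in_rate, out_rate = 3.00, 15.00   # USD per 1M tokens
    else:
        in_rate, out_rate = 6.00, 22.50   # long-context tier
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(round(sonnet_cost(150_000, 2_000), 3))  # 0.48
print(round(sonnet_cost(250_000, 2_000), 3))  # 1.545
```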
| Model | Serverless Inference and ADK | Agent Usage |
|-------|------------------------------|-------------|
| DeepSeek R1 Distill Llama 70B | $0.99 per 1M input tokens; $0.99 per 1M output tokens | Same as serverless inference. |
Note
These models are currently in public preview. Serverless usage is billed by DigitalOcean as shown.

| Model | Serverless Inference | Agent Usage |
|-------|----------------------|-------------|
| Fast SDXL | $0.0011 per compute second | Not available |
| Flux Schnell | $0.0030 per megapixel | Not available |
| Stable Audio 2.5 (Text-to-Audio) | $0.00058 per compute second | Not available |
| Multilingual TTS v2 | $0.10 per 1000 characters | Not available |
| Model | Serverless Inference and ADK | Agent Usage |
|-------|------------------------------|-------------|
| Llama 3.3 Instruct-70B | $0.65 per 1M input tokens; $0.65 per 1M output tokens | Same as serverless inference. |
| Llama 3.1 Instruct-8B | $0.198 per 1M input tokens; $0.198 per 1M output tokens | Same as serverless inference. |
| Model | Serverless Inference and ADK | Agent Usage |
|-------|------------------------------|-------------|
| NeMo | $0.30 per 1M input tokens; $0.30 per 1M output tokens | Same as serverless inference. |
| Model | Serverless Inference and ADK | Agent Usage |
|-------|------------------------------|-------------|
| gpt-oss-120b | $0.10 per 1M input tokens; $0.70 per 1M output tokens | Same as serverless inference. |
| gpt-oss-20b | $0.05 per 1M input tokens; $0.45 per 1M output tokens | Same as serverless inference. |
| GPT-5 | $1.25 per 1M input tokens; $10.00 per 1M output tokens | Billed directly by OpenAI when using your own API key. |
| GPT-5 mini | $0.25 per 1M input tokens; $2.00 per 1M output tokens | Billed directly by OpenAI when using your own API key. |
| GPT-5 nano | $0.05 per 1M input tokens; $0.40 per 1M output tokens | Billed directly by OpenAI when using your own API key. |
| GPT-4.1 | $2.00 per 1M input tokens; $8.00 per 1M output tokens | Billed directly by OpenAI when using your own API key. |
| GPT-4o | $2.50 per 1M input tokens; $10.00 per 1M output tokens | Billed directly by OpenAI when using your own API key. |
| GPT-4o mini | $0.15 per 1M input tokens; $0.60 per 1M output tokens | Billed directly by OpenAI when using your own API key. |
| o1 | $15.00 per 1M input tokens; $60.00 per 1M output tokens | Billed directly by OpenAI when using your own API key. |
| o3 | $2.00 per 1M input tokens; $8.00 per 1M output tokens | Billed directly by OpenAI when using your own API key. |
| o3-mini | $1.10 per 1M input tokens; $4.40 per 1M output tokens | Billed directly by OpenAI when using your own API key. |
| GPT-image-1 | $5.00 per 1M input tokens; $40.00 per 1M output tokens | Billed directly by OpenAI when using your own API key. |

Knowledge Bases

Knowledge bases are billed for both indexing and storage:

  • Indexing tokens: We charge for tokens required to generate embeddings. Pricing is the same for manual and auto-indexing. Charges apply only when changes are detected (new, updated, or deleted files/URLs). If auto-indexing is paused or no changes are found, there are no charges.

    For example, a 10 MB dataset is about 3 million tokens, and a 1 GB dataset is about 250 million tokens.

    Actual costs depend on the embedding model:

    | Model | Price |
    |-------|-------|
    | all-mini-lm-l6-v2 | $0.009 per 1M input tokens |
    | multi-qa-mpnet-base-dot-v1 | $0.009 per 1M input tokens |
    | gte-large-en-v1.5 | $0.09 per 1M input tokens |

    One token is roughly four characters (approximately 75 words per 100 tokens). Non-Latin scripts, emojis, or binary data may increase token counts.

  • Storage: Embeddings are stored in OpenSearch. See OpenSearch pricing.

  • Chunking: The chunking method you choose affects indexing cost because each algorithm embeds a different number of tokens. All indexing and re-indexing jobs are billed based on the total tokens embedded.

    • Section-based and fixed-length chunking are the most cost-efficient. They rely on simple splitting and do not perform semantic analysis, resulting in minimal and predictable token usage.
    • Semantic chunking is more expensive because it uses the embedding model twice: once to detect semantic boundaries and once to embed the final chunks. This typically results in 1.5 to 3 times more indexing tokens, more total chunks, and a higher re-indexing cost when settings change.
    • Hierarchical chunking produces both parent and child embeddings, slightly increasing indexing cost. The main cost impact is during retrieval: agents receive both the child chunk and its parent chunk, increasing the number of tokens sent to the model for each lookup.

    Any change to chunking settings requires re-indexing the affected data source, which always consumes additional tokens. Chunking does not incur a separate charge; costs depend on embedding token usage and OpenSearch storage, and vary by embedding model. For detailed behavior and tuning guidance, see the chunking reference page and chunking best practices.
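The indexing guidance above can be turned into a rough estimator. Assumptions for illustration: the token count is derived from the approximate conversion given earlier (10 MB ≈ 3 million tokens), and the chunking multiplier models the 1.5x to 3x overhead of semantic chunking; real token counts vary with content and chunking method.

```python
# Sketch: rough knowledge-base indexing cost from dataset size.
# Assumptions: ~300K tokens per MB (from the 10 MB ≈ 3M tokens rule of
# thumb above); chunking_multiplier approximates semantic-chunking overhead.

TOKENS_PER_MB = 300_000

EMBEDDING_PRICES = {  # USD per 1M input tokens, from the table above
    "all-mini-lm-l6-v2": 0.009,
    "multi-qa-mpnet-base-dot-v1": 0.009,
    "gte-large-en-v1.5": 0.09,
}

def indexing_cost(dataset_mb: float, model: str,
                  chunking_multiplier: float = 1.0) -> float:
    """Estimate indexing cost; use chunking_multiplier ≈ 1.5-3 for semantic chunking."""
    tokens = dataset_mb * TOKENS_PER_MB * chunking_multiplier
    return tokens * EMBEDDING_PRICES[model] / 1_000_000

# 10 MB dataset (~3M tokens) with the cheapest embedding model
print(round(indexing_cost(10, "all-mini-lm-l6-v2"), 3))  # 0.027
# Same dataset with semantic chunking at the high end of the overhead range
print(round(indexing_cost(10, "all-mini-lm-l6-v2", chunking_multiplier=3), 3))  # 0.081
```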

Guardrails

Charges apply for all tokens processed through guardrails:

| Guardrail | Price |
|-----------|-------|
| Content Moderation | $0.20 per 1M tokens |
| Jailbreak Detection | $0.20 per 1M tokens |
| Sensitive Data Detection | $0.34 per 1M tokens |

Creating, editing, or duplicating guardrails incurs no additional cost.
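Because guardrail charges stack on top of model usage, a per-request estimate combines both rates. Assumption for illustration: the guardrail processes the same tokens the model sees, which may differ from actual guardrail token counts.

```python
# Sketch: estimated per-request cost for an agent on an open-source model
# with Content Moderation enabled. Rates are taken from the tables above.
# Assumption: the guardrail processes the same tokens as the model.

MODEL_IN = MODEL_OUT = 0.65   # Llama 3.3 Instruct-70B, USD per 1M tokens
MODERATION = 0.20             # Content Moderation, USD per 1M tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Model cost plus guardrail cost for one request, in USD."""
    model = (input_tokens * MODEL_IN + output_tokens * MODEL_OUT) / 1_000_000
    guard = (input_tokens + output_tokens) * MODERATION / 1_000_000
    return model + guard

# 4,000 input tokens and 1,000 output tokens
print(round(request_cost(4_000, 1_000), 5))  # 0.00425
```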

Functions

If you attach DigitalOcean Functions to your agent, you are billed at functions pricing.

Agent Evaluations

Agent evaluations are charged by token usage at the same rates as model usage.

Log Stream Insights

Log Stream Insights uses a third-party model to analyze agent trace data. You are charged per token:

| Tokens | Price |
|--------|-------|
| Input | $1.10 per 1M tokens |
| Output | $4.40 per 1M tokens |

Agent Development Kit (Public Preview)

You are not charged for using the Agent Development Kit during public preview. However, you are billed for other Gradient AI Platform features you use with your agent deployment:

  • If you are using a DigitalOcean hosted model through serverless inference in your agent deployment, you are charged for those model keys.

  • For agent evaluations, token usage is charged to the agent model keys. For example, if your agent uses a serverless inference endpoint key, any token usage is charged to that key. If the agent uses a third-party model key, or a key to a model not hosted on DigitalOcean, you are charged by the hosting provider.

  • If you enable Log Stream Insights for your agent deployment, you are charged for Log Stream Insights tokens when new insights are generated.

Note
At General Availability, agent deployment hosting will be charged, measured in GiB-seconds. We will also charge for judge input and output tokens, which are the tokens used to judge the agent's inputs and outputs against the test case's chosen metrics. These costs are waived during public preview.
