Inference Limits
Validated on 27 Apr 2026 • Last edited on 27 Apr 2026
Inference provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view the available foundation models, both DigitalOcean-hosted and third-party commercial, and compare their capabilities and pricing. You can also use routing to match inference requests to the best-fit model and run inference using serverless or dedicated deployments.
Model Catalog Limits
- Model Catalog data currently cannot be retrieved through the DigitalOcean API.
- The MCP server endpoint uses the standard API rate limits. If you need higher limits for production workloads, contact support.
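Since MCP server requests count against the standard API rate limits, client code should expect occasional HTTP 429 responses. The following is a minimal retry sketch, assuming a generic JSON-over-HTTPS endpoint; the URL, headers, and function name are illustrative placeholders, not part of a documented DigitalOcean API.

```python
import time

import requests

def post_with_backoff(url: str, headers: dict, payload: dict, max_retries: int = 5) -> dict:
    """POST a JSON payload, backing off exponentially on HTTP 429 (rate limited)."""
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload, timeout=30)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Honor the Retry-After header when present; otherwise back off exponentially.
        retry_after = resp.headers.get("Retry-After")
        time.sleep(float(retry_after) if retry_after else delay)
        delay *= 2
    raise RuntimeError(f"Still rate limited after {max_retries} attempts")
```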
Model Playground Limits
- Only images are supported for file uploads.
Serverless Inference Limits
- Serverless inference supports the two to three most recent stable versions of each model to ensure consistent performance and reliable maintenance. For the list of supported models and versions, see the available model offerings.
- Serverless inference model endpoints support OpenAI-compatible request formats but may not be compatible with all OpenAI tools and plugins. For an example request, see the sketch after this list.
- Serverless inference provides access to commercial models, but not all model-specific features are supported. For example, Anthropic's extended thinking is not available.
- OpenAI models accessed through serverless inference do not support zero data retention. If your use case requires strict data privacy or compliance, consider using a different model or contact support for guidance.
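Because serverless endpoints accept OpenAI-compatible requests, you can point the official OpenAI Python SDK at them by overriding the base URL. This is a minimal sketch; the base URL and model slug are placeholders we assume for illustration, so substitute the endpoint and model identifier shown in your control panel and the model catalog.

```python
from openai import OpenAI

# Placeholder base URL and model slug; use the values from your control panel
# and the model catalog for your account.
client = OpenAI(
    base_url="https://inference.example.com/v1",
    api_key="YOUR_MODEL_ACCESS_KEY",
)

response = client.chat.completions.create(
    model="example-model-slug",
    messages=[{"role": "user", "content": "In one sentence, what is serverless inference?"}],
)
print(response.choices[0].message.content)
```

Tools that depend on OpenAI-specific features beyond the standard chat completions format may still fail, per the compatibility limit above.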
Dedicated Inference Limits
- The number of endpoints you can create when using dedicated inference depends on the limits set for your account. We use dynamic resource limits to protect our platform against bad actors. To request a limit increase, contact support. If you are a team owner or resource modifier, you can check your resource limits and request an increase on the Resource Limits page in the DigitalOcean Control Panel.
- Re-ranking, embedding, and audio/TTS models are not currently supported for deployment on a dedicated inference endpoint.
Batch Inference Limits
- Open-source and DigitalOcean-hosted models are not supported for batch inference.
- Only text prompts for OpenAI and Anthropic commercial models are supported. Multimodal requests and image generation batch jobs are not supported.
- Each batch job uses a single model. Multi-model batch jobs are not supported.
- Batch inference uses separate rate limits from serverless and dedicated inference (for a pre-submission check against the per-file limits, see the validation sketch after this list):

  | Limit | Default |
  | --- | --- |
  | Enqueue token limit | 10 billion tokens per model per account |
  | Requests per file | 50,000 |
  | Maximum file size | 200 MB |
  | Completion window | 24 hours |
  | Concurrent batch jobs | No hard limit (token-based quota applies) |

  To request a limit increase, contact support. If you are a team owner or resource modifier, you can check your resource limits and request an increase on the Resource Limits page in the DigitalOcean Control Panel. A running batch job does not consume your real-time API quota or degrade latency for your production applications.
- Batch traffic is isolated from real-time traffic. Batch jobs run at lower scheduling priority and share off-peak GPU capacity. Batch scheduling does not degrade real-time inference p99 latency by more than 5%.
- Additional limits apply based on your security tier.
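Before enqueueing a batch job, you can check an input file against the per-file limits in the table above. This sketch assumes a JSONL input file with one request per line; the file format and helper name are our assumptions, not part of the documented batch API.

```python
import os

MAX_REQUESTS_PER_FILE = 50_000            # requests-per-file limit
MAX_FILE_SIZE_BYTES = 200 * 1024 * 1024   # 200 MB maximum file size

def validate_batch_file(path: str) -> None:
    """Raise ValueError if a JSONL batch input file exceeds the per-file limits."""
    size = os.path.getsize(path)
    if size > MAX_FILE_SIZE_BYTES:
        raise ValueError(f"{path} is {size} bytes; the maximum file size is 200 MB")
    with open(path, "r", encoding="utf-8") as f:
        request_count = sum(1 for line in f if line.strip())
    if request_count > MAX_REQUESTS_PER_FILE:
        raise ValueError(f"{path} has {request_count} requests; the limit is 50,000 per file")
```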
Model Evaluation Limits
- Model evaluation datasets have the following limits (see the splitting sketch after this list):
  - Each dataset must have fewer than 1,000 rows.
  - Each dataset must be less than 1 GB in size, regardless of customer tier.
- Lower-tier customers do not have access to commercial models from Anthropic or OpenAI for evaluation or judging.
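An oversized evaluation dataset can be split into multiple files that each stay under the row limit. This is a minimal sketch assuming a CSV dataset with a header row; the accepted formats may differ, so treat the format and function names as assumptions.

```python
import csv

MAX_ROWS = 999  # each dataset must have fewer than 1,000 rows

def split_dataset(path: str, prefix: str) -> list[str]:
    """Split a CSV dataset into chunk files that each fit under the row limit."""
    out_paths: list[str] = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        chunk: list[list[str]] = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == MAX_ROWS:
                out_paths.append(write_chunk(prefix, len(out_paths), header, chunk))
                chunk = []
        if chunk:
            out_paths.append(write_chunk(prefix, len(out_paths), header, chunk))
    return out_paths

def write_chunk(prefix: str, index: int, header: list[str], rows: list[list[str]]) -> str:
    """Write one chunk as its own CSV file and return its path."""
    out_path = f"{prefix}_{index}.csv"
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return out_path
```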