DigitalOcean Gradient™ AI Inference Hub Limits

Validated on 26 Jun 2018 • Last edited on 16 Mar 2026

DigitalOcean Gradient™ AI Inference Hub provides a single control plane for managing inference workflows. It includes a Model Catalog where you can view available foundation models, including both DigitalOcean-hosted and third-party commercial models, compare capabilities and pricing, and run inference using serverless or dedicated deployments. DigitalOcean Gradient AI Inference Hub is in public preview and enabled for all users. You can contact support for questions or assistance.

Model Catalog Limits

  • The Model Catalog API allows up to 5,000 requests per hour from a single client IP address. Short bursts of traffic are permitted, with up to 250 additional requests allowed within a one-minute period.
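One way to stay within both the hourly limit and the one-minute burst allowance on the client side is a token bucket. The numbers below come from the limits above; the class itself is an illustrative sketch, not part of any official DigitalOcean SDK:

```python
import time

class TokenBucket:
    """Client-side pacing for the Model Catalog API limits:
    a sustained 5,000 requests/hour, with bursts of up to 250 requests."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec      # sustained refill rate (tokens/second)
        self.capacity = burst         # maximum burst size
        self.tokens = burst           # start with a full bucket
        self.last = time.monotonic()

    def try_acquire(self, now=None):
        """Return True if a request may be sent now, else False."""
        now = time.monotonic() if now is None else now
        # Refill tokens at the sustained rate, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# 5,000 requests/hour sustained, 250-request burst.
bucket = TokenBucket(rate_per_sec=5000 / 3600, burst=250)
```

Before each catalog request, call `bucket.try_acquire()` and sleep briefly when it returns `False`; this keeps short bursts legal while holding the long-run rate under the hourly cap.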

Model Playground Limits

  • Teams have a limited number of tokens available for each model tested in the Model Playground.

    Model Playground tokens replenish on a rolling 24-hour basis. For example, tokens used at 9:05 on Wednesday replenish at 9:05 on Thursday.

Serverless Inference Limits

  • Serverless inference supports the two to three most recent stable versions of each model to ensure consistent performance and reliable maintenance. For the list of supported models and versions, see the available model offerings.

  • Serverless inference model endpoints support OpenAI-compatible request formats but may not be compatible with all OpenAI tools and plugins.

  • Serverless inference provides access to commercial models, but not all model-specific features are supported. For example, features like Anthropic’s extended thinking are not available.

  • OpenAI models accessed through serverless inference do not support zero data retention. If your use case requires strict data privacy or compliance, consider using a different model or contact support for guidance.
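Because serverless inference endpoints accept OpenAI-compatible request formats, a chat completion request can be assembled with only the standard library. The base URL, access key, and model name below are placeholders, not real values; substitute the ones shown for your endpoint in the control panel:

```python
import json
import urllib.request

# Placeholder values -- replace with your endpoint URL, access key,
# and a model name from the catalog.
BASE_URL = "https://example.inference.endpoint/v1"
API_KEY = "YOUR_MODEL_ACCESS_KEY"

def build_chat_request(messages, model):
    """Build an OpenAI-compatible /chat/completions request."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

# To send the request and decode the JSON response:
#   with urllib.request.urlopen(build_chat_request(msgs, "model-name")) as resp:
#       reply = json.load(resp)
```

Because the format is OpenAI-compatible, OpenAI client libraries pointed at the endpoint's base URL generally work too, though, per the limits above, not every OpenAI tool or plugin is guaranteed to be compatible.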

Dedicated Inference Limits

Dedicated Inference is available in public preview and enabled for all users. You can contact support for questions or assistance.

  • The number of endpoints you can create when using dedicated inference depends on the limits set for your account. We use dynamic resource limits to protect our platform against bad actors. To request a limit increase, contact support. If you are a team owner or resource modifier, you can check your resource limits and request an increase on the Resource Limits page in the DigitalOcean Control Panel.
