What retry or backoff behavior should I follow for 429 responses from serverless inference?

Last verified 13 Jul 2026

On a 429 response caused by DigitalOcean serverless inference quotas, the response body includes the error identifier too_many_requests. Do the following steps before you retry the request:

Inspect the x-ratelimit-reset-<limit-type> headers. Find the header with the soonest non-zero reset timestamp; it identifies the binding limit, which is the tightest constraint.
Wait max(reset_timestamp - now, 1) seconds before retrying, where reset_timestamp is the Unix time in seconds from the x-ratelimit-reset-<limit-type> header (when enough capacity is projected to be available again) and now is the current Unix time in seconds. In other words, take that difference or one second, whichever is larger, so you always wait at least one second even if the header implies you could retry sooner. Then add a small jitter of 250–500 ms before retrying.
Sometimes the response also includes a Retry-After header, which gives the number of seconds to wait. DigitalOcean forwards this header from upstream providers such as Anthropic or OpenAI. If a Retry-After header is present, use its value instead of the reset-based calculation.
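The steps above can be sketched in Python as follows. This is a minimal illustration, not an official client: the function names wait_before_retry and add_jitter are hypothetical, the headers are assumed to arrive as a plain name-to-string mapping, and Retry-After is assumed to carry a number of seconds (the HTTP-date form is not handled). Whether jitter should also be added to a Retry-After value is not stated above, so jitter is kept as a separate step.

```python
import random
import time
from typing import Mapping, Optional

RESET_PREFIX = "x-ratelimit-reset-"


def wait_before_retry(headers: Mapping[str, str],
                      now: Optional[float] = None) -> Optional[float]:
    """Return seconds to wait before retrying (without jitter).

    Returns None when the headers give no usable wait time, for example
    when every reset header is 0. The caller should then fall back to
    another strategy instead of retrying immediately.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    # A forwarded Retry-After value takes precedence over the reset headers.
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            return max(float(retry_after), 0.0)
        except ValueError:
            pass  # assumption: non-numeric values are ignored in this sketch

    # Collect every non-zero reset timestamp (Unix time in seconds).
    resets = []
    for name, value in lowered.items():
        if not name.startswith(RESET_PREFIX):
            continue
        try:
            timestamp = float(value)
        except ValueError:
            continue
        if timestamp > 0:
            resets.append(timestamp)
    if not resets:
        return None

    # Soonest non-zero reset, but never less than one second.
    current = time.time() if now is None else now
    return max(min(resets) - current, 1.0)


def add_jitter(wait: float) -> float:
    """Add a random 250-500 ms of jitter to a wait time in seconds."""
    return wait + random.uniform(0.25, 0.5)
```

A caller would typically run time.sleep(add_jitter(wait)) when wait_before_retry returns a number, and switch to a backoff strategy when it returns None.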

A reset value of 0 (such as x-ratelimit-reset-requests: 0) after a rejection does not mean you should retry the request immediately. A zero after a rejection is either ambiguous or means the projection cannot give you a safe wait time. Retrying immediately without waiting can send more traffic to an already limited endpoint, increase failures, and make throttling worse.

As a safety net when multiple buckets are contended concurrently or the reset value is 0, use exponential backoff with jitter, so that you wait longer after each failed attempt: wait min(base × 2^attempt + jitter, 60s), with base = 500 ms, and cap retries at 5.
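The backoff formula can be sketched in Python as shown here. This is an illustration under stated assumptions: the names backoff_delay and retry_with_backoff are hypothetical, attempt is assumed to start at 0, the jitter range reuses the 250–500 ms mentioned earlier (the formula itself does not fix a range), and send stands in for whatever function issues your request and returns the HTTP status code.

```python
import random
import time
from typing import Callable, Optional

BASE_SECONDS = 0.5       # base = 500 ms
MAX_WAIT_SECONDS = 60.0  # cap on any single wait
MAX_RETRIES = 5          # cap on the number of retries


def backoff_delay(attempt: int, jitter: Optional[float] = None) -> float:
    """Compute min(base * 2**attempt + jitter, 60 s), with attempt starting at 0."""
    if jitter is None:
        jitter = random.uniform(0.25, 0.5)  # assumed jitter range
    return min(BASE_SECONDS * 2 ** attempt + jitter, MAX_WAIT_SECONDS)


def retry_with_backoff(send: Callable[[], int],
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it returns a status other than 429 or retries run out.

    Returns the last status code seen. The first call is not counted as a
    retry, so send() runs at most MAX_RETRIES + 1 times.
    """
    status = send()
    for attempt in range(MAX_RETRIES):
        if status != 429:
            break
        sleep(backoff_delay(attempt))
        status = send()
    return status
```

Without jitter, the waits for attempts 0 through 4 are 0.5, 1, 2, 4, and 8 seconds. The 60-second cap only applies once base × 2^attempt exceeds it, which the five-retry limit never reaches with a 500 ms base.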
