What retry or backoff behavior should I follow for 429 responses from serverless inference?
Validated on 19 May 2026 • Last edited on 12 Jun 2026
On a 429 response caused by DigitalOcean serverless inference quotas, the response body includes the error identifier too_many_requests. Do the following steps before you retry the request:
- Inspect the x-ratelimit-reset-<limit-type> headers. Find the header with the soonest non-zero reset timestamp, which identifies the binding limit (the tightest constraint).
- Wait max(reset_timestamp - now, 1s) before retrying, where reset_timestamp is the Unix time in seconds from the x-ratelimit-reset-<limit-type> header (when enough capacity is projected to be available again) and now is the current Unix time in seconds. Take that difference or one second, whichever is larger, so your wait is at least one second even if the header implies you could retry sooner. Also add a small jitter of 250–500 ms before retrying.
- Sometimes the response also has a Retry-After header (seconds to wait). DigitalOcean forwards this header from upstream providers such as Anthropic or OpenAI. If a Retry-After header is present, use that value instead of the reset-based calculation.
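The steps above can be sketched in Python as follows. This is a minimal illustration, not an official client: the function names header_wait_seconds and jitter_seconds are hypothetical, header names are matched case-insensitively as an assumption, and Retry-After is assumed to carry a plain number of seconds as described above.

```python
import random
import time
from typing import Mapping, Optional

RESET_PREFIX = "x-ratelimit-reset-"


def header_wait_seconds(
    headers: Mapping[str, str], now: Optional[float] = None
) -> Optional[float]:
    """Return the wait (seconds) implied by a 429 response's headers.

    Returns None when the headers give no usable value (for example,
    every reset header is 0), so the caller can fall back to backoff.
    """
    now = time.time() if now is None else now
    lowered = {name.lower(): value for name, value in headers.items()}

    # A forwarded Retry-After takes precedence over the reset calculation.
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            return max(float(retry_after), 0.0)
        except ValueError:
            pass  # Not a plain number of seconds; use the reset headers.

    # Collect every non-zero reset timestamp (Unix seconds).
    resets = []
    for name, value in lowered.items():
        if not name.startswith(RESET_PREFIX):
            continue
        try:
            timestamp = float(value)
        except ValueError:
            continue
        if timestamp > 0:
            resets.append(timestamp)

    if not resets:
        return None

    # Soonest non-zero reset, with a floor of one second.
    return max(min(resets) - now, 1.0)


def jitter_seconds() -> float:
    """Small random jitter of 250-500 ms to add before retrying."""
    return random.uniform(0.25, 0.5)
```

A caller would sleep for header_wait_seconds(response_headers) + jitter_seconds() when the first function returns a number, and treat None as the signal that the headers do not give a safe wait time.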
If you see a reset value of 0 (such as x-ratelimit-reset-requests: 0) after a rejection, it does not mean you should retry the request immediately. A zero after a rejection is ambiguous, or it means the projection does not give you a safe wait time. Blind immediate retries can send more traffic to an already limited endpoint, increase failures, and make throttling worse.
As a safety net when multiple buckets are contended concurrently or the reset value is 0, use exponential backoff with jitter, where you wait longer after each failed attempt: min(base × 2^attempt + jitter, 60s), with base = 500 ms. Cap retries at 5.
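The fallback formula can be sketched in Python as shown below. This is an illustration under stated assumptions: attempt is assumed to start at 0, the jitter range is assumed to reuse the 250–500 ms mentioned earlier, and send is a hypothetical callable that performs the request and returns the HTTP status code.

```python
import random
import time
from typing import Callable

BASE_SECONDS = 0.5        # base = 500 ms
MAX_DELAY_SECONDS = 60.0  # upper bound on any single wait
MAX_RETRIES = 5           # cap on retries after the first attempt


def backoff_delay(attempt: int, jitter: float) -> float:
    """min(base * 2^attempt + jitter, 60s), with attempt starting at 0."""
    return min(BASE_SECONDS * (2 ** attempt) + jitter, MAX_DELAY_SECONDS)


def retry_with_backoff(
    send: Callable[[], int],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send(), retrying on 429 up to MAX_RETRIES times.

    Returns the last status code seen, which is still 429 if every
    retry was rejected.
    """
    status = send()
    for attempt in range(MAX_RETRIES):
        if status != 429:
            break
        # Assumed jitter range of 250-500 ms, expressed in seconds.
        sleep(backoff_delay(attempt, random.uniform(0.25, 0.5)))
        status = send()
    return status
```

Passing sleep as a parameter keeps the loop easy to exercise without real waiting. With these constants, the pre-jitter delays for the five retries are 0.5 s, 1 s, 2 s, 4 s, and 8 s, so the 60 s cap only matters if the base or retry limit is raised.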