If you've shipped anything on Anthropic's API, you've seen it:
HTTP 529
{ "type": "error", "error": { "type": "overloaded_error", "message": "Overloaded" } }

It looks like a rate limit, but it isn't one. Understanding the difference is the first step to a retry strategy that doesn't make things worse.
What 529 actually means
HTTP 529 is non-standard; Anthropic uses it to signal "the API is currently experiencing more traffic than it can serve, independent of your account's rate limits." It's a system-wide capacity signal, not a per-tenant one. Your quota is fine. The pool is hot.
It is different from:
- 429 (rate_limit_error): you exceeded your own quota. Slow down.
- 500 / 503: something is broken on Anthropic's side. Retry.
- 529 (overloaded_error): capacity is shared and currently exhausted. Retry, but carefully.
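As a minimal sketch, the distinction above can be encoded as a small lookup that the rest of a client consults before deciding how to retry. The `RetryAdvice` enum and `classify_status` function are hypothetical names for illustration, not part of any SDK, and treating every other status as non-retryable is an assumption of this sketch.

```python
from enum import Enum


class RetryAdvice(Enum):
    """Hypothetical retry postures, one per case in the list above."""
    SLOW_DOWN = "slow_down"              # 429: your own quota is exhausted
    RETRY = "retry"                      # 500 / 503: server-side fault
    RETRY_CAREFULLY = "retry_carefully"  # 529: shared capacity is exhausted
    DO_NOT_RETRY = "do_not_retry"        # everything else (assumption of this sketch)


def classify_status(status_code: int) -> RetryAdvice:
    """Map an HTTP status code to the retry posture described in the text."""
    if status_code == 429:
        return RetryAdvice.SLOW_DOWN
    if status_code in (500, 503):
        return RetryAdvice.RETRY
    if status_code == 529:
        return RetryAdvice.RETRY_CAREFULLY
    return RetryAdvice.DO_NOT_RETRY
```

Keeping 429 and 529 in separate branches matters because the right reaction differs: the first is about your own request rate, the second is about load you share with everyone else.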
Why it surfaces in agent loops
Agents amplify overload. A chat product makes one call per user message. An agent makes ten, fifty, sometimes hundreds: tool-use loops, planning steps, sub-agents. When the model provider is hot, every layer in an agent retries, and the retries collide.
The classic failure pattern: a 529 hits a sub-agent. The sub-agent retries three times. The parent agent sees the failure and retries the whole sub-task. Each retry spawns three more sub-agent attempts. You just turned one user request into 12 calls at the exact moment the provider asked you to back off.
A retry policy that doesn't make it worse
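- Exponential backoff with jitter. Not fixed delays, and not capped backoff without jitter. The thundering-herd problem is real, and jitter is what fixes it. Start at ~1s, grow exponentially, cap at ~30s, and multiply by a random factor in [0.5, 1.5]. (Strictly speaking, "full jitter" means drawing the delay uniformly between zero and the capped value; either variant spreads retries out.)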
- Retry budgets, not retry counts. Allow N total retries across the entire agent run, not N per call. Otherwise a deep tool-use tree multiplies your retry pressure by the depth of the tree.
- Retry at the outermost layer only. Disable SDK auto-retry in nested components. Let one layer own the retry budget.
- Honor retry-after. When Anthropic returns it, use it. Don't back off less than the header says.
- Fail open on partial work. If the agent has already done useful work, return what you have rather than burning the run on a final retry storm.
- Fall back to a smaller or different model. 529s are usually concentrated on one model. Falling back from Sonnet to Haiku, or to a different provider, often clears the error immediately.
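Several of the rules above (jittered exponential backoff, one shared retry budget, and never waiting less than retry-after) can be sketched together. This is an illustration under stated assumptions, not a reference implementation: `OverloadedError`, `RetryBudget`, `backoff_delay`, and `call_with_budget` are hypothetical names, the doubling factor is an assumption, and the caller is assumed to have already parsed the retry-after header into seconds.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class OverloadedError(Exception):
    """Hypothetical stand-in for a 529 response; not an SDK class."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("Overloaded")
        self.retry_after = retry_after  # seconds, if the server sent retry-after


@dataclass
class RetryBudget:
    """Retries shared by every call in one agent run, not per call."""
    remaining: int

    def try_spend(self) -> bool:
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Capped exponential delay scaled by a random factor in [0.5, 1.5)."""
    delay = min(cap, base * (2 ** attempt))  # doubling is an assumption
    delay *= 0.5 + rng()  # jitter applied after the cap, so max is 1.5 * cap
    if retry_after is not None:
        delay = max(delay, retry_after)  # never wait less than the server asked
    return delay


def call_with_budget(fn: Callable[[], T], budget: RetryBudget,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on overload only while the shared budget lasts."""
    attempt = 0
    while True:
        try:
            return fn()
        except OverloadedError as err:
            if not budget.try_spend():
                raise  # budget exhausted: surface the error to the caller
            sleep(backoff_delay(attempt, err.retry_after))
            attempt += 1
```

Create one `RetryBudget` at the top of an agent run and pass the same instance to every call site, so a deep tool-use tree draws from a single pool. For the "outermost layer only" rule, the official Python SDK exposes a `max_retries` client option; setting it to 0 on nested clients is one way to keep this wrapper as the only retrying layer.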
Where guardrails fit
A network-layer proxy in front of the model API gives you a single place to enforce the retry budget, normalize retry-after handling, fall back across providers, and emit metrics. Without that, every agent service implements its own retry policy and they collide during incidents, exactly the wrong moment for divergence.
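To make the fallback and metrics duties concrete, here is a rough in-process sketch of what such a single enforcement point might do; a real proxy would apply the same logic at the network layer. Everything here is hypothetical: `OverloadedError` stands in for a 529, `call_model` is whatever function actually sends the request, and `on_overload` is a placeholder metrics hook.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


class OverloadedError(Exception):
    """Hypothetical stand-in for a 529 response; not an SDK class."""


def call_with_fallback(
    models: Sequence[str],
    call_model: Callable[[str], T],
    on_overload: Optional[Callable[[str], None]] = None,
) -> Tuple[str, T]:
    """Try each model in order; move to the next one only on overload.

    Returns the model that answered together with its result, so callers
    and dashboards can see when a fallback was used.
    """
    if not models:
        raise ValueError("at least one model is required")
    last_error: Optional[OverloadedError] = None
    for model in models:
        try:
            return model, call_model(model)
        except OverloadedError as err:
            last_error = err
            if on_overload is not None:
                on_overload(model)  # e.g. increment a per-model 529 counter
    assert last_error is not None
    raise last_error  # every model was overloaded
```

Passing the model list in preference order keeps the policy in one place; the names in that list (a primary model, then a smaller one, then another provider) are deployment choices this sketch does not assume.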
Treat 529 as a back-pressure signal, not a transient error. The provider is telling you the whole system is hot. Your job is to be the customer whose retry behavior makes the incident shorter, not longer.