Claude 529 Overloaded Error: Causes and Safe Resolution

Key takeaways

HTTP 529 is a platform capacity signal, not a per-tenant rate limit.
Agents amplify overload because retry logic exists at every layer of the loop.
Exponential backoff with full jitter, retry budgets, and provider fallback shorten incidents instead of prolonging them.

If you have shipped anything on Anthropic's API, you have seen it:

HTTP 529
{ "type": "error", "error": { "type": "overloaded_error", "message": "Overloaded" } }

It looks like a rate limit, but it is not one. The Anthropic error reference is explicit: 529 means the API is experiencing more traffic than it can serve, independent of your account quota. Understanding that difference is the first step to a retry strategy that does not make things worse.

What 529 actually means

HTTP 529 is not a registered HTTP status code. Anthropic uses it for overloaded_error, meaning the API is temporarily overloaded by traffic across all users rather than by your account's usage. It is a system-wide capacity signal, not a per-tenant one. Your quota is fine. The pool is hot.

It is different from:

429 (rate_limit_error): you exceeded your own quota. Slow down.
500 / 503: something is broken on Anthropic's side. Retry.
529 (overloaded_error): capacity is shared and exhausted. Retry, but carefully.
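
The distinction above can be encoded as a small lookup, so that every caller reacts to the same status code in the same way. This is a minimal sketch: the Action enum and the classify function are hypothetical names, not part of any Anthropic SDK, and the mapping simply restates the three cases listed above.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SLOW_DOWN = "slow_down"              # 429: your own quota is exhausted
    RETRY = "retry"                      # 500 / 503: ordinary transient failure
    RETRY_CAREFULLY = "retry_carefully"  # 529: shared capacity is exhausted
    FAIL = "fail"                        # anything else: do not retry blindly


def classify(status: int, error_type: Optional[str] = None) -> Action:
    """Map an HTTP status (and the optional error.type from the body) to an action."""
    if status == 529 or error_type == "overloaded_error":
        return Action.RETRY_CAREFULLY
    if status == 429 or error_type == "rate_limit_error":
        return Action.SLOW_DOWN
    if status in (500, 503):
        return Action.RETRY
    return Action.FAIL
```

Checking error.type as well as the status code is a defensive choice for cases where an intermediary rewrites the status but passes the body through.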

Why it surfaces in agent loops

Agents amplify overload. A chat product makes one call per user message. An agent makes ten, fifty, sometimes hundreds: planning steps, tool-use loops, sub-agents. When the model provider is hot, every layer in an agent retries, and the retries collide. We covered the wire-level view in "What actually happens on the wire."

The classic failure pattern: a 529 hits a sub-agent. The sub-agent retries three times. The parent agent sees the failure and retries the whole sub-task. Each parent retry spawns three more sub-agent attempts. Because retries multiply across layers instead of adding, you just turned one user request into 12 calls during the exact moment the provider asked you to back off. AWS has a classic write-up on backoff and jitter explaining why naive retries are the problem.
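
The multiplication is easy to check. As a rough sketch, the worst-case number of model calls is the product of the attempts each layer is allowed. The function name and the per-layer counts below are illustrative assumptions; one way to arrive at 12 is four rounds at the parent with three sub-agent attempts per round.

```python
from math import prod
from typing import Sequence


def worst_case_calls(attempts_per_layer: Sequence[int]) -> int:
    """Worst-case model calls when every layer retries independently.

    Each entry is the total number of attempts (initial try plus retries)
    that one layer makes, listed from the outermost layer inward.
    """
    return prod(attempts_per_layer)


# Assumed counts for illustration: 4 parent rounds x 3 sub-agent attempts.
two_layers = worst_case_calls([4, 3])

# Adding one more retrying layer multiplies the total again.
three_layers = worst_case_calls([4, 3, 3])
```

Here two_layers is 12 and three_layers is 36, which is why the depth of the agent tree matters more than any single layer's retry count.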

A retry policy that does not make it worse

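Exponential backoff with full jitter. Not fixed delays, not capped backoff with no jitter. Start at about 1s, double on each attempt, and cap at about 30s. Full jitter, in the AWS sense, then draws the actual delay uniformly between 0 and that capped value. Multiplying the delay by a random factor in [0.5, 1.5] is a narrower variant that desynchronizes clients less.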
Retry budgets, not retry counts. Allow N total retries across the entire agent run, not N per call. Otherwise a deep tool-use tree multiplies your retry pressure by the depth of the tree.
Retry at the outermost layer only. Disable SDK auto-retry in nested components. Let one layer own the retry budget.
Honor retry-after. When Anthropic returns it, use it. Do not back off less than the header says.
Fail open on partial work. If the agent has already done useful work, return what you have rather than burning the run on a final retry storm.
Fall back to a smaller or different model. 529s are often concentrated on one model. Falling back from Sonnet to Haiku, or to a different provider, often clears the error immediately.
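
The first four rules above can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not a definitive implementation: Overloaded, RetryBudget, backoff_delay, and call_with_budget are hypothetical names, the 1s base and 30s cap are the approximate values suggested above, and how the retry-after value is read from a real response is left out.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

BASE_DELAY_S = 1.0   # assumed starting delay, per the text above
MAX_DELAY_S = 30.0   # assumed cap, per the text above


class Overloaded(Exception):
    """Hypothetical stand-in for a 529 response; carries retry-after if present."""

    def __init__(self, retry_after_s: Optional[float] = None) -> None:
        super().__init__("overloaded")
        self.retry_after_s = retry_after_s


@dataclass
class RetryBudget:
    """One budget shared by the entire agent run, not one per call."""

    remaining: int

    def try_spend(self) -> bool:
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


def backoff_delay(
    attempt: int,
    retry_after_s: Optional[float] = None,
    rng: Callable[[], float] = random.random,
) -> float:
    """Full jitter: uniform between 0 and the capped exponential delay."""
    cap = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
    delay = rng() * cap
    if retry_after_s is not None:
        # Never wait less than the server asked for.
        delay = max(delay, retry_after_s)
    return delay


def call_with_budget(
    call: Callable[[], T],
    budget: RetryBudget,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call() at the outermost layer, retrying only while budget remains."""
    attempt = 0
    while True:
        try:
            return call()
        except Overloaded as exc:
            if not budget.try_spend():
                raise  # budget exhausted: surface the error, return partial work upstream
            sleep(backoff_delay(attempt, exc.retry_after_s))
            attempt += 1
```

In use, a single RetryBudget would be created once per agent run and passed to every call_with_budget invocation, with SDK-level retries turned off in nested components so that only this wrapper spends the budget.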

Where guardrails fit

A network-layer proxy in front of the model API gives you a single place to enforce the retry budget, normalize retry-after handling, fall back across providers, and emit metrics. Without that, every agent service implements its own retry policy, and those policies collide during incidents, exactly the wrong moment for divergence. (The same principle underlies the MCP gateway pattern for tool calls.)
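
As a rough sketch of what that single enforcement point might do for fallback and metrics, the function below tries an ordered list of models and counts outcomes per model. Everything here is hypothetical: Overloaded, call_with_fallback, the send callable, and the placeholder model names are illustrative and do not correspond to a real proxy or SDK API.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class Overloaded(Exception):
    """Hypothetical stand-in for a 529 response from one model."""


def call_with_fallback(
    models: Sequence[str],
    send: Callable[[str], T],
    metrics: Counter,
) -> T:
    """Try each model in order, moving on only when one reports overload."""
    if not models:
        raise ValueError("at least one model is required")
    last_error: Optional[Overloaded] = None
    for model in models:
        try:
            result = send(model)
        except Overloaded as exc:
            metrics[f"overloaded:{model}"] += 1  # visible in one place during incidents
            last_error = exc
            continue
        metrics[f"success:{model}"] += 1
        return result
    assert last_error is not None
    raise last_error


# Usage with a fake transport: the primary model is overloaded, the fallback is not.
def fake_send(model: str) -> str:
    if model == "primary-model":  # placeholder name, not a real model ID
        raise Overloaded()
    return f"answer from {model}"


metrics: Counter = Counter()
answer = call_with_fallback(["primary-model", "smaller-model"], fake_send, metrics)
```

Only overload triggers the fallback here; other errors propagate unchanged, so a bad request is not silently replayed against a second model.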

Treat 529 as a back-pressure signal, not a transient error. The provider is telling you the whole system is hot. Your job is to be the customer whose retry behavior makes the incident shorter, not longer.

Frequently asked questions

Is HTTP 529 the same as a rate limit?

No. HTTP 429 is a per-tenant rate limit (you exceeded your quota). HTTP 529 is a system-wide capacity signal from Anthropic that the shared pool is currently exhausted. Your quota is fine; the platform is hot.

How long should I wait before retrying a 529?

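Honor the retry-after header when present. Otherwise use exponential backoff with jitter: start at about 1 second, double on each attempt, cap at about 30 seconds, and randomize each delay. Full jitter draws the delay uniformly between 0 and the capped value; a random multiplier in the range 0.5 to 1.5 is a narrower alternative.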

Why do my agents get 529 more often than my chat product?

Agents make many model calls per user request (planning steps, tool-use loops, sub-agents). Every layer that retries on its own compounds the load during exactly the moments Anthropic is asking you to back off.