RESOURCE_EXHAUSTED
GCP RESOURCE_EXHAUSTED means a quota, rate limit, or finite backend capacity was exhausted for the request.
Last reviewed: April 18, 2026 | Source-backed guidance under our editorial policy
Start Here
Use the closest compare guide, playbook, or adjacent error page to narrow the decision faster before you start changing production systems.
This page is part of the Error Reference library.
What Does Resource Exhausted Mean?
Some concrete budget is gone: a request quota, burst window, per-principal allowance, or finite backend capacity bucket. Recovery is about identifying the exhausted dimension, calming demand, and restoring headroom without turning retries into a second outage.
Common Causes
- Per-project, per-user, or per-method quota limits were exceeded.
- Burst traffic combined with aggressive retries triggered short-window throttling.
- Workload demand exceeded available backend capacity or configured throughput.
- Multiple workers or tenants are contending for the same constrained quota bucket.
- One noisy tenant or batch job is draining a shared quota project and starving unrelated traffic.
How to Fix Resource Exhausted
1. Inspect error details and quota metrics to identify the exact exhausted dimension and scope.
2. Apply exponential backoff with full jitter and strict retry budgets for idempotent calls.
3. Reduce burst concurrency using queueing, token buckets, or workload smoothing.
4. Request quota increase or redistribute traffic across projects, tenants, or regions only after proving which bucket is actually empty.
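Step 2 can be sketched as follows. This is a minimal illustration, not a client-library API: `RetryBudget`, `ResourceExhausted`, and `call_with_backoff` are hypothetical names, and the 1s base, 16s cap, and 4 attempts simply mirror the retry policy shown under Implementation Examples. The budget ratio of 10% is an assumption to tune per service.

```python
import random
import time


class ResourceExhausted(Exception):
    """Stand-in for whatever RESOURCE_EXHAUSTED error your client raises."""


class RetryBudget:
    """Hypothetical helper: allows retries only up to a fraction of first attempts."""

    def __init__(self, ratio=0.1, min_retries=3):
        self.ratio = ratio              # assumed: retries may add ~10% extra load
        self.min_retries = min_retries  # small floor so low-traffic callers can retry
        self.first_attempts = 0
        self.retries = 0

    def record_first_attempt(self):
        self.first_attempts += 1

    def try_spend(self):
        allowed = self.min_retries + self.ratio * self.first_attempts
        if self.retries < allowed:
            self.retries += 1
            return True
        return False


def full_jitter_delay(attempt, base=1.0, cap=16.0, rng=random):
    """Full jitter: uniform delay in [0, min(cap, base * 2**attempt)], attempt is 0-based."""
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_backoff(fn, budget, max_attempts=4, sleep=time.sleep, rng=random):
    """Call fn (which must be idempotent), retrying only while the shared budget allows."""
    budget.record_first_attempt()
    for attempt in range(max_attempts):
        try:
            return fn()
        except ResourceExhausted:
            is_last = attempt == max_attempts - 1
            if is_last or not budget.try_spend():
                raise  # give up instead of draining the exhausted bucket further
            sleep(full_jitter_delay(attempt, rng=rng))
```

Sharing one `RetryBudget` across all callers in a process is what keeps a burst of failures from turning into a proportional burst of retries.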
Step-by-Step Diagnosis for Resource Exhausted
1. Capture request rate, concurrency, and retry amplification during the failure interval.
2. Map the failure to a concrete quota metric, method, and principal/project scope.
3. Verify whether errors are sustained hard-quota exhaustion or transient throttling windows.
4. Identify whether the exhaustion is quota-governed, backend-capacity-governed, or retry-self-amplified.
5. Replay traffic with controlled ramps and corrected retry policy to validate recovery behavior.
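Retry amplification from step 1 can be estimated from attempt-level logs. The sketch below assumes each attempt is logged as a record carrying a `requestId` field, as in the sample error record on this page; the function name is hypothetical and your log schema may differ.

```python
from collections import Counter


def retry_amplification(attempts):
    """Average number of attempts per logical request.

    `attempts` is an iterable of dicts, one per attempt, each with a
    'requestId' key (an assumed log shape). 1.0 means no retries at all.
    """
    per_request = Counter(a["requestId"] for a in attempts)
    if not per_request:
        return 1.0
    return sum(per_request.values()) / len(per_request)
```

A value that climbs during the failure interval suggests the retries themselves are part of the load hitting the exhausted bucket.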
Seen in Production
- A retry storm after a dependency wobble exhausts per-minute API quota, so the second wave of traffic gets RESOURCE_EXHAUSTED even after the original problem stabilizes.
- One shared quota project backs multiple workloads, and a single tenant spike consumes the headroom that the rest of the fleet needs.
- Autoscaled workers all wake at once, flood a short-window rate bucket, and get throttled before useful work can spread out.
- A backend capacity pool is saturated, so callers see RESOURCE_EXHAUSTED even though the documented quota numbers still look healthy at first glance.
Quota Dimension and Scope Analysis
- Identify exact quota key and scope from diagnostics (example: per-minute method quota exhausted for one service account).
- Confirm billing or quota project mapping in user-credential flows (example: requests charge an unintended quota project and exhaust its limits).
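One way to find the exhausted quota key is to group failures by metric, limit, and project. This sketch assumes JSON-lines logs with the same fields as the sample error record on this page (`status`, `quotaMetric`, `quotaLimit`, `project`); the function name is hypothetical and real error payloads may expose these details under different keys.

```python
import json
from collections import Counter


def exhausted_buckets(log_lines):
    """Count RESOURCE_EXHAUSTED records per (quotaMetric, quotaLimit, project).

    Returns a list of (key, count) pairs, most frequent first.
    """
    counts = Counter()
    for line in log_lines:
        record = json.loads(line)
        if record.get("status") != "RESOURCE_EXHAUSTED":
            continue  # ignore successes and unrelated errors
        key = (
            record.get("quotaMetric"),
            record.get("quotaLimit"),
            record.get("project"),
        )
        counts[key] += 1
    return counts.most_common()
```

If one key dominates the output, that is the bucket to investigate before touching anything else; an unexpected `project` value points at a quota project mapping problem.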
Retry Budget and Traffic Shaping
- Inspect retry fan-out to prevent self-amplifying load (example: three nested retries across client, worker, and gateway layers).
- Introduce smoothing controls for burst producers (example: token-bucket limiter at ingress keeps request rate below short-window thresholds).
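Both points can be illustrated with a short sketch. Nested retries multiply: if each layer independently makes up to N attempts, one user request can produce the product of those limits in backend calls. The token bucket is a generic in-process limiter; the class name, rate, and capacity are illustrative assumptions, not values taken from any GCP quota.

```python
import math
import time


def worst_case_attempts(max_attempts_per_layer):
    """Upper bound on backend calls when every retry layer exhausts its attempts."""
    return math.prod(max_attempts_per_layer)


class TokenBucket:
    """Generic token bucket: refills at rate_per_sec, holds at most `capacity` tokens."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self, cost=1.0):
        """Take `cost` tokens if available; return False instead of blocking."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

For example, `worst_case_attempts([3, 3, 3])` is 27, which is why three modest retry layers can look like a traffic spike to the quota system. Callers that get `False` from `try_acquire` should queue or shed the request rather than send it.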
Decision Shortcut: Hard Quota vs Short-Window Burst
- If the error clears quickly when retries calm down, suspect short-window throttling or self-amplified bursts before filing quota tickets.
- If one project, principal, or region fails while others stay healthy, isolate the exact quota bucket before scaling the whole system.
- If capacity saturation appears before explicit quota metrics flatten, inspect backend worker pools and dependency bottlenecks alongside quota dashboards.
Wrong Fix to Avoid
- Do not raise retry counts when the retries themselves are draining the exhausted budget.
- Do not file a quota-increase request before proving whether the problem is really a shared-capacity or retry-amplification issue.
- Do not treat all RESOURCE_EXHAUSTED events as identical; per-user, per-project, and backend-capacity exhaustion recover differently.
Implementation Examples
{
  "requestId": "req_3b11aa",
  "status": "RESOURCE_EXHAUSTED",
  "quotaMetric": "compute.googleapis.com/read_requests",
  "quotaLimit": "ReadRequestsPerMinutePerProject",
  "project": "prod-app",
  "retryAttempt": 2
}

{
  "retryPolicy": {
    "maxAttempts": 4,
    "initialBackoff": "1s",
    "maxBackoff": "16s",
    "backoffMultiplier": 2,
    "retryableStatusCodes": ["RESOURCE_EXHAUSTED"]
  }
}

gcloud logging read 'severity>=WARNING AND textPayload:RESOURCE_EXHAUSTED' --limit=20
gcloud services quotas list --service=compute.googleapis.com --consumer=projects/PROJECT_ID

Incident Timeline
16:02 UTC
Demand begins to outrun one concrete quota or capacity bucket
Signal: Traffic, retries, or tenant skew starts burning through a short-window rate bucket or finite backend capacity pool.
Why it matters: The first useful question is which bucket is empty, not whether we should just retry more.
16:05 UTC
Requests begin failing with RESOURCE_EXHAUSTED
Signal: Quota metrics flatten, backend capacity saturates, or one principal/project starts getting rejected while peers still succeed.
Why it matters: This is where scope matters: one exhausted bucket can look global if callers all share the same path.
16:06 UTC
Retries and concurrent producers amplify the failure
Signal: Extra attempts pile onto the same depleted bucket, keeping the service pinned at the limit.
Why it matters: Without retry budgets and smoothing, the response to the incident becomes part of the incident.
16:14 UTC
Traffic is smoothed and headroom returns
Signal: Burst producers are throttled, retries back off, or quota/capacity is redistributed so the same workload starts succeeding again.
Why it matters: That confirms the fix lived in headroom recovery and demand shaping, not blind retry escalation.
Seen in Production
Retry storm after partial outage exhausts per-minute API quota
Frequency: common
Example: Workers retry simultaneously and push request rate beyond project quota ceiling.
Fix: Introduce centralized jittered retry budget and stagger worker recovery.
Shared quota project drained by unexpected tenant spike
Frequency: common
Example: One high-volume tenant consumes quota headroom and starves other workloads using the same quota project.
Fix: Partition quota domains or isolate noisy tenant traffic into dedicated project capacity.
Capacity pool saturates before quota dashboards look obviously full
Frequency: medium
Example: A backend worker pool or downstream dependency caps out, so clients see RESOURCE_EXHAUSTED even though top-level quota graphs lag behind.
Fix: Correlate backend saturation metrics with quota data and scale or rebalance the actual limiting pool.
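The "stagger worker recovery" fix from the first scenario can be sketched as assigning each worker a start delay inside a recovery window. The function name and the slot-plus-jitter scheme are assumptions chosen for illustration; any approach that spreads restarts across the window serves the same purpose.

```python
import random


def staggered_start_delays(worker_ids, window_seconds, rng=None):
    """Spread worker restarts across a window instead of waking them all at once.

    Each worker gets its own time slot plus random jitter inside that slot,
    so no two workers are forced to start at the same instant.
    """
    rng = rng or random.Random()
    workers = list(worker_ids)
    if not workers:
        return {}
    slot = window_seconds / len(workers)
    return {
        worker: index * slot + rng.uniform(0.0, slot)
        for index, worker in enumerate(workers)
    }
```

With 10 workers and a 60-second window, each worker starts somewhere inside its own 6-second slot, which keeps the combined request rate closer to a ramp than a step.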
Wrong Fix vs Better Fix
Retry harder vs calm demand first
Wrong fix: Increase retry count because the service looks temporarily overwhelmed.
Better fix: Add jittered backoff, retry budgets, and producer smoothing before increasing any attempt volume.
Why this is better: RESOURCE_EXHAUSTED often gets worse when the recovery path consumes the same limited bucket.
Request more quota blindly vs identify the exhausted dimension
Wrong fix: Immediately ask for more quota or more nodes without proving which budget is actually exhausted.
Better fix: Map the failure to one metric, principal, project, region, or capacity pool first, then scale the right boundary.
Why this is better: You can waste time scaling the wrong layer if the actual limiter is elsewhere.
Treat tenants equally vs isolate the noisy bucket
Wrong fix: Let all producers continue at the same rate during exhaustion.
Better fix: Isolate noisy tenants, stagger recovery, or partition quota domains to protect critical traffic.
Why this is better: Shared buckets fail fastest when one noisy cohort can starve the rest of the system.
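When a shared bucket cannot be split into separate projects right away, one option is to compute per-tenant caps so a noisy tenant cannot take more than its fair share. The sketch below uses max-min fairness, which is a technique swapped in here for illustration and not something GCP quota enforcement is known to do; the function name and inputs are hypothetical.

```python
def max_min_fair_share(capacity, demands):
    """Split `capacity` across tenants using max-min fairness.

    `demands` maps tenant -> requested rate. Tenants asking for less than an
    equal share get their full demand; the leftover is split among the rest.
    """
    allocation = {tenant: 0.0 for tenant in demands}
    remaining = float(capacity)
    unsatisfied = {t: float(d) for t, d in demands.items() if d > 0}
    while unsatisfied and remaining > 0:
        share = remaining / len(unsatisfied)
        fits = [t for t, d in unsatisfied.items() if d <= share]
        if not fits:
            # Everyone left wants more than an equal share: split evenly and stop.
            for tenant in unsatisfied:
                allocation[tenant] = share
            break
        for tenant in fits:
            allocation[tenant] = unsatisfied.pop(tenant)
            remaining -= allocation[tenant]
    return allocation
```

The resulting caps can feed per-tenant rate limiters, so small tenants keep their full demand while the noisy one is held to what is left.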
Debugging Tools
- Cloud Monitoring quota metrics and dashboards
- Cloud Audit Logs for throttled method patterns
- Retry telemetry and distributed trace timelines
- Service Usage quota and limit inspection
- Per-tenant or per-principal quota attribution logs
How to Verify the Fix
- Run controlled load tests and confirm RESOURCE_EXHAUSTED remains below error-budget thresholds.
- Verify retry paths follow jittered exponential policy and do not create secondary bursts.
- Monitor quota headroom, shared-bucket utilization, and sustained throughput after production rollout.
- Confirm one noisy producer can no longer starve unrelated traffic on the same quota or capacity path.
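The load-test check can be reduced to a simple ratio. This sketch assumes the test harness produces a count of responses per status code; the function names and the 0.1% default threshold are illustrative assumptions, so substitute your own error budget.

```python
def exhausted_ratio(status_counts):
    """Fraction of responses that were RESOURCE_EXHAUSTED."""
    total = sum(status_counts.values())
    if total == 0:
        return 0.0
    return status_counts.get("RESOURCE_EXHAUSTED", 0) / total


def within_error_budget(status_counts, max_ratio=0.001):
    """True if RESOURCE_EXHAUSTED stayed at or below the allowed ratio (assumed 0.1%)."""
    return exhausted_ratio(status_counts) <= max_ratio
```

Running the same check at each step of a controlled ramp shows the request rate at which the ratio first crosses the threshold.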
How to Prevent Recurrence
- Define service-level retry standards with centralized backoff and retry-budget enforcement.
- Set proactive alerts on quota-utilization slopes, not only hard-limit breaches.
- Use adaptive throttling and queue backpressure to flatten producer bursts automatically.
- Partition quota projects or tenant domains where one workload can otherwise drain shared headroom.
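A slope-based alert can be sketched as a linear projection of recent usage toward the limit. The function below fits a least-squares line to (time, usage) samples and estimates how long until the limit is reached; the function name, sample format, and any alert threshold you attach to it are assumptions, and a straight-line fit is only a rough model of real traffic.

```python
def seconds_until_limit(samples, limit):
    """Project when usage reaches `limit` from (t_seconds, usage) samples.

    Returns seconds after the last sample, or None if usage is flat or falling
    or there are too few samples to fit a line.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / variance
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    hit_time = (limit - intercept) / slope
    return max(0.0, hit_time - samples[-1][0])
```

Alerting when the projection drops below some lead time (for example, the time needed to throttle batch jobs) gives warning before the hard limit is actually breached.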
Pro Tip
- Reserve operational quota headroom for incident traffic by capping non-critical batch jobs during peak windows.
Decision Support
Compare Guide
AWS ThrottlingException vs GCP RESOURCE_EXHAUSTED
Compare AWS ThrottlingException and GCP RESOURCE_EXHAUSTED to separate rate limiting from quota/resource exhaustion and choose the remediation path.
Playbook
Rate Limit Recovery Playbook (429 / ThrottlingException / RESOURCE_EXHAUSTED)
Use this playbook to separate transient throttling from hard quota exhaustion and apply retry, traffic-shaping, and quota-capacity fixes safely.
Provider Context
This guidance is specific to GCP services. Always validate implementation details against official provider documentation before deploying to production.