TooManyRequestsException - Too Many Requests
AWS TooManyRequestsException is a Lambda-specific throttling error indicating that the function has reached its concurrency limit at either the function or account level. The request is rejected immediately without execution to protect the regional infrastructure.
Last reviewed: March 15, 2026 | Editorial standard: source-backed technical guidance
What Does Too Many Requests Mean?
This exception is Lambda's "Air Traffic Control" signal. It triggers when the number of concurrent executions exceeds the allowed quota (default 1,000 per region). Unlike a 504 Timeout, the code never runs, saving you from compute costs but failing the request. It can be caused by account-wide exhaustion (unreserved pool) or reaching a specific function's Reserved Concurrency cap.
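The gatekeeping described above can be pictured as a counting semaphore over in-flight executions. This toy model (not Lambda's actual implementation) shows why a throttled request is rejected instantly rather than queued:

```python
class ConcurrencyGate:
    """Toy model of Lambda's concurrency check: admit or reject, never queue."""

    def __init__(self, limit):
        self.limit = limit          # e.g. the 1,000 default regional quota
        self.in_flight = 0

    def try_acquire(self):
        # Rejection is instant: the function code never runs, so no compute cost.
        if self.in_flight >= self.limit:
            return False            # caller sees TooManyRequestsException (HTTP 429)
        self.in_flight += 1
        return True

    def release(self):
        self.in_flight -= 1


gate = ConcurrencyGate(limit=2)
print([gate.try_acquire() for _ in range(3)])
# → [True, True, False]: the third concurrent request is throttled
```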
Common Causes
- Account-Level Exhaustion: A single "runaway" Lambda function consumes all 1,000 regional slots, throttling every other function in your account.
- Reserved Concurrency Ceiling: The function has a hard cap (e.g., 50) that is too low for sudden traffic bursts, even if the account has free capacity.
- Burst Limit Violations: Scaling up too fast (e.g., from 0 to 3,000 in seconds) exceeds the per-minute burst increase limit of the region.
- Upstream Over-Fanout: SQS or Kinesis triggers spawning Lambda instances faster than the function is allowed to scale.
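For the SQS over-fanout case, the event source mapping supports a maximum-concurrency setting. A hedged boto3 sketch follows; the mapping UUID and the value passed in are placeholders, and the boto3 import is kept inside the function because the call requires AWS credentials:

```python
def cap_sqs_trigger_concurrency(mapping_uuid, max_concurrency):
    """Cap how many Lambda instances an SQS event source mapping may run at once."""
    # SQS maximum concurrency cannot be set below 2.
    if max_concurrency < 2:
        raise ValueError("SQS maximum concurrency cannot be lower than 2")
    import boto3  # imported lazily: only needed when actually calling AWS
    client = boto3.client("lambda")
    return client.update_event_source_mapping(
        UUID=mapping_uuid,
        ScalingConfig={"MaximumConcurrency": max_concurrency},
    )
```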
How to Fix Too Many Requests
1. Increase Regional Quota: Use AWS Service Quotas to request a limit increase if your baseline concurrency is consistently above 80% of the limit.
2. Adjust Reserved Concurrency: Ensure critical functions have enough reserved capacity to handle spikes without being throttled by "noisy neighbors."
3. Implement Jittered Backoff: Ensure calling services (API Gateway, SDKs) don't retry instantly, which creates a "retry storm" that sustains the throttle.
4. Limit SQS Batch Size: Reduce the concurrency of the Event Source Mapping to smooth out the processing rate.
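Step 3's jittered backoff can be sketched as a standalone helper. The base delay and cap below are illustrative values, not AWS recommendations:

```python
import random


def backoff_delay_ms(attempt, base_ms=100, cap_ms=5000):
    """Full-jitter exponential backoff: random delay in [0, min(cap, base * 2^attempt)]."""
    ceiling = min(cap_ms, base_ms * (2 ** attempt))
    return random.uniform(0, ceiling)

# Delays grow geometrically until capped, and the randomness de-synchronizes
# retrying clients, which prevents the "retry storm" described above.
```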
Step-by-Step Diagnosis for Too Many Requests
1. Check CloudWatch Metrics: Look for Throttles and ConcurrentExecutions at both the function and account level.
2. Verify "UnreservedConcurrentExecutions": If this is near zero, your account is exhausted; find the "noisy neighbor" function.
3. Identify Burst Throttling: If ConcurrentExecutions is below the limit but Throttles is high, you are scaling faster than the regional burst limit allows.
4. Check Usage Plans: Ensure API Gateway isn't the source of the 429 error before it even reaches Lambda.
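The distinction drawn in steps 2 and 3 can be captured as a small triage helper. The logic below is an illustrative heuristic, not an official AWS diagnostic:

```python
def classify_throttle(throttles, concurrent_executions, account_limit):
    """Rough triage of where a Lambda throttle is coming from, per the steps above."""
    if throttles == 0:
        return "healthy"
    if concurrent_executions >= account_limit:
        return "account-level exhaustion"    # step 2: unreserved pool near zero
    return "burst-limit throttling"          # step 3: throttled while below the quota


print(classify_throttle(throttles=120, concurrent_executions=400, account_limit=1000))
# → burst-limit throttling
```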
Concurrency Type Comparison
- Reserved Concurrency: Guarantees capacity but caps the function. Use to protect critical paths.
- Provisioned Concurrency: Pre-warms instances. Use to eliminate cold starts AND throttling during predictable spikes.
- Unreserved Pool: The shared regional bucket. Risky for production-critical functions due to noisy neighbor effects.
Burst vs. Account Limits
- Burst Limit: Limits how fast you can scale (varies by region, e.g., 500-3,000).
- Account Limit: Limits how much you can scale in total (default 1,000).
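Under the classic burst model described above (an initial regional burst, then a fixed per-minute ramp), the scale-up ceiling at any point is easy to compute. The figures below are illustrative defaults, not a guarantee for your region:

```python
def allowed_concurrency(minutes, initial_burst=3000, account_limit=10000, ramp_per_min=500):
    """Scale-up ceiling under the classic burst model: an initial burst,
    then a fixed per-minute increase, capped by the account limit."""
    return min(account_limit, initial_burst + ramp_per_min * minutes)


print(allowed_concurrency(0))   # → 3000 (initial regional burst)
print(allowed_concurrency(4))   # → 5000 (after four minutes of ramp-up)
```

Requests arriving above this ceiling are throttled even though the account limit has not been reached, which is exactly the "burst vs. account" distinction this section draws.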
Implementation Examples

```javascript
// Lambda throttles are transient; retrying with jitter is key.
import { ConfiguredRetryStrategy } from "@smithy/util-retry";

const config = {
  maxAttempts: 5,
  retryStrategy: new ConfiguredRetryStrategy(5, (attempt) => {
    // Exponential backoff with up to 50 ms of jitter, capped at 5 seconds.
    return Math.min(100 * Math.pow(2, attempt) + Math.random() * 50, 5000);
  }),
};
```

```python
import boto3

# Check whether reserved concurrency is set on the function
client = boto3.client('lambda')
response = client.get_function_concurrency(FunctionName='my-function')
print(f"Reserved Concurrency: {response.get('ReservedConcurrentExecutions')}")
```

How to Verify the Fix
- Monitor the Throttles metric in CloudWatch; it should return to a steady zero.
- Verify that ConcurrentExecutions peaks remain safely below your new Service Quota limits.
- Test with an upstream load generator to ensure exponential backoff logic is effectively smoothing the traffic.
How to Prevent Recurrence
- Set CloudWatch Alarms: Trigger alerts when concurrency hits 70% of the regional or reserved limit.
- Use Provisioned Concurrency: For marketing campaigns or scheduled events where traffic spikes are known in advance.
- Leaky Bucket Pattern: Use API Gateway Throttling/Usage Plans to reject excess traffic at the edge before it hits Lambda.
- Pro-tip: Never set Reserved Concurrency to 0 unless you want to "kill-switch" a function during an incident.
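The 70% alarm from the first prevention item can be sketched with boto3. The alarm name is a placeholder, and the boto3 import stays inside the function since creating the alarm requires AWS credentials:

```python
def alarm_threshold(limit, fraction=0.7):
    """Concurrency level at which the alarm should fire (default 70% of the limit)."""
    return int(limit * fraction)


def create_concurrency_alarm(limit=1000):
    import boto3  # lazy import: only needed when actually creating the alarm
    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(
        AlarmName="lambda-concurrency-70pct",  # placeholder name
        Namespace="AWS/Lambda",
        MetricName="ConcurrentExecutions",
        Statistic="Maximum",
        Period=60,
        EvaluationPeriods=3,
        Threshold=alarm_threshold(limit),
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[],  # add an SNS topic ARN here to be notified
    )


print(alarm_threshold(1000))  # → 700
```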
Decision Support
- Compare Guide: 429 Too Many Requests vs 503 Service Unavailable. Use 429 for caller-specific throttling and 503 for service-wide outages, so retry behavior, escalation paths, and incident ownership stay correct.
- Compare Guide: AWS ThrottlingException vs GCP RESOURCE_EXHAUSTED. Compare the two to separate rate limiting from quota/resource exhaustion and choose the remediation path.
- Playbook: Rate Limit Recovery Playbook (429 / ThrottlingException / RESOURCE_EXHAUSTED). Use this playbook to separate transient throttling from hard quota exhaustion and apply retry, traffic-shaping, and quota-capacity fixes safely.
Official References
Provider Context
This guidance is specific to AWS services. Always validate implementation details against official provider documentation before deploying to production.