ThrottlingException
AWS returns a ThrottlingException when a client's request rate or concurrency exceeds a service's allowed limits. Depending on the service and protocol, this commonly surfaces as HTTP 400 or HTTP 429.
Last reviewed: February 12, 2026 | Editorial standard: source-backed technical guidance
What Does Throttling Exception Mean?
The service is actively protecting itself from request pressure, so calls are rate-limited until client demand, retry behavior, and quota headroom return to a sustainable level.
Common Causes
- Burst concurrency exceeds per-operation or per-account request quotas.
- Retry logic amplifies load because backoff and jitter are missing or retries are too aggressive.
- Traffic concentrates on hot resources or partitions, saturating localized limits.
- Planned traffic growth outpaced approved quota increases or regional capacity settings.
How to Fix Throttling Exception
1. Implement exponential backoff with full jitter and cap total retry attempts.
2. Throttle client concurrency at the caller to smooth request bursts.
3. Honor service guidance such as Retry-After headers when provided.
4. Request quota increases for sustained demand above current service limits.
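Steps 1 and 3 above can be sketched as a small retry wrapper. This is a minimal illustration, not an SDK feature: `ThrottledError` and its `retry_after` attribute are hypothetical stand-ins for whatever throttling error and server hint your client surfaces.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical throttling error; real SDK errors carry their own codes."""

    def __init__(self, retry_after=None):
        super().__init__("request was throttled")
        self.retry_after = retry_after  # optional server hint, in seconds


def call_with_backoff(fn, max_attempts=5, base=0.2, cap=10.0, sleep=time.sleep):
    """Retry fn() with exponential backoff and full jitter, capping attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError as err:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted; surface the throttle to the caller
            if err.retry_after is not None:
                delay = err.retry_after  # honor the service's explicit hint
            else:
                # full jitter: uniform over [0, min(cap, base * 2^attempt)]
                delay = random.uniform(0, min(cap, base * 2 ** attempt))
            sleep(delay)
```

Injecting `sleep` keeps the wrapper testable and lets callers route waits through their own scheduler; in production you would typically lean on your SDK's built-in retry configuration instead of hand-rolling this.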
Step-by-Step Diagnosis for Throttling Exception
1. Inspect CloudWatch metrics for throttled requests, latency, and burst concurrency.
2. Break down failures by API action, region, and principal to isolate bottlenecks.
3. Trace retry fan-out in clients and queues to identify self-induced traffic storms.
4. Correlate throttling spikes with deploys, backfills, and autoscaling transitions.
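The breakdown in step 2 can be sketched as a simple aggregation over structured throttle events. The field names (`action`, `region`, `principal`) are assumptions; map them to whatever your log or metric schema actually emits.

```python
from collections import Counter


def throttle_hotspots(events, top=3):
    """Count throttled requests per (action, region, principal) tuple.

    events: iterable of dicts with 'action', 'region', 'principal' keys
            (an assumed schema for illustration).
    Returns the `top` most-throttled tuples with their counts.
    """
    counts = Counter(
        (e["action"], e["region"], e["principal"]) for e in events
    )
    return counts.most_common(top)
```

Ranking by this tuple quickly shows whether throttling is concentrated in one caller or spread across the fleet, which changes whether the fix is client-side shaping or a quota increase.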
Demand and Burst Profiling
- Profile request burst shape by operation and principal (example: one queue consumer shard spikes `GetItem` at 10x baseline).
- Inspect partition-level hot spots and uneven key distribution (example: DynamoDB traffic concentrates on a small key range).
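The uneven-key-distribution check above can be approximated by measuring what share of sampled requests the busiest keys absorb. This is an illustrative client-side heuristic over your own request logs, not an AWS API; the function name and inputs are assumptions.

```python
from collections import Counter


def hot_key_share(keys, top_n=1):
    """Fraction of all sampled requests that hit the top_n busiest keys."""
    if not keys:
        return 0.0
    counts = Counter(keys)
    hottest = sum(n for _, n in counts.most_common(top_n))
    return hottest / len(keys)
```

A share close to 1.0 for a single key suggests a hot partition: localized limits will throttle long before the account-level quota is exhausted, so the remedy is key-distribution design rather than a quota increase.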
Retry and Backpressure Controls
- Audit retry fan-out and jitter quality in every client path (example: synchronized retries at 1s intervals create periodic throttle waves).
- Verify caller-side concurrency guards and queue drain limits (example: autoscaler doubles workers without per-worker API token bucket).
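A per-worker guard like the token bucket mentioned above can be sketched as follows; the rate and capacity values are illustrative, and the injectable clock exists only to keep the sketch testable.

```python
import time


class TokenBucket:
    """Simple token bucket: allow() returns True if a request may proceed."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens refilled per second (steady rate)
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self):
        # Refill proportionally to elapsed time, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Gating each worker's API calls through `allow()` keeps aggregate demand bounded even when an autoscaler doubles the worker count, because the per-worker budget is explicit rather than implied by worker count.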
How to Verify the Fix
- Confirm throttled request count drops and success rate stabilizes under expected load.
- Validate p95/p99 latency recovers without introducing queue backlogs.
- Re-run load tests to ensure request patterns stay within known quota headroom.
How to Prevent Recurrence
- Use adaptive rate limiting and token-bucket controls in all high-volume clients.
- Continuously monitor quota headroom and auto-open increase requests before saturation.
- Design retry-safe idempotent write paths to avoid duplicate side effects under throttling.
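The retry-safe write path in the last point can be sketched with a client-supplied idempotency key, so a request retried after a throttle does not apply its side effect twice. The in-memory store and key scheme here are assumptions; a real implementation would use a durable store with expiry.

```python
def idempotent_write(store, key, apply_fn):
    """Apply apply_fn() at most once per idempotency key.

    On a retry with the same key, replay the cached result instead of
    re-running the side effect.
    """
    if key in store:
        return store[key]  # retry path: original outcome, no new side effect
    result = apply_fn()
    store[key] = result
    return result
```

With this shape, the backoff-and-retry layer can safely re-send a throttled write without risking duplicate charges, records, or messages.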
Pro Tip
- Reserve a fixed percentage of quota headroom for incident traffic so failover and backfill events do not immediately saturate limits.
Decision Support
- Compare Guide: AWS ThrottlingException vs GCP RESOURCE_EXHAUSTED. Separates rate limiting from quota/resource exhaustion so you can choose the right remediation path.
- Compare Guide: 429 Too Many Requests vs 503 Service Unavailable. Use 429 for caller-specific throttling and 503 for service-wide outages, so retry behavior, escalation paths, and incident ownership stay correct.
- Playbook: Rate Limit Recovery Playbook (429 / ThrottlingException / RESOURCE_EXHAUSTED). Separates transient throttling from hard quota exhaustion and applies retry, traffic-shaping, and quota-capacity fixes safely.
Provider Context
This guidance is specific to AWS services. Always validate implementation details against official provider documentation before deploying to production.