ServiceUnavailable
The AWS ServiceUnavailable error (Service Unavailable) means Amazon S3 is temporarily unable to handle the request. Amazon S3 returns this error with HTTP status 503.
Last reviewed: February 12, 2026 | Editorial standard: source-backed technical guidance
What Does Service Unavailable Mean?
When ServiceUnavailable is returned, the service cannot process the request at that moment, so user-facing operations fail transiently until backend health stabilizes.
Common Causes
- Transient S3 service-side disruption or partial regional degradation.
- Sudden traffic bursts exceeding what the current request pattern can sustain.
- Synchronized client retries that amplify request pressure during recovery windows.
- Dependency-layer issues (DNS/network/egress) that make healthy endpoints appear unavailable.
How to Fix Service Unavailable
1. Retry with exponential backoff and jitter while keeping the total retry budget bounded.
2. Throttle non-critical workloads to reduce load during recovery periods.
3. Ensure write operations are idempotent before enabling automatic retries.
4. Check AWS Health and service dashboards for ongoing incident context.
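The first step above, exponential backoff with jitter under a bounded retry budget, can be sketched in plain Python. This is an illustrative helper, not an AWS SDK API; the function and parameter names are assumptions, and in production the AWS SDKs' built-in retry modes already provide this behavior.

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.2, max_delay=8.0):
    """Retry `operation` on failure using full jitter and a bounded budget.

    `operation` is any zero-argument callable that raises on a 503-style
    failure. Names and defaults here are illustrative assumptions.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Retry budget exhausted: surface the error.
            # Full jitter: sleep a random amount up to the exponential cap,
            # which spreads retries out and avoids synchronized retry storms.
            cap = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, cap))
```

"Full jitter" (a uniformly random delay up to the cap) is generally preferred over a fixed exponential delay because it decorrelates clients that all failed at the same instant.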
Step-by-Step Diagnosis for Service Unavailable
1. Capture request IDs and map failures by region, endpoint, and operation type.
2. Correlate 503 spikes with deploy events, traffic bursts, and upstream network errors.
3. Inspect retry telemetry for synchronized retries or retry storm patterns.
4. Verify DNS resolution, TLS handshake success, and egress reliability for callers.
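The first diagnosis step, mapping failures by region and operation, amounts to a simple aggregation over request logs. A minimal sketch, assuming each log record is a dict with `status`, `region`, and `operation` keys (the record shape is an assumption, not a fixed AWS log format):

```python
from collections import Counter

def summarize_503s(records):
    """Count 503 responses by (region, operation) pair.

    `records` is an iterable of dicts with `status`, `region`, and
    `operation` keys; this shape is illustrative only.
    """
    counts = Counter(
        (r["region"], r["operation"])
        for r in records
        if r["status"] == 503
    )
    # Most-affected (region, operation) pairs first.
    return counts.most_common()
```

Sorting the pairs by count makes it obvious whether the 503s are concentrated in one region or operation (suggesting regional degradation) or spread evenly (suggesting a client-side or dependency problem).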
Availability Checks
- Inspect service health timelines and incident windows around the failure burst (example: S3 `ServiceUnavailable` spikes align with a temporary regional degradation event).
- Trace endpoint reachability and TLS/DNS behavior from callers (example: intermittent resolver failures make healthy endpoints appear unavailable).
Resilience and Retry Validation
- Audit retry spreading and jitter quality under failure load (example: synchronized retries amplify 503 waves instead of allowing recovery).
- Verify fallback behavior for critical paths (example: non-critical traffic is throttled while essential writes continue with bounded retries).
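Jitter quality in the first check above can be audited with a crude spread metric: sample the delay each client would pick at a given attempt and measure how widely the delays span the backoff window. This is an illustrative sketch; the function name and metric are assumptions, not a standard tool.

```python
import random

def jitter_spread(num_clients, base_delay=0.2, attempt=3, seed=0):
    """Return (spread, cap) for one retry attempt across many clients.

    With full jitter, sampled delays should span most of [0, cap];
    without jitter, every client sleeps exactly `cap` and retries in
    lockstep (spread == 0). Parameters are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded for a reproducible audit
    cap = base_delay * (2 ** attempt)
    delays = [rng.uniform(0, cap) for _ in range(num_clients)]
    return max(delays) - min(delays), cap
```

A spread close to the cap indicates clients are well decorrelated; a spread near zero is the signature of a retry storm waiting to happen.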
How to Verify the Fix
- Confirm 503 rates drop and the successful request ratio returns to baseline.
- Verify p95/p99 latency stabilizes after retry and traffic controls are applied.
- Run controlled failover/retry tests to validate resilient behavior under stress.
How to Prevent Recurrence
- Implement client-side backpressure, circuit breaking, and jittered retries by default.
- Distribute traffic and critical data paths to reduce single-region recovery risk.
- Continuously test incident response playbooks with game-day availability drills.
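The circuit-breaking idea in the first prevention item can be sketched minimally: open the breaker after a run of consecutive failures, reject calls during a cooldown, then let traffic through again to probe the backend. This is a simplified illustration under assumed names; a production breaker also needs a half-open probing state and thread safety.

```python
import time

class CircuitBreaker:
    """Minimal circuit-breaker sketch (illustrative, not production-ready)."""

    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold  # consecutive failures before opening
        self.cooldown = cooldown    # seconds to reject calls while open
        self.clock = clock          # injectable clock for testing
        self.failures = 0
        self.opened_at = None

    def allow(self):
        """Return True if a call may proceed."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Cooldown elapsed: close the breaker and let traffic probe.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, success):
        """Report the outcome of a call."""
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
```

Rejecting calls locally while the breaker is open is a form of client-side backpressure: it stops a struggling backend from being hammered during exactly the window it needs to recover.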
Pro Tip
- Persist `x-amz-request-id` and `x-amz-id-2` with operation context so recurring S3 failures can be escalated with precise trace evidence.
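A minimal sketch of that persistence step, assuming `headers` is a dict of lowercase response header names (as surfaced in botocore's `ResponseMetadata`); the record shape and function name are illustrative assumptions:

```python
import json

def trace_record(operation, bucket, key, headers):
    """Serialize an S3 failure trace as a JSON line for later escalation.

    `headers` is assumed to be a dict of lowercase header names; both
    `x-amz-request-id` and `x-amz-id-2` are the identifiers AWS Support
    asks for when investigating a specific request.
    """
    return json.dumps({
        "operation": operation,
        "bucket": bucket,
        "key": key,
        "request_id": headers.get("x-amz-request-id"),
        "extended_id": headers.get("x-amz-id-2"),
    })
```

Appending these JSON lines to durable storage alongside timestamps gives support cases exact request-level evidence instead of approximate time windows.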
Decision Support
Compare Guide
429 Too Many Requests vs 503 Service Unavailable
Use 429 for caller-specific throttling and 503 for service-wide outages, so retry behavior, escalation paths, and incident ownership stay correct.
Compare Guide
500 Internal Server Error vs 502 Bad Gateway: Root Cause
Debug 500 vs 502 faster: use 500 for origin failures and 502 for invalid upstream responses at gateways, then route incidents to the right team.
Playbook
API Timeout Playbook (502 / 504 / DEADLINE_EXCEEDED)
Use this playbook to separate invalid upstream responses from upstream wait expiration and deadline exhaustion, then apply timeout budgets, safe retries, and circuit-breaker controls.
Playbook
Availability and Dependency Playbook (500 / 503 / ServiceUnavailable)
Use this playbook to separate origin-side 500 failures from temporary 503 dependency or capacity outages, then apply safe retry and escalation paths.
Official References
Provider Context
This guidance is specific to AWS services. Always validate implementation details against official provider documentation before deploying to production.