Unavailable
AWS Unavailable means the service is temporarily unable to process requests. In AWS APIs, this error is returned with HTTP status 503.
Last reviewed: February 12, 2026 | Editorial standard: source-backed technical guidance
What Does Unavailable Mean?
When Unavailable is returned, the service is temporarily unable to process requests; operations fail until endpoint health recovers and retry or failover logic stabilizes traffic.
Common Causes
- Service-side maintenance, incident, or transient control-plane disruption.
- Regional dependency degradation causing intermittent request failures.
- Burst traffic combined with aggressive retries, which amplifies transient instability.
- Client-side networking or DNS path issues that mimic service unavailability.
How to Fix Unavailable
1. Retry with bounded exponential backoff, jitter, and strict retry budgets.
2. Reduce non-critical traffic while preserving critical workflows.
3. Check AWS Health and service-specific dashboards for ongoing incidents.
4. Fail over to an alternate region or path where the architecture supports it.
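Step 1 can be sketched as a retry loop with capped exponential backoff, full jitter, and a shared retry budget. This is a minimal illustration, not an AWS SDK API: `call_with_backoff` and the `retry_budget` dict are hypothetical names. For AWS SDK clients, prefer the SDK's built-in retry configuration (e.g. botocore's `Config(retries={"mode": "adaptive", "max_attempts": 5})`) over hand-rolled loops.

```python
import random
import time

def call_with_backoff(operation, max_attempts=5, base=0.2, cap=10.0,
                      retry_budget=None, sleep=time.sleep):
    """Retry `operation` on failure with capped exponential backoff and
    full jitter. `retry_budget`, if given, is a mutable counter dict
    shared across calls so clients stop retrying once the budget is spent."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            if retry_budget is not None:
                if retry_budget["remaining"] <= 0:
                    raise  # budget exhausted: stop amplifying the outage
                retry_budget["remaining"] -= 1
            # Full jitter: sleep uniformly in [0, min(cap, base * 2**attempt)]
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

The `sleep` parameter is injected so the loop can be tested without real delays; the budget keeps a fleet of clients from turning one 503 burst into a retry storm.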
Step-by-Step Diagnosis for Unavailable
1. Map failures by region, endpoint, and operation to isolate the blast radius.
2. Correlate outages with deploy windows, traffic spikes, and network changes.
3. Inspect retry telemetry to detect retry storms or missing backpressure.
4. Validate client DNS/TLS/network health to separate service failures from transport failures.
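Step 3 reduces to a simple telemetry check: flag the time windows in which retries, rather than first attempts, dominate request volume. A minimal sketch, assuming your telemetry can be aggregated into per-window (first_attempts, retries) counts; the function name and data shape are illustrative.

```python
def retry_storm_windows(windows, threshold=0.5):
    """Flag time windows where retries dominate request volume.
    `windows` maps a window label to (first_attempts, retries); a window
    is flagged when retries exceed `threshold` of total requests."""
    flagged = []
    for label, (first, retries) in windows.items():
        total = first + retries
        if total and retries / total > threshold:
            flagged.append(label)
    return flagged
```

A healthy fleet shows retries as a small fraction of traffic; windows where retries exceed half of all requests usually indicate missing jitter or backpressure.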
Availability Checks
- Correlate failure windows with AWS Health events and service dashboards (example: temporary server-side failure periods produce clustered 503 responses).
- Inspect endpoint reachability, DNS resolution, and TLS handshake integrity from callers (example: transport-layer instability mimics service outages).
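The first check is an interval-overlap problem: a failure window is explained by a health event only if the two time ranges intersect. A minimal sketch with hypothetical function names; timestamps are ordinary `datetime` pairs.

```python
from datetime import datetime

def overlaps(failure_window, health_event):
    """Return True when a failure window overlaps a health event.
    Both arguments are (start, end) tuples of datetimes."""
    f_start, f_end = failure_window
    h_start, h_end = health_event
    return f_start <= h_end and h_start <= f_end

def correlate(failure_windows, health_events):
    """Map each failure window to the health events it overlaps."""
    return {fw: [he for he in health_events if overlaps(fw, he)]
            for fw in failure_windows}
```

Failure windows that overlap no reported event point toward client-side causes (DNS, TLS, routing) rather than a service incident.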
Resilience Path Validation
- Audit retry behavior for jitter and bounded budgets (example: synchronized retries create secondary waves during partial recovery).
- Verify failover and graceful-degradation paths for critical operations (example: fallback region or queue buffering keeps essential workflows alive).
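The synchronized-retry effect in the first point is easy to demonstrate: after a shared failure, clients without jitter all retry at the same instant, while full jitter spreads them across the backoff window. A small simulation sketch (names are illustrative):

```python
import random

def retry_times(n_clients, attempt, base=0.5, jitter=True, rng=None):
    """Compute when `n_clients` clients that failed simultaneously will
    retry after `attempt` prior failures. Without jitter every client
    retries at the same instant; full jitter spreads retries uniformly
    over the backoff window."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    delay = base * 2 ** attempt
    if not jitter:
        return [delay] * n_clients
    return [rng.uniform(0, delay) for _ in range(n_clients)]
```

The no-jitter case is what produces "secondary waves": every client hammers the recovering service in the same instant, re-triggering the outage.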
How to Verify the Fix
- Confirm success rates and latency return to normal baselines.
- Verify retries normalize and no longer dominate request volume.
- Run controlled failover tests to validate recovery automation.
How to Prevent Recurrence
- Build multi-region resilience for critical paths and state replication where feasible.
- Apply circuit breakers, adaptive throttling, and backpressure across clients.
- Practice incident and failover drills with production-like traffic patterns.
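The circuit-breaker pattern mentioned above can be sketched as a small state machine: closed while calls succeed, open (failing fast) after a run of consecutive failures, and half-open after a cooldown to let one trial call probe recovery. This is a minimal illustration with hypothetical names, not a production implementation.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: opens after `failure_threshold` consecutive
    failures, rejects calls while open, and allows one trial call through
    after `reset_timeout` seconds (half-open)."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable for testing
        self.failures = 0
        self.opened_at = None

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip the breaker
            raise
        self.failures = 0  # success closes the circuit
        return result
```

Failing fast while the breaker is open is what gives a degraded service breathing room to recover instead of absorbing a constant retry load.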
Pro Tip
- Define error-budget-aware retry caps so clients stop amplifying outages once service health drops below a threshold.
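An error-budget-aware retry cap can be sketched as a gate on the recently observed success rate: retries are permitted only while health stays above a threshold. A minimal illustration under assumed names (`ErrorBudget`, `may_retry` are hypothetical, and a production version would use a sliding window rather than all-time counts).

```python
class ErrorBudget:
    """Track observed success rate and gate retries on it: once service
    health drops below `min_success_rate`, stop retrying so clients do
    not amplify the outage."""

    def __init__(self, min_success_rate=0.8):
        self.min_success_rate = min_success_rate
        self.successes = 0
        self.failures = 0

    def record(self, ok):
        """Record the outcome of one request attempt."""
        if ok:
            self.successes += 1
        else:
            self.failures += 1

    def may_retry(self):
        """Allow retries only while the success rate meets the threshold."""
        total = self.successes + self.failures
        if total == 0:
            return True  # no data yet: allow retries
        return self.successes / total >= self.min_success_rate
```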
Decision Support
Compare Guide
429 Too Many Requests vs 503 Service Unavailable
Use 429 for caller-specific throttling and 503 for service-wide outages, so retry behavior, escalation paths, and incident ownership stay correct.
Compare Guide
500 Internal Server Error vs 502 Bad Gateway: Root Cause
Debug 500 vs 502 faster: use 500 for origin failures and 502 for invalid upstream responses at gateways, then route incidents to the right team.
Playbook
API Timeout Playbook (502 / 504 / DEADLINE_EXCEEDED)
Use this playbook to separate invalid upstream responses (502) from upstream wait expiration (504) and deadline exhaustion (DEADLINE_EXCEEDED), and to apply timeout budgets, safe retries, and circuit-breaker controls.
Playbook
Availability and Dependency Playbook (500 / 503 / ServiceUnavailable)
Use this playbook to separate origin-side 500 failures from temporary 503 dependency or capacity outages, then apply safe retry and escalation paths.
Official References
Provider Context
This guidance is specific to AWS services. Always validate implementation details against official provider documentation before deploying to production.