503 - Service Unavailable
HTTP 503 Service Unavailable means the server is temporarily unable to handle the request due to overload or maintenance.
Last reviewed: April 15, 2026 | Source-backed guidance under our editorial policy
This page is part of the Error Reference library.
What Does Service Unavailable Mean?
A 503 means the service is intentionally or effectively unable to serve traffic right now. It is the right code for temporary unavailability, so the investigation should start with overload, maintenance gating, readiness controls, or dependency recovery rather than malformed requests. When the 503 is deliberate, healthy behavior, you should usually see backpressure or recovery signals nearby, such as a Retry-After header, an explicit shed-load reason, or readiness transitions.
Common Causes
- A Kubernetes deployment has zero ready pods during a maintenance window, leaving ingress without healthy endpoints.
- Readiness or startup gating stays too strict after a deploy, so healthy-enough pods never rejoin service and ingress keeps serving 503.
- A primary datastore failover keeps the service temporarily unavailable while leader election and replay complete.
- A circuit breaker opens after a downstream latency spike, and the gateway serves 503 until the recovery threshold is met.
- Autoscaling reacts too slowly to a sudden burst, so concurrency, queue, or connection budgets are exhausted before new capacity arrives.
- Maintenance or feature flags stay enabled after rollout and keep serving 503 to healthy traffic.
How to Fix Service Unavailable
1. Reduce load immediately with admission control, queueing, or traffic shedding while preserving core paths.
2. Scale constrained resources and recover failing dependencies before reopening full traffic.
3. Serve cached, degraded, or read-only responses for non-critical paths while critical dependencies recover.
4. If applicable, send Retry-After so clients back off predictably during temporary unavailability.
5. Restore readiness for the affected tier before turning traffic back up to full volume.
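The admission-control and Retry-After steps above can be sketched as a small in-process gate. This is a minimal illustration, not a production limiter: the class name `AdmissionController`, its fields, and the ceiling values are hypothetical, and a real service would usually enforce this at the gateway or with a thread-safe limiter.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class AdmissionController:
    """Hypothetical gate: sheds optional traffic first, then all traffic."""
    max_in_flight: int            # assumed hard ceiling for every request
    optional_ceiling: int         # assumed lower ceiling for non-critical requests
    retry_after_seconds: int = 30
    in_flight: int = 0

    def try_admit(self, critical: bool) -> Tuple[int, Dict[str, str]]:
        """Return (status, headers). 200 means admitted and counted as in flight."""
        ceiling = self.max_in_flight if critical else self.optional_ceiling
        if self.in_flight >= ceiling:
            # Reject with a backoff hint instead of queueing more work.
            return 503, {"Retry-After": str(self.retry_after_seconds)}
        self.in_flight += 1
        return 200, {}

    def release(self) -> None:
        """Call once for each admitted request when it finishes."""
        self.in_flight = max(0, self.in_flight - 1)
```

Because the optional ceiling is lower than the hard ceiling, non-critical requests start receiving 503 while core paths still have headroom.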
Step-by-Step Diagnosis for Service Unavailable
1. Correlate 503 spikes with saturation metrics (CPU, memory, queue depth, thread pools, connection limits).
2. Identify whether the outage source is overload, planned maintenance gating, or a hard dependency outage.
3. Inspect autoscaling and load-balancing behavior for lag or misconfiguration under traffic bursts.
4. Inspect Retry-After headers, edge-generated maintenance responses, and readiness transitions to confirm the 503 is intentional and temporary.
5. Check whether readiness, maintenance flags, or circuit breakers are intentionally withholding traffic from otherwise healthy instances.
6. Retest after recovery actions and confirm traffic can ramp without re-entering overload conditions.
Seen in Production
- Checkout service returns 503 during a promo launch because warm capacity is too low and autoscaling adds pods several minutes after the queue is already saturated.
- Region remains in maintenance mode after a successful deploy because a feature flag rollback step never ran.
- Dependency failover opens the circuit breaker and the API intentionally returns 503 with Retry-After while downstream recovery completes.
Capacity Saturation and Backpressure Analysis
- Inspect resource ceilings and queue pressure (example: request queue depth hits max and service starts returning 503).
- Validate autoscaler reaction time and policy thresholds (example: scale-out triggers too late for the actual burst profile).
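A rough way to judge whether scale-out is too late for a burst is to compare how long the queue takes to fill against how long new capacity takes to arrive. The sketch below assumes constant arrival and service rates, which real bursts rarely have. The function names and all numbers are illustrative assumptions, not measurements.

```python
from typing import Optional


def seconds_until_queue_full(
    arrival_rps: float, capacity_rps: float, queue_free_slots: int
) -> Optional[float]:
    """Time until the queue is full, or None if it is not growing."""
    excess = arrival_rps - capacity_rps
    if excess <= 0:
        return None  # the queue drains or holds steady
    return queue_free_slots / excess


def scale_out_arrives_too_late(
    arrival_rps: float,
    capacity_rps: float,
    queue_free_slots: int,
    scale_out_lag_s: float,
) -> bool:
    """True when the queue saturates before new capacity is ready."""
    fill_time = seconds_until_queue_full(arrival_rps, capacity_rps, queue_free_slots)
    return fill_time is not None and fill_time < scale_out_lag_s


# Assumed burst: 800 req/s arriving, 300 req/s served, 2000 free queue slots.
print(seconds_until_queue_full(800, 300, 2000))         # 4.0
print(scale_out_arrives_too_late(800, 300, 2000, 180))  # True
```

If the fill time is measured in seconds and the scale-out lag in minutes, tuning autoscaler thresholds alone is unlikely to help; pre-warmed capacity or load shedding has to cover the gap.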
Maintenance and Dependency Availability Checks
- Audit maintenance flags and circuit breakers (example: service is left in maintenance mode after deployment).
- Trace critical dependency readiness (example: cache or datastore failover causes app to reject requests with temporary 503).
Retry-After and Readiness Signal Audit
- Check whether the response includes Retry-After or an explicit shed-load reason (example: edge returns plain 503 with no backoff hint, causing clients to amplify the outage with synchronized retries).
- Trace readiness transitions and probe history (example: pods are healthy enough to serve but remain out of rotation because warm-up or dependency probe thresholds are too strict).
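When auditing Retry-After, remember that the header can carry either a number of seconds or an HTTP date. The helper below is a minimal sketch of parsing both forms with the Python standard library. The function name `retry_after_seconds` is hypothetical, and treating a date without a timezone as UTC is an assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(
    value: Optional[str], now: Optional[datetime] = None
) -> Optional[float]:
    """Return the wait in seconds, or None if the header is absent or unparseable."""
    if value is None:
        return None
    text = value.strip()
    if text.isascii() and text.isdigit():
        return float(text)  # delay-seconds form, e.g. "30"
    try:
        when = parsedate_to_datetime(text)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # assumption: treat as UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A `None` result on a 503 is itself a finding: the response gives clients no backoff hint.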
Decision Shortcut: Overload vs Maintenance vs Dependency Gating
- If saturation metrics climb before 503 appears, prioritize capacity, concurrency limits, and retry suppression.
- If 503 begins right after deploy or failover while utilization is normal, inspect maintenance flags, readiness rules, and dependency recovery gates first.
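The shortcut above can be written down as a small triage function. This is only a sketch of the decision order: the `IncidentSnapshot` fields and the returned labels are hypothetical names, and the inputs are judgments an operator still has to make from dashboards and flags.

```python
from dataclasses import dataclass


@dataclass
class IncidentSnapshot:
    """Hypothetical summary of what the operator has observed so far."""
    saturated_before_503: bool       # saturation metrics climbed before 503 appeared
    recent_deploy_or_failover: bool  # 503 began right after a deploy or failover
    maintenance_flag_on: bool
    dependency_healthy: bool


def classify_503(snapshot: IncidentSnapshot) -> str:
    """Return where to look first; the order mirrors the decision shortcut."""
    if snapshot.saturated_before_503:
        return "overload"            # capacity, concurrency limits, retry suppression
    if snapshot.maintenance_flag_on:
        return "maintenance_gating"
    if not snapshot.dependency_healthy:
        return "dependency_gating"
    if snapshot.recent_deploy_or_failover:
        return "readiness_gating"    # readiness rules or recovery gates
    return "unclassified"
```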
Wrong Fix to Avoid
- Do not just increase client retries if the service is already overloaded; synchronized retries often turn a short incident into a longer 503 storm.
- Do not reopen full traffic immediately after partial recovery without checking readiness, queue depth, and dependency health.
- Do not force pods ready or disable health probes just to make the 503 disappear; that usually shifts the same failure into 500 or 504 on live traffic.
Implementation Examples
HTTP/1.1 503 Service Unavailable
Retry-After: 30
X-Request-Id: req_2f61d1
Content-Type: application/json

{"error":"Service Unavailable","reason":"queue_saturated"}

kubectl get deploy api -n prod
kubectl get hpa api -n prod
kubectl top pods -n prod
kubectl get events -n prod --sort-by=.lastTimestamp | tail -n 20

curl -si https://api.example.com/checkout | rg -i 'HTTP/|Retry-After|X-Request-Id'
kubectl get endpoints api -n prod
kubectl logs deploy/api -n prod --since=10m | rg 'shed|overload|maintenance|circuit'

{
"requestId": "req_2f61d1",
"status": 503,
"retryAfterSeconds": 30,
"readinessBlockedBy": "database_failover",
"queueDepth": 1824,
"shedLoadMode": "payments_degraded"
}

[
{
"ts": "2026-04-15T18:02:11Z",
"status": 503,
"retryAfterSeconds": 30,
"queueDepth": 1824,
"readyEndpoints": 2
},
{
"ts": "2026-04-15T18:05:48Z",
"action": "optional_traffic_shed",
"queueDepth": 611,
"readyEndpoints": 4
},
{
"ts": "2026-04-15T18:12:34Z",
"status503RatePerMin": 0,
"status500RatePerMin": 0,
"status504RatePerMin": 0,
"readyEndpoints": 8
}
]

Incident Timeline
18:00 UTC
Pressure starts before the outage wall becomes obvious
Signal: Traffic ramps or a deploy/failover begins while queue depth, ready endpoints, or dependency health drift away from steady state.
Why it matters: The first useful clue usually appears in saturation or readiness telemetry before customers see a flat 503 wall.
18:02 UTC
The edge starts returning temporary-unavailable responses
Signal: Ingress emits 503 Service Unavailable, often with Retry-After, maintenance markers, queue_saturated, or shrinking endpoint counts.
Why it matters: This is where you validate the service contract directly. Temporary unavailability should come with backpressure or readiness clues nearby.
18:05 UTC
Mitigation reduces blast radius without reopening everything
Signal: Optional traffic is shed, clients back off, and capacity or dependency recovery begins while core routes are preserved.
Why it matters: Disciplined admission control is the turning point. Partial recovery only helps if it does not immediately refill the queue and recreate the 503 wall.
18:12 UTC
Readiness reopens and status-code drift stays flat
Signal: Ready endpoint count stabilizes, queue depth falls, and 503 disappears without turning the same path into 500 or 504.
Why it matters: A real fix restores steady-state serving behavior. A cosmetic fix just moves the outage to another status code.
Seen in Production
Traffic surge exceeds warm capacity before autoscaling catches up
Frequency: common
Example: Promotional event triples checkout traffic in minutes; service hits thread/connection limits and returns 503.
Fix: Pre-warm capacity for forecasted events and enforce adaptive concurrency limits with queue-based smoothing.
Maintenance mode unintentionally left enabled
Frequency: common
Example: Post-maintenance deploy succeeds, but feature flag keeps edge returning 503 for one region.
Fix: Automate maintenance-flag rollback checks and add post-deploy synthetic verification before reopening traffic.
Dependency failover keeps readiness closed longer than expected
Frequency: medium
Example: Database leader election completes, but application readiness remains false while connection pools and caches repopulate, so clients continue seeing 503.
Fix: Tune readiness behavior for dependency recovery and test controlled failover ramps before reopening traffic.
Rollout drains too much capacity at once
Frequency: medium
Example: Pod disruption budget or deployment strategy allows too many instances to rotate simultaneously, leaving too little warm capacity and causing a brief 503 wall.
Fix: Tighten rollout surge and unavailable settings, and validate minimum ready capacity during deploy windows.
Readiness never reopens after dependency recovery
Frequency: medium
Example: Database failover is complete, but the app remains out of rotation because startup or readiness checks still expect a warm cache state that has not repopulated yet.
Fix: Tune readiness criteria for recovery mode and validate re-entry behavior during controlled failover drills.
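For the "Rollout drains too much capacity at once" case above, it can help to compute how many pods a rolling update is allowed to take out of service. The sketch below assumes a Kubernetes Deployment using the RollingUpdate strategy, where a percentage `maxUnavailable` is rounded down to a whole pod count. The function names and the per-pod throughput figure are illustrative assumptions, and the sketch ignores surge pods and pods that are not ready for other reasons.

```python
from typing import Union


def min_ready_during_rollout(replicas: int, max_unavailable: Union[int, str]) -> int:
    """Lowest pod count the rollout may leave available."""
    if isinstance(max_unavailable, str) and max_unavailable.endswith("%"):
        percent = int(max_unavailable[:-1])
        unavailable = replicas * percent // 100  # percentage rounds down
    else:
        unavailable = int(max_unavailable)
    return max(0, replicas - unavailable)


def rollout_keeps_enough_capacity(
    replicas: int,
    max_unavailable: Union[int, str],
    per_pod_rps: float,
    peak_rps: float,
) -> bool:
    """True if the remaining pods can still carry the expected peak."""
    ready = min_ready_during_rollout(replicas, max_unavailable)
    return ready * per_pod_rps >= peak_rps


# Assumed: 8 replicas, each handling about 100 req/s, peak of 700 req/s.
print(min_ready_during_rollout(8, "25%"))                   # 6
print(rollout_keeps_enough_capacity(8, "25%", 100, 700))    # False
print(rollout_keeps_enough_capacity(8, 1, 100, 700))        # True
```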
Wrong Fix vs Better Fix
Retry harder vs shape demand and honor Retry-After
Wrong fix: Increase client retry counts or remove backoff because the outage is temporary and “the service will probably come back.”
Better fix: Keep retries bounded, jittered, and limited to safe/idempotent paths while respecting Retry-After and shedding optional traffic.
Why this is better: Unbounded retries turn a short overload event into a longer 503 storm. Demand shaping gives the service room to recover.
Force readiness green vs repair the actual gate
Wrong fix: Disable health probes or mark pods ready manually so dashboards stop showing 503.
Better fix: Trace the gate that is withholding traffic, then reopen gradually after fixing readiness criteria, dependency recovery, or maintenance flags.
Why this is better: Forcing traffic into an unhealthy tier often converts a temporary 503 into 500 or 504 on real user requests.
Blind scale-out vs classifying the outage first
Wrong fix: Scale every tier immediately without deciding whether the incident is overload, maintenance gating, or dependency failover.
Better fix: Use queue depth, endpoint readiness, maintenance markers, and dependency health to classify the incident before adding capacity.
Why this is better: Capacity helps overload, but it does not clear a stuck maintenance flag or a dependency gate that is intentionally refusing traffic.
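The "bounded, jittered, and limited to safe/idempotent paths" advice in the first comparison above can be sketched as a delay calculator. This is one possible shape, not a reference client: the function name, defaults, and the choice to add jitter on top of Retry-After (so clients given the same hint do not retry in lockstep) are assumptions.

```python
import random
from typing import Callable, Optional

# Methods treated as safe to retry in this sketch.
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "PUT", "DELETE"}


def next_retry_delay(
    attempt: int,
    method: str = "GET",
    retry_after_s: Optional[float] = None,
    base_s: float = 0.5,
    cap_s: float = 60.0,
    max_attempts: int = 4,
    rng: Callable[[], float] = random.random,
) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (1-based), or None to stop."""
    if attempt > max_attempts or method.upper() not in IDEMPOTENT_METHODS:
        return None  # bounded retries, and never for non-idempotent requests
    backoff = min(cap_s, base_s * 2 ** (attempt - 1))  # capped exponential backoff
    jitter = rng() * backoff                           # full jitter in [0, backoff)
    if retry_after_s is not None:
        return retry_after_s + jitter  # honor the server hint, then desynchronize
    return jitter
```

Returning `None` gives the caller an explicit signal to stop and surface the 503 rather than keep adding load.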
Debugging Tools
- Saturation dashboards (CPU, memory, queue depth, worker pools)
- Autoscaler event timeline and scaling policy logs
- Dependency readiness/health probes
- Retry-After header and maintenance-response telemetry
- Load-shedding and admission-control telemetry
How to Verify the Fix
- Repeat affected workflows and confirm 503 clears under nominal and burst traffic profiles.
- Validate Retry-After behavior and client backoff compliance during controlled degradation tests.
- Confirm 503 rate falls without simply shifting the same path into 500 or 504 during recovery.
- Confirm saturation metrics and queue depth stay below alert thresholds, and ready endpoint count stays at its expected level, after capacity and dependency fixes.
- Check that readiness and maintenance controls return to the expected steady state after deploy or failover.
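The checks above can be partly automated against a telemetry sample. The sketch below assumes a sample shaped like the final timeline entry in the implementation examples (`status503RatePerMin`, `status500RatePerMin`, `status504RatePerMin`, `readyEndpoints`, optional `queueDepth`). The function name and the threshold parameters are hypothetical.

```python
from typing import Any, Dict, List


def recovery_problems(
    sample: Dict[str, Any],
    expected_ready_endpoints: int,
    queue_depth_alert: int,
) -> List[str]:
    """Return reasons the recovery is not yet convincing; empty means it looks real."""
    problems: List[str] = []
    for key in ("status503RatePerMin", "status500RatePerMin", "status504RatePerMin"):
        rate = sample.get(key)
        if rate is None:
            problems.append(f"{key} missing")
        elif rate > 0:
            # A non-zero 500 or 504 rate suggests the outage moved status codes.
            problems.append(f"{key}={rate}")
    if sample.get("readyEndpoints", 0) < expected_ready_endpoints:
        problems.append("ready endpoints below expected")
    queue_depth = sample.get("queueDepth")
    if queue_depth is not None and queue_depth >= queue_depth_alert:
        problems.append("queue depth at or above alert threshold")
    return problems
```

Missing rate fields are reported rather than treated as zero, so a gap in telemetry is not mistaken for a clean recovery.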
How to Prevent Recurrence
- Separate critical and non-critical traffic classes so overload sheds optional work before user-facing paths.
- Improve resilience with capacity planning, graceful degradation, and circuit breakers.
- Gate high-risk deployments with canaries and automatic rollback triggers.
- Continuously test dependency failure paths and production recovery procedures.
Pro Tip
- Define service-level admission budgets per endpoint tier so non-critical traffic is shed first and core APIs stay available under pressure.
Decision Support
Compare Guide
429 Too Many Requests vs 503 Service Unavailable
Use 429 for caller-specific throttling and 503 for service-wide outages, so retry behavior, escalation paths, and incident ownership stay correct.
Playbook
Availability and Dependency Playbook (500 / 503 / ServiceUnavailable)
Use this playbook to separate origin-side 500 failures from temporary 503 dependency or capacity outages, then apply safe retry and escalation paths.
Provider Context
This guidance is specific to HTTP services. Always validate implementation details against official provider documentation before deploying to production.