504 - Gateway Timeout
HTTP 504 Gateway Timeout means a gateway or proxy did not receive a timely response from an upstream server.
Last reviewed: April 15, 2026 | Source-backed guidance under our editorial policy
This page is part of the Error Reference library.
What Does Gateway Timeout Mean?
The gateway waited longer than its deadline for an upstream response, so the request timed out at the proxy boundary. The root cause is usually upstream tail latency, queueing, or a timeout-budget mismatch, not an origin that fails outright.
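The deadline race can be sketched in a few lines of Python. This is a minimal illustration, not how any particular proxy is implemented: `upstream` and `gateway` are hypothetical names, and `asyncio.wait_for` stands in for the proxy's read timeout.

```python
import asyncio


async def upstream(delay_s: float) -> str:
    """Hypothetical upstream that needs `delay_s` seconds to produce a response."""
    await asyncio.sleep(delay_s)
    return "200 OK"


async def gateway(delay_s: float, timeout_s: float) -> str:
    """Forward one request and enforce the gateway's own deadline."""
    try:
        return await asyncio.wait_for(upstream(delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The deadline fired at the proxy boundary. wait_for cancels the simulated
        # upstream here; a real upstream server often keeps working after the proxy
        # gives up, but the client only ever sees the 504.
        return "504 Gateway Timeout"


if __name__ == "__main__":
    print(asyncio.run(gateway(delay_s=0.05, timeout_s=0.2)))  # upstream fits the budget
    print(asyncio.run(gateway(delay_s=0.5, timeout_s=0.2)))   # upstream is too slow
```

In neither call is the origin broken; in the second it is simply slower than the gateway's deadline.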
Common Causes
- The gateway timeout budget is 30 seconds, while the downstream report job regularly needs 45 seconds to finish.
- VPC DNS latency plus cold-start delay exceeds the load balancer idle timeout before the first upstream byte returns.
- The service chain lacks a propagated deadline and waits on a third-party API indefinitely until the edge timer expires.
- Database or connection-pool contention delays request execution so long that the gateway deadline fires before the work actually starts.
- Retry fan-out or a dependency brownout inflates p95 and p99 latency until the proxy times out first.
How to Fix Gateway Timeout
1. Identify the slow upstream hop and reduce its tail latency (query optimization, pool tuning, dependency fan-out reduction).
2. Align the timeout ladder across client, gateway, and upstream so deadlines are monotonic (each inner hop gives up before the hop that called it) and intentional.
3. Move long-running operations to async workflows instead of blocking on gateway request deadlines.
4. Apply bounded failover or circuit breaking so one degraded dependency cannot consume the full timeout budget of every request.
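Step 4 can be sketched as a small consecutive-failure circuit breaker. This is an illustrative sketch under assumptions: the class name, the default threshold of five failures, the 30-second reset window, and the string fallback are hypothetical choices, and production breakers usually add sliding windows and limits on half-open trial traffic.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal consecutive-failure circuit breaker (illustrative sketch).

    After `failure_threshold` consecutive failures the breaker opens and rejects
    calls immediately for `reset_after_s` seconds, so a degraded dependency cannot
    consume the full timeout budget of every request.
    """

    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.reset_after_s:
            # Half-open: allow calls again, but one more failure re-opens the breaker.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
            return True
        return False

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()

    def call(self, fn: Callable[[], str], fallback: str) -> str:
        if not self.allow():
            return fallback  # fail fast instead of waiting on a known-bad hop
        try:
            result = fn()
        except Exception:
            self.record_failure()
            return fallback
        self.record_success()
        return result
```

Each call to the degraded dependency goes through `breaker.call(...)`; once the breaker opens, requests get the fallback immediately instead of spending the gateway's budget waiting.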
Step-by-Step Diagnosis for Gateway Timeout
1. Capture end-to-end trace timings and isolate which upstream span exceeds the gateway timeout.
2. Compare timeout settings across all hops and find cases where an intermediary deadline is shorter than the upstream work it fronts.
3. Inspect queueing, thread-pool wait, and connection-pool contention on slow upstream services.
4. Check whether retries or downstream fan-out consume most of the deadline before the final upstream hop begins useful work.
5. Retest with controlled load and adjusted timeout budgets to confirm the timeout source is gone.
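Step 1 amounts to finding which span burns the budget. A minimal sketch, assuming span timings are already exported as a name-to-milliseconds mapping; the function names are hypothetical, and the sample values are copied from the trace example in Implementation Examples.

```python
from typing import Dict, Tuple


def dominant_span(spans_ms: Dict[str, int], budget_ms: int) -> Tuple[str, float]:
    """Return the span that took the longest and its share of the timeout budget."""
    name = max(spans_ms, key=spans_ms.get)
    return name, spans_ms[name] / budget_ms


def unaccounted_ms(spans_ms: Dict[str, int], total_ms: int) -> int:
    """Request time that no recorded span explains (gaps, untraced hops)."""
    return total_ms - sum(spans_ms.values())


if __name__ == "__main__":
    trace = {"gateway_to_app": 14, "app_queue_wait": 912,
             "db_query": 27104, "response_write": 7}
    name, share = dominant_span(trace, budget_ms=30000)
    print(f"{name} used {share:.0%} of the budget")
    print(f"unaccounted: {unaccounted_ms(trace, total_ms=30021)} ms")
```

For that trace, `db_query` alone uses 90% of a 30-second budget and 1984 ms is not covered by any span, which points first at the query and second at untraced time.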
Seen in Production
- A report endpoint returns 504 after a deploy because the new aggregation query pushes p99 latency above the load balancer's 30-second idle timeout.
- Gateway timeouts spike only during cold starts because DNS lookup, TLS handshake, and dependency bootstrap consume most of the deadline before app code runs.
- A third-party API slowdown causes the service to wait indefinitely on one dependency, and the edge proxy times out first with a 504.
Timeout Ladder and Deadline Propagation Audit
- Verify that the timeout chain is monotonic (failure example: the gateway times out at 30s while the upstream endpoint regularly needs 45s).
- Check deadline propagation through headers or request context (failure example: the upstream ignores the client deadline and keeps running until the proxy timeout fires).
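A monotonic ladder can be checked mechanically. The sketch below assumes you can list each hop's timeout from outermost (client) to innermost (deepest dependency); the function name, hop names, values, and the one-second safety margin are illustrative assumptions.

```python
from typing import List, Tuple

# Each hop is (name, timeout in seconds), ordered from the outermost caller inward.
Ladder = List[Tuple[str, float]]


def ladder_violations(ladder: Ladder, margin_s: float = 1.0) -> List[str]:
    """Flag hops whose timeout is not at least `margin_s` shorter than their caller's.

    An inner hop should give up before the hop that called it, leaving the caller
    time to return a useful error instead of having its own deadline fire.
    """
    problems = []
    for (outer, outer_s), (inner, inner_s) in zip(ladder, ladder[1:]):
        if inner_s > outer_s - margin_s:
            problems.append(f"{inner} ({inner_s}s) should be < {outer} ({outer_s}s) - {margin_s}s")
    return problems


if __name__ == "__main__":
    ladder = [("client", 35.0), ("gateway", 30.0), ("reporting-api", 45.0), ("db", 40.0)]
    for problem in ladder_violations(ladder):
        print(problem)
```

Here the only flagged hop is `reporting-api`: its 45-second budget can never be used, because the gateway gives up at 30 seconds.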
Upstream Tail Latency and Queue Contention Analysis
- Inspect high-percentile latency contributors (example: a database query regression pushes p99 beyond the gateway timeout).
- Trace resource contention effects (example: connection-pool starvation causes a long wait before request execution starts).
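Whether the tail fits the budget is a percentile question, not an average one. A minimal sketch using the nearest-rank percentile method, assuming raw per-request latencies are available; function names and sample values are illustrative.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of latency samples; p is in (0, 100]."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def tail_headroom_ms(samples_ms: Sequence[float], budget_ms: float, p: float = 99.0) -> float:
    """Budget minus the p-th percentile: negative means the tail already overruns."""
    return budget_ms - percentile(samples_ms, p)


if __name__ == "__main__":
    # 98 fast requests and 2 contended ones: the median looks healthy, p99 does not.
    samples = [100.0] * 98 + [31000.0, 32000.0]
    print(percentile(samples, 50), tail_headroom_ms(samples, budget_ms=30000))
```

The median of that sample is 100 ms, yet p99 overruns a 30-second budget by 1000 ms, which is why averages hide this class of 504.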
Decision Shortcut: Slow Work vs Bad Timeout Budget
- If direct calls to the origin are fast but the edge still times out, inspect timeout-ladder drift and intermediary deadlines before tuning queries.
- If traces show one span burning almost the entire budget, prioritize tail-latency reduction or an async redesign over simply raising gateway limits.
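The two shortcuts above can be expressed as a small triage function. This is a sketch only: the function name and both thresholds (direct origin p99 under half the budget counts as "fast", one span at 80% or more of the request counts as "dominant") are assumptions to tune per service, not standards.

```python
def first_step(direct_origin_p99_ms: float, edge_budget_ms: float,
               largest_span_ms: float, request_total_ms: float,
               fast_fraction: float = 0.5, dominance: float = 0.8) -> str:
    """Map the two decision-shortcut signals to a first investigation step.

    fast_fraction and dominance are illustrative thresholds, not standards.
    """
    if direct_origin_p99_ms < fast_fraction * edge_budget_ms:
        # The origin answers quickly when called directly, yet the edge times out.
        return "audit timeout ladder and intermediary deadlines"
    if largest_span_ms >= dominance * request_total_ms:
        # One hop burns most of the request; raising limits would only delay failure.
        return "reduce the dominant span's tail latency or move the work async"
    return "gather more traces: no single dominant cause yet"
```

With the trace from Implementation Examples (a 27104 ms `db_query` span in a 30021 ms request) and a slow direct origin, this points at the dominant span rather than the gateway limit.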
Wrong Fix to Avoid
- Do not simply raise gateway timeouts when the real issue is an avoidable upstream latency regression or an unbounded dependency wait.
- Do not classify every 504 as generic network flakiness when trace data shows one hop consistently burning the entire budget.
Implementation Examples
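Gateway log:

```text
2026-04-13T11:26:04Z edge=request-gateway requestId=req_90bc2a upstream=reporting-api
upstream_wait_ms=30001 timeout_budget_ms=30000 status=504
message="upstream request timed out while reading response header"
```

nginx proxy timeouts:

```nginx
proxy_connect_timeout 3s;  # max time to establish the upstream connection
proxy_send_timeout 15s;    # max gap between two successive writes to the upstream
proxy_read_timeout 15s;    # max gap between two successive reads from the upstream
send_timeout 15s;          # max gap between two successive writes to the client
```

Trace span timings (ms):

```json
{
  "requestId": "req_90bc2a",
  "totalMs": 30021,
  "spans": {
    "gateway_to_app": 14,
    "app_queue_wait": 912,
    "db_query": 27104,
    "response_write": 7
  }
}
```

Incident Timeline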
11:24 UTC
Latency rises before the proxy actually times out
Signal: Queue wait, cold-start delay, or one upstream span begins consuming a larger share of the deadline while the route still occasionally succeeds.
Why it matters: The earliest useful signal is usually budget burn, not the 504 itself. This is where you decide whether the problem is slow work or a broken timeout ladder.
11:26 UTC
The edge deadline expires before useful work finishes
Signal: Gateway logs show upstream request timed out while reading response header after one span or queue phase consumes almost the full budget.
Why it matters: By the time 504 appears, the request already lost the deadline race deeper in the chain.
11:30 UTC
Longer timeouts alone make the user wait longer
Signal: Raising one gateway timeout shifts the symptom but leaves p99 latency and dependency wait largely unchanged.
Why it matters: If the underlying slow hop stays slow, longer budgets usually delay failure rather than create a healthy path.
11:38 UTC
Budget alignment or async redesign removes the deadline burn
Signal: The slow span is reduced, retries are bounded, or the workflow moves async, and the same path completes comfortably inside the ladder.
Why it matters: That confirms the real fix lives in tail-latency reduction or budget design, not in cosmetic timeout inflation.
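The 11:24 signal, budget burn before any 504 appears, can be tracked directly. A minimal sketch, assuming per-request latencies for one route are collected over a rolling window; the function names, the 80% warning fraction, and the 5% alert ratio are illustrative assumptions.

```python
from typing import Iterable


def budget_burn_ratio(latencies_ms: Iterable[float], budget_ms: float,
                      warn_fraction: float = 0.8) -> float:
    """Share of requests that used more than warn_fraction of the timeout budget."""
    values = list(latencies_ms)
    if not values:
        return 0.0
    near_limit = sum(1 for v in values if v > warn_fraction * budget_ms)
    return near_limit / len(values)


def should_alert(latencies_ms: Iterable[float], budget_ms: float,
                 max_ratio: float = 0.05) -> bool:
    """Fire before 504s start: too many requests are already close to the deadline."""
    return budget_burn_ratio(latencies_ms, budget_ms) > max_ratio
```

Alerting on this ratio catches the route while it "still occasionally succeeds", which is the window in which you can still choose between reducing slow work and fixing the ladder.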
Seen in Production
Long-running sync workflow exceeds edge deadline
Frequency: common
Example: Gateway timeout budget is shorter than the real duration of a report or export path, so 504 appears even though origin eventually finishes.
Fix: Move the workflow to async processing or redesign the path so the client polls job status instead of holding the edge connection open.
Upstream latency spike from resource contention
Frequency: common
Example: Database pool starvation or downstream queueing pushes p99 above the load balancer timeout during the busiest traffic window.
Fix: Tune pool sizes, queue admission, and hot-path queries so tail latency stays inside the timeout budget.
Third-party dependency ignores propagated deadlines
Frequency: medium
Example: Service waits too long on one payment or partner API call, and the gateway times out before the upstream finally gives up.
Fix: Propagate deadlines, bound dependency call time, and fail fast with a safer fallback or clearer 503 when capacity is temporarily unavailable.
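Bounding a dependency call by the propagated deadline can be sketched with `asyncio.wait_for`. A minimal sketch under assumptions: `call_partner_api` is a hypothetical stand-in for the partner or payment call, `deadline` is an absolute `time.monotonic()` value derived from the incoming request, and the per-dependency cap is an illustrative choice.

```python
import asyncio
import time
from typing import Tuple


async def call_partner_api(delay_s: float) -> str:
    """Hypothetical slow partner or payment dependency."""
    await asyncio.sleep(delay_s)
    return "partner ok"


async def bounded_dependency_call(deadline: float, cap_s: float,
                                  delay_s: float) -> Tuple[int, str]:
    """Give the dependency at most min(cap_s, time left before `deadline`)."""
    remaining = deadline - time.monotonic()
    if remaining <= 0:
        return 503, "deadline already exhausted"
    try:
        body = await asyncio.wait_for(call_partner_api(delay_s),
                                      timeout=min(cap_s, remaining))
        return 200, body
    except asyncio.TimeoutError:
        # Fail fast with an explicit 503 while the gateway still has budget left.
        return 503, "partner API timed out; try again later"
```

Because the wait is capped below the remaining request budget, the client receives a clear 503 from the service instead of a 504 from the edge.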
Retry fan-out burns the timeout budget before useful work begins
Frequency: medium
Example: App retries a slow dependency several times inside one request, so the edge sees only a 504 even though the first timeout happened deep in the dependency chain.
Fix: Bound internal retries, propagate a single deadline budget, and surface dependency timeout telemetry before the gateway deadline is exhausted.
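Bounding retries by a single budget can be sketched as follows. The `attempt` callable is a hypothetical dependency call that raises `TimeoutError` when it exceeds the per-try timeout it is given; the function name, `max_attempts`, and the injectable clock (used here for testing) are illustrative assumptions.

```python
import time
from typing import Callable, Optional


def call_with_budget(attempt: Callable[[float], str], budget_s: float,
                     per_try_s: float, max_attempts: int = 3,
                     clock: Callable[[], float] = time.monotonic) -> Optional[str]:
    """Retry `attempt` only while a full per-try timeout still fits in the budget."""
    deadline = clock() + budget_s
    for _ in range(max_attempts):
        remaining = deadline - clock()
        if remaining < per_try_s:
            break  # not enough budget left for a useful attempt: stop and degrade
        try:
            return attempt(per_try_s)
        except TimeoutError:
            continue
    return None  # caller degrades or returns an explicit error before the edge gives up
```

With a 10-second budget and 4-second tries, at most two attempts run even if `max_attempts` is higher, so retries can never silently consume the gateway's whole deadline.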
Wrong Fix vs Better Fix
Raise gateway timeout vs isolate the slow hop
Wrong fix: Only increase the edge timeout because the route almost finishes and “just needs more time.”
Better fix: Use traces and queue metrics to isolate which hop is burning the deadline, then reduce that latency or redesign the workflow.
Why this is better: A larger timeout often just makes users wait longer while the same upstream bottleneck remains unresolved.
Retry more inside the request vs preserve one deadline budget
Wrong fix: Add more internal retries or fan-out within the same request path to push through the slowdown.
Better fix: Bound internal retries, propagate one deadline budget, and fail fast or degrade once the request no longer has enough time left to succeed safely.
Why this is better: Extra retries often spend the remaining budget before useful work even starts, turning one slow dependency into a guaranteed 504.
Treat as generic network noise vs inspect the timeout ladder
Wrong fix: Classify the incident as random flakiness without comparing client, gateway, and upstream deadlines.
Better fix: Audit the full timeout ladder and confirm each hop has an intentional, monotonic budget that matches the real work profile.
Why this is better: Many 504s are configuration problems in deadline design, not mysterious network events.
Debugging Tools
- Distributed tracing with span-level timing
- Gateway timeout logs with upstream target tags
- Queue and connection-pool contention metrics
- Latency percentile dashboards by dependency
- Timeout-budget config diffs across edge and origin layers
How to Verify the Fix
- Re-run affected flows and confirm upstream responses complete before gateway deadlines.
- Validate timeout and retry behavior under both nominal and burst traffic conditions.
- Monitor p95/p99 latency plus the 504 rate to confirm sustained recovery over time.
- Check that timeout ladders remain consistent across config repositories and deployed gateways.
How to Prevent Recurrence
- Standardize timeout budgets and retry policies across all service boundaries.
- Continuously probe DNS, TLS, and upstream health paths with synthetic checks.
- Implement graceful degradation and circuit breaking for upstream latency spikes.
Pro Tip
- Enforce timeout-budget contracts in CI so gateway and service deadlines cannot drift apart across config repositories.
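A minimal sketch of such a CI check, assuming the gateway is nginx and each service's own request deadline is known. It extracts `proxy_*_timeout` values from config text and flags a service deadline that does not fit inside `proxy_read_timeout`. Only the `ms`, `s`, `m`, and `h` suffixes are handled (nginx accepts more), the 60-second fallback mirrors nginx's documented default for `proxy_read_timeout`, and the function names and one-second margin are hypothetical. Because `proxy_read_timeout` limits the gap between successive reads rather than the whole response, this is a conservative check.

```python
import re
from typing import Dict, Optional

# Subset of nginx time units, expressed in milliseconds; a bare number means seconds.
_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000}
_DIRECTIVE = re.compile(r"^\s*(proxy_\w+_timeout)\s+(\d+)(ms|s|m|h)?\s*;", re.MULTILINE)


def proxy_timeouts_s(nginx_conf: str) -> Dict[str, float]:
    """Extract proxy_*_timeout directives from nginx config text, in seconds."""
    return {
        name: int(value) * _UNIT_MS[unit or "s"] / 1000
        for name, value, unit in _DIRECTIVE.findall(nginx_conf)
    }


def contract_violation(nginx_conf: str, service_deadline_s: float,
                       margin_s: float = 1.0) -> Optional[str]:
    """Return an error message if the service deadline does not fit the gateway's."""
    read_s = proxy_timeouts_s(nginx_conf).get("proxy_read_timeout", 60.0)
    if service_deadline_s > read_s - margin_s:
        return (f"service deadline {service_deadline_s}s does not fit inside "
                f"proxy_read_timeout {read_s}s minus {margin_s}s margin")
    return None


if __name__ == "__main__":
    conf = "proxy_connect_timeout 3s;\nproxy_read_timeout 15s;\n"
    print(contract_violation(conf, service_deadline_s=10.0))  # fits inside the budget
    print(contract_violation(conf, service_deadline_s=45.0))  # violates the contract
```

A CI job could read the deployed gateway config and the service's deadline setting, then exit non-zero whenever `contract_violation` returns a message.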
Decision Support
Compare Guide
502 Bad Gateway vs 504 Gateway Timeout: Key Differences
Fix upstream errors faster: use 502 when a gateway gets an invalid upstream response, and 504 when the upstream service exceeds your timeout budget.
Playbook
API Timeout Playbook (502 / 504 / DEADLINE_EXCEEDED)
Use this playbook to separate invalid upstream responses from upstream wait expiration and deadline exhaustion, and to apply timeout budgets, safe retries, and circuit-breaker controls.
Provider Context
This guidance is specific to HTTP services. Always validate implementation details against official provider documentation before deploying to production.