Triage 500, gRPC UNKNOWN, and cloud InternalError fast: preserve correlation IDs, separate transient provider faults from app bugs, and apply safe retries.
Last reviewed: February 23, 2026|Editorial standard: source-backed operational guidance
500 signals server-side failure in an HTTP boundary. UNKNOWN signals that gRPC could not map failure details into a more specific status. InternalError and similar provider codes signal backend processing faults in cloud control or data planes.
Retry only idempotent operations with bounded exponential backoff and jitter when the error pattern looks transient. Escalate when identical calls fail repeatedly with stable inputs and correlation IDs. Escalate immediately when failures span regions or services and customer impact grows.
Gateways and providers often suppress low-level exception detail to prevent leakage. Translation layers can collapse rich internal exceptions into generic UNKNOWN or InternalError envelopes. Strong correlation IDs and cross-layer logs restore the missing context.
Include one failing request sample, exact UTC timestamps, correlation IDs, affected scope, region, and provider request ID in the bundle. These fields let provider support align your incident to backend traces, shard ownership, and control-plane events without guessing context. Include the Pro Tip log fields (`request_id`, `trace_id`, `correlation_id`, `provider_code`, `grpc_status`, `http_status`, `operation`, `region`, `retry_attempt`, `idempotency_key`) so support can reproduce failure boundaries quickly. This structure shortens escalation loops and avoids repeated evidence requests.