Operational Playbooks

Operational Playbooks provides cross-error incident runbooks for recurring production failures such as API timeouts, authentication breakdowns, CORS policy mismatches, and rate-limit recovery. Each playbook lays out triage checkpoints, containment actions, verification criteria, and prevention controls in execution order. Use these guides during active response windows when speed and correctness both matter.
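To pick a starting runbook, match the incoming error signal against the signals each playbook lists in its heading. The sketch below is a minimal lookup built only from those headings. `PLAYBOOK_SIGNALS` and `candidate_playbooks` are hypothetical names. A signal such as 500 can legitimately map to more than one playbook, so the function returns every candidate rather than guessing. The CORS playbook is left out because its heading names failure stages, not error codes.

```python
# Hypothetical lookup table built from the playbook headings in this list.
# The CORS playbook is omitted: its signals are failure stages, not error codes.
PLAYBOOK_SIGNALS = {
    "API Timeout Playbook": {"502", "504", "DEADLINE_EXCEEDED"},
    "Availability and Dependency Playbook": {"500", "503", "ServiceUnavailable"},
    "Authorization Denial Playbook": {"403", "AccessDenied", "PERMISSION_DENIED"},
    "Auth Incident Playbook": {"401", "UNAUTHENTICATED"},
    "Conflict and Concurrency Playbook": {"409", "412", "OptimisticLock"},
    "Rate Limit Recovery Playbook": {"429", "ThrottlingException", "RESOURCE_EXHAUSTED"},
    "Resource State Playbook": {"404", "410", "ResourceNotFound"},
    "Unknown and Unclassified Error Playbook": {"500", "UNKNOWN", "InternalError"},
    "Validation Failure Playbook": {"400", "422", "INVALID_ARGUMENT"},
}


def candidate_playbooks(signal: str) -> list[str]:
    """Return every playbook whose heading lists `signal`, in listing order."""
    return [name for name, signals in PLAYBOOK_SIGNALS.items() if signal in signals]
```

For a 500, both the Availability and the Unknown playbooks come back; the correlation ID and dependency health usually decide which one to open first.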


API Timeout Playbook (502 / 504 / DEADLINE_EXCEEDED)

Use this playbook to separate invalid upstream responses (502) from upstream wait expiration (504) and deadline exhaustion (DEADLINE_EXCEEDED), then apply timeout budgets, safe retries, and circuit-breaker controls.
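A timeout budget means every attempt and downstream call draws from one overall deadline instead of each getting a fresh timeout. The sketch below is a minimal illustration under that assumption. The `Deadline` class, the per-attempt cap, and the 50 ms floor are hypothetical choices, and the clock is injectable so the logic can be tested without sleeping.

```python
import time
from typing import Callable, Optional


class Deadline:
    """One overall time budget shared by every attempt and downstream call."""

    def __init__(self, budget_s: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_s

    def remaining(self) -> float:
        """Seconds left in the budget, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def attempt_timeout(self, per_attempt_cap_s: float, min_useful_s: float = 0.05) -> Optional[float]:
        """Timeout for the next attempt, or None when too little budget remains to try."""
        left = self.remaining()
        if left < min_useful_s:
            # Starting an attempt now would only end in DEADLINE_EXCEEDED.
            return None
        return min(per_attempt_cap_s, left)
```

Passing `remaining()` downstream (for example, as the deadline on an outbound gRPC call) keeps a callee from waiting longer than its caller will.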

Availability and Dependency Playbook (500 / 503 / ServiceUnavailable)

Use this playbook to separate origin-side 500 failures from temporary 503 dependency or capacity outages, then apply safe retry and escalation paths.
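One way to encode the retry-versus-escalate split is a small decision function. The sketch assumes a hypothetical policy: a 503 is retried only for idempotent methods and only up to a fixed attempt count, while a 500 is escalated rather than retried. `next_step` and its return labels are illustrative names, not part of any library.

```python
# Idempotent request methods as defined by HTTP semantics (RFC 9110).
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"})


def next_step(status: int, method: str, attempt: int, max_attempts: int = 3) -> str:
    """Decide what to do after a 500 or 503; `attempt` counts from 1."""
    if status == 503:
        # Temporary dependency or capacity outage: retry only when replay is safe.
        if method.upper() in IDEMPOTENT_METHODS and attempt < max_attempts:
            return "retry"
        return "escalate"
    if status == 500:
        # Origin-side failure (assumed policy): hand off with context instead of retrying.
        return "escalate"
    return "not-this-playbook"
```

A non-idempotent request such as a POST escalates even on 503, because replaying it could apply the same side effect twice.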

Authorization Denial Playbook (403 / AccessDenied / PERMISSION_DENIED)

Use this playbook to triage policy-based access denials after authentication succeeds, isolate the deny layer, and apply least-privilege remediation safely.

Auth Incident Playbook (401 / UNAUTHENTICATED)

Use this playbook to separate missing, expired, or invalid identity proof from authorization and transport failures, then apply fixes at the credential's actual source.
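The boundary between this playbook (401) and the Authorization Denial playbook (403) comes down to check order: identity proof is validated first, and only a valid identity can be denied by policy. The sketch below models that order with a hypothetical `Token` record whose signature result, expiry, and scopes are assumed to come from an earlier verification step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """Hypothetical result of verifying a bearer credential."""
    signature_ok: bool
    expires_at: float  # epoch seconds
    scopes: frozenset[str]


def auth_status(token: Optional[Token], required_scope: str, now: float) -> tuple[int, str]:
    """Classify a request as 401 (identity problem), 403 (policy denial), or 200."""
    if token is None:
        return 401, "missing credentials"
    # Check the signature before trusting any claim, including expiry.
    if not token.signature_ok:
        return 401, "invalid credentials"
    if token.expires_at <= now:
        return 401, "expired credentials"
    # Identity is proven; anything from here on is an authorization decision.
    if required_scope not in token.scopes:
        return 403, "authenticated but not permitted"
    return 200, "ok"
```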

Conflict and Concurrency Playbook (409 / 412 / OptimisticLock)

Use this playbook to separate true write conflicts from stale precondition failures, then apply safe re-fetch, optimistic-lock, and retry choices.
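The re-fetch-and-retry path for stale preconditions follows the optimistic-locking pattern: read a version, write only if that version is still current, and on failure re-read and re-apply the change to fresh state. The sketch below uses a hypothetical in-memory `Store` so it runs standalone; `PreconditionFailed` stands in for a 412 or OptimisticLock error. A true 409 conflict, where the change cannot be re-applied mechanically, should go back to the caller instead.

```python
from dataclasses import dataclass
from typing import Callable


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 or an OptimisticLock error."""


@dataclass
class Record:
    value: dict
    version: int


class Store:
    """Hypothetical in-memory store whose writes succeed only against the current version."""

    def __init__(self, value: dict):
        self._record = Record(dict(value), 1)

    def read(self) -> Record:
        # Return a copy so callers cannot mutate stored state in place.
        return Record(dict(self._record.value), self._record.version)

    def write(self, value: dict, if_version: int) -> int:
        # Compare-and-set: reject the write if someone else bumped the version first.
        if if_version != self._record.version:
            raise PreconditionFailed(f"expected v{if_version}, found v{self._record.version}")
        self._record = Record(dict(value), self._record.version + 1)
        return self._record.version


def update_with_retry(store: Store, change: Callable[[dict], dict], max_attempts: int = 3) -> int:
    """Re-fetch and re-apply `change` to fresh state after each stale-version failure."""
    for _ in range(max_attempts - 1):
        current = store.read()
        try:
            return store.write(change(current.value), if_version=current.version)
        except PreconditionFailed:
            continue  # another writer won the race; re-read and try again
    # Final attempt: let PreconditionFailed propagate to the caller.
    current = store.read()
    return store.write(change(current.value), if_version=current.version)
```

The key design choice is that `change` is a function of the freshly read value, not a precomputed result, so a retry never overwrites the other writer's update.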

CORS Error Fix Playbook (Preflight / Origin / Credentials)

Use this playbook to separate browser-enforced cross-origin policy failures from server-side CORS header and route defects, then apply strict origin and credential controls.
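Strict origin and credential controls usually mean an exact-match allowlist, echoing the matched origin rather than `*`, and adding `Vary: Origin` so caches keep per-origin responses apart. The sketch below builds response headers under those assumptions. `ALLOWED_ORIGINS`, the allowed methods and headers, and the 600-second max age are hypothetical values to replace with your own.

```python
from typing import Optional

# Hypothetical allowlist: exact scheme + host + port matches only, no wildcards.
ALLOWED_ORIGINS = frozenset({"https://app.example.com", "https://admin.example.com"})


def cors_headers(origin: Optional[str], is_preflight: bool, allow_credentials: bool = True) -> dict[str, str]:
    """Build CORS response headers for one request."""
    # Vary on Origin so shared caches never serve one origin's grant to another.
    headers = {"Vary": "Origin"}
    if origin is None or origin not in ALLOWED_ORIGINS:
        # Unknown origin: send no grant, so the browser refuses the cross-origin read.
        return headers
    # Echo the exact origin; browsers reject "*" on credentialed requests.
    headers["Access-Control-Allow-Origin"] = origin
    if allow_credentials:
        headers["Access-Control-Allow-Credentials"] = "true"
    if is_preflight:
        headers["Access-Control-Allow-Methods"] = "GET, POST, PUT, DELETE"
        headers["Access-Control-Allow-Headers"] = "Authorization, Content-Type"
        headers["Access-Control-Max-Age"] = "600"
    return headers
```

A common server-side defect is returning these headers on the main route but not on the OPTIONS preflight route; calling one function from both paths avoids that drift.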

Rate Limit Recovery Playbook (429 / ThrottlingException / RESOURCE_EXHAUSTED)

Use this playbook to separate transient throttling from hard quota exhaustion and apply retry, traffic-shaping, and quota-capacity fixes safely.
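A common retry shape for throttling is capped exponential backoff with full jitter, deferring to the server's `Retry-After` when one is sent. The sketch below assumes a hypothetical rule for the transient-versus-quota split: a server-requested wait longer than the cap is treated as hard quota exhaustion and returns `None`, meaning stop retrying and escalate. Only the delta-seconds form of `Retry-After` is parsed; the HTTP-date form falls back to jittered backoff.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base_s: float = 0.5,
    cap_s: float = 30.0,
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None to stop retrying."""
    if retry_after is not None:
        try:
            wait = int(retry_after)  # delta-seconds form only
        except ValueError:
            wait = -1  # HTTP-date form is not parsed in this sketch
        if wait >= 0:
            # Assumed rule: a requested wait beyond the cap signals hard quota exhaustion.
            return float(wait) if wait <= cap_s else None
    rng = rng if rng is not None else random.Random()
    # Full jitter: uniform over [0, min(cap, base * 2**attempt)].
    return rng.uniform(0.0, min(cap_s, base_s * (2 ** attempt)))
```

Jitter spreads retries from many clients across the window, which keeps a throttled fleet from retrying in lockstep and re-triggering the limit.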

Resource State Playbook (404 / 410 / ResourceNotFound)

Use this playbook to separate temporary missing-resource lookups from permanent removals, then fix scope, lifecycle, and identifier drift safely.

Unknown and Unclassified Error Playbook (500 / UNKNOWN / InternalError)

Use this playbook to triage 500, gRPC UNKNOWN, and cloud InternalError responses quickly: preserve correlation IDs, separate transient provider faults from application bugs, and apply safe retries.

Validation Failure Playbook (400 / 422 / INVALID_ARGUMENT)

Use this playbook to separate malformed-request failures from semantic validation failures, then fix request contracts without broad server-side bypasses.
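The malformed-versus-semantic split maps naturally onto two validation stages: first parse the body (failure means 400), then check business rules on the parsed value (failure means 422). The sketch below shows that order for a hypothetical order payload; the `quantity` and `currency` fields and their rules are illustrative only, and treating a non-object JSON body as 400 is this sketch's convention.

```python
import json


def validate_order(body: bytes) -> tuple[int, list[str]]:
    """Return (status, errors): 400 if unparseable, 422 if rules fail, else 200."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return 400, ["body is not valid JSON"]
    if not isinstance(payload, dict):
        return 400, ["body must be a JSON object"]
    errors = []
    qty = payload.get("quantity")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(qty, int) or isinstance(qty, bool) or qty < 1:
        errors.append("quantity must be a positive integer")
    if payload.get("currency") not in {"USD", "EUR"}:
        errors.append("currency must be USD or EUR")
    return (422, errors) if errors else (200, [])
```

Returning every semantic error at once lets the client fix its request contract in one pass instead of discovering violations one retry at a time.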