Add usage quota cache revalidation headers to consumers resolve endpoint

Problem

AI Gateway caches the response from CustomersDot's HEAD /api/v1/consumers/resolve endpoint for 1 hour (in-memory, per-instance). The consumption pipeline updates wallet balances at :10 past every hour. This creates a staleness window of up to ~55 minutes where the cached response no longer reflects the actual wallet state.

Two directions of staleness:

  1. False positive (cached 200, should be 402) — user continues using features after credits are depleted.
  2. False negative (cached 402, should be 200) — user is blocked but should not be (e.g., after accepting overage terms or receiving monthly credits).

Goals

  • Reduce cache staleness from ~55 minutes to ~2 minutes for wallet-dependent decisions
  • Set appropriate cache TTLs based on response type (stable bypass vs wallet-dependent vs blocked)
  • Use standard HTTP Cache-Control semantics
  • No new infrastructure required (no Redis, no pub/sub)

Non-goals

  • Changing usage quota decision logic
  • Shared caching in AI Gateway (no Redis available)
  • Real-time cache invalidation (no background poller on AI Gateway)

Constraints discovered

  • No Redis on AI Gateway — cache must be in-memory, per-instance
  • No background poller on AI Gateway — can only update cache within request/response cycle
  • 20+ operations with wildly different credit costs — cannot predict burn rate
  • No shared state between AI Gateway instances — cannot do centralized invalidation

Options evaluated

Option | Staleness | Complexity | Verdict
Cache-Control: max-age aligned to consumption schedule | ~2 min | Low | Selected
Push notification on wallet depletion | ~0 | High | No shared state to invalidate
Server-side Redis cache + invalidation | ~0 | Medium | No Redis on AI Gateway
Precomputed usage_allowed flag on consumer | ~0 | Medium-High | Good long-term, needs all write paths updated
Return balance in headers + smart client TTL | Variable | Medium | 20+ ops with different costs makes this impractical
Changes/invalidation polling endpoint | 30-60s | Medium | No background poller on AI Gateway
Shared Redis cache (AI Gateway) | ~0 | Medium | No Redis on AI Gateway
ETag/Last-Modified (original proposal) | ~0 per request | Medium | AI Gateway does not support conditional requests

Implemented approach

Server returns Cache-Control: max-age=N with TTL aligned to the consumption pipeline schedule:

Response | Reason | max-age
200 | Stable bypass (team member, proxy, feature disabled) | 3600s (1 hour)
200 | Wallet-dependent (via resolve!) | Seconds until :12 past next hour (60-3600s)
402 | Blocked (no credits, no overage) | 300s (5 min) — user could accept overage anytime
403/422 | Error | 60s

The :12 offset comes from the consumption cron running at :10, plus a 2-minute buffer for the job to complete.

A new PORO, Billing::Usage::CacheTtlCalculator, encapsulates the TTL computation and is documented with YARD.
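A minimal sketch of how such a calculator might compute the wallet-dependent TTL. The class name matches the change, but the method name, constant, and clamping details are assumptions for illustration:

```ruby
# Sketch only: the class name comes from the change, but the method name,
# constant, and clamping behavior are assumptions.
module Billing
  module Usage
    class CacheTtlCalculator
      REFRESH_MINUTE = 12 # consumption cron at :10 + 2 min buffer

      # Seconds from `now` until :12 past the next hour, clamped to the
      # 60-3600s range used for wallet-dependent 200 responses.
      def wallet_dependent_ttl(now = Time.now.utc)
        next_refresh = Time.utc(now.year, now.month, now.day, now.hour, REFRESH_MINUTE)
        next_refresh += 3600 if next_refresh <= now
        (next_refresh - now).to_i.clamp(60, 3600)
      end
    end
  end
end
```

For example, a request at 10:30 UTC would get max-age=2520 (until 11:12), while a request at 10:11:30 would be clamped up to the 60-second floor.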

Uses Rails' built-in expires_in helper from ActionController::ConditionalGet to set the header.
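In the controller, expires_in translates a TTL into the Cache-Control header; by default Rails marks the response private, which fits a per-consumer decision. A dependency-free sketch of the resulting header values — the response-class symbols and helper name are illustrative, and the TTLs come from the table above:

```ruby
# Illustrative only: response-class symbols and helper name are assumptions;
# TTL values come from the table above.
MAX_AGES = {
  stable_bypass: 3600, # 200: team member, proxy, feature disabled
  blocked:       300,  # 402: user could accept overage anytime
  error:         60    # 403/422
}.freeze

# Mirrors the header value `expires_in ttl` emits by default (private caching).
def cache_control_for(kind)
  "max-age=#{MAX_AGES.fetch(kind)}, private"
end
```

The wallet-dependent 200 case would pass the calculator's dynamic TTL instead of a fixed value.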

What this achieves

  • Max staleness drops from ~55 min to ~2 min for wallet-dependent decisions
  • Backward-compatible — the header is currently ignored by AI Gateway
  • No new infrastructure required
  • Server owns the schedule knowledge — no fragile coupling
Edited by Vitaly Slobodin