Add usage quota cache revalidation headers to consumers resolve endpoint
## Problem
AI Gateway caches the response from CustomersDot's `HEAD /api/v1/consumers/resolve` endpoint for 1 hour (in-memory, per-instance). The consumption pipeline updates wallet balances at :10 past every hour. This creates a staleness window of up to ~55 minutes where the cached response no longer reflects the actual wallet state.
Two directions of staleness:
- False positive (cached 200, should be 402) — user continues using features after credits are depleted.
- False negative (cached 402, should be 200) — user is blocked but should not be (e.g., after accepting overage terms or receiving monthly credits).
## Goals
- Reduce cache staleness from ~55 minutes to ~2 minutes for wallet-dependent decisions
- Set appropriate cache TTLs based on response type (stable bypass vs wallet-dependent vs blocked)
- Use standard HTTP `Cache-Control` semantics
- No new infrastructure required (no Redis, no pub/sub)
## Non-goals
- Changing usage quota decision logic
- Shared caching in AI Gateway (no Redis available)
- Real-time cache invalidation (no background poller on AI Gateway)
## Constraints discovered
- No Redis on AI Gateway — cache must be in-memory, per-instance
- No background poller on AI Gateway — can only update cache within request/response cycle
- 20+ operations with wildly different credit costs — cannot predict burn rate
- No shared state between AI Gateway instances — cannot do centralized invalidation
## Options evaluated
| Option | Staleness | Complexity | Verdict |
|---|---|---|---|
| `Cache-Control: max-age` aligned to consumption schedule | ~2 min | Low | Selected |
| Push notification on wallet depletion | ~0 | High | No shared state to invalidate |
| Server-side Redis cache + invalidation | ~0 | Medium | No Redis on AI Gateway |
| Precomputed `usage_allowed` flag on consumer | ~0 | Medium-High | Good long-term, needs all write paths updated |
| Return balance in headers + smart client TTL | Variable | Medium | 20+ ops with different costs makes this impractical |
| Changes/invalidation polling endpoint | 30-60s | Medium | No background poller on AI Gateway |
| Shared Redis cache (AI Gateway) | ~0 | Medium | No Redis on AI Gateway |
| `ETag`/`Last-Modified` (original proposal) | ~0 per request | Medium | AI Gateway does not support conditional requests |
## Implemented approach
Server returns `Cache-Control: max-age=N` with the TTL aligned to the consumption pipeline schedule:
| Response | Reason | max-age |
|---|---|---|
| 200 | Stable bypass (team member, proxy, feature disabled) | 3600s (1 hour) |
| 200 | Wallet-dependent (via `resolve!`) | Seconds until :12 past the next hour (60-3600s) |
| 402 | Blocked (no credits, no overage) | 300s (5 min) — user could accept overage anytime |
| 403/422 | Error | 60s |
The :12 offset comes from the consumption cron running at :10, plus a 2-minute buffer for job completion.
New PORO: `Billing::Usage::CacheTtlCalculator` encapsulates the TTL computation with YARD documentation.
Uses Rails' built-in `expires_in` from `ActionController::ConditionalGet`.
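The wallet-dependent TTL computation described above can be sketched as follows. This is an illustrative sketch, not the actual `CacheTtlCalculator` implementation: the method name `wallet_dependent_ttl` and the constants are assumptions, but the arithmetic (seconds until :12 past the next hour, clamped to 60-3600s) matches the table above.

```ruby
module Billing
  module Usage
    # Hypothetical sketch of the TTL calculation. Assumes the consumption
    # cron runs at :10 with a 2-minute buffer, so caches refresh at :12.
    class CacheTtlCalculator
      REFRESH_MINUTE = 12   # cron at :10 + 2 min buffer
      MIN_TTL = 60          # never cache for less than a minute
      MAX_TTL = 3600        # never cache past one pipeline cycle

      # Seconds until :12 past the next hour, clamped to 60..3600.
      def wallet_dependent_ttl(now = Time.now)
        next_refresh = Time.new(now.year, now.month, now.day, now.hour, REFRESH_MINUTE, 0)
        next_refresh += 3600 if next_refresh <= now # already past :12 this hour
        (next_refresh - now).to_i.clamp(MIN_TTL, MAX_TTL)
      end
    end
  end
end
```

For example, a request at 5:00 gets `max-age=720` (12 minutes until the 5:12 refresh), while a request at 5:13 gets ~3540s, since the next refresh is 6:12.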
## What this achieves
- Max staleness drops from ~55 min to ~2 min for wallet-dependent decisions
- Backward-compatible — the header is currently ignored by AI Gateway
- No new infrastructure required
- Server owns the schedule knowledge — no fragile coupling
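Once the client honors the header, all it needs to do is parse `max-age` and expire entries accordingly. A minimal sketch of such a cache, written in Ruby purely for illustration (AI Gateway is not a Ruby service, and `ResolveCache` is a hypothetical name, not an existing class):

```ruby
# Hypothetical per-instance, in-memory cache that honors
# Cache-Control: max-age=N on resolve responses.
class ResolveCache
  Entry = Struct.new(:status, :expires_at)

  def initialize
    @entries = {}
  end

  # Store a response status, using max-age from the Cache-Control header.
  # Responses without a positive max-age are not cached.
  def store(key, status, cache_control, now = Time.now)
    ttl = cache_control.to_s[/max-age=(\d+)/, 1].to_i
    @entries[key] = Entry.new(status, now + ttl) if ttl.positive?
  end

  # Return the cached status, or nil if missing or expired.
  def fetch(key, now = Time.now)
    entry = @entries[key]
    return nil unless entry && entry.expires_at > now

    entry.status
  end
end
```

A cached 402 stored with `max-age=300` expires on its own after five minutes, which is how the server-chosen TTLs above bound staleness without any invalidation channel.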