Add usage quota cache revalidation headers to consumers resolve endpoint
## Problem
AI Gateway caches the response from CustomersDot's `HEAD /api/v1/consumers/resolve` endpoint for 1 hour (in-memory, per-instance). The consumption pipeline updates wallet balances at :10 past every hour. This creates a staleness window of up to ~55 minutes where the cached response no longer reflects the actual wallet state.
Two directions of staleness:
- False positive (cached 200, should be 402) — user continues using features after credits are depleted.
- False negative (cached 402, should be 200) — user is blocked but should not be (e.g., after accepting overage terms or receiving monthly credits).
## Goals
- Reduce cache staleness from ~55 minutes to ~2 minutes for wallet-dependent decisions
- Set appropriate cache TTLs based on response type (stable bypass vs wallet-dependent vs blocked)
- Use standard HTTP `Cache-Control` semantics
- No new infrastructure required (no Redis, no pub/sub)
## Non-goals
- Changing usage quota decision logic
- Shared caching in AI Gateway (no Redis available)
- Real-time cache invalidation (no background poller on AI Gateway)
## Constraints discovered
- No Redis on AI Gateway — cache must be in-memory, per-instance
- No background poller on AI Gateway — can only update cache within request/response cycle
- 20+ operations with wildly different credit costs — cannot predict burn rate
- No shared state between AI Gateway instances — cannot do centralized invalidation
## Options evaluated
| Option | Staleness | Complexity | Verdict |
|---|---|---|---|
| `Cache-Control: max-age` aligned to consumption schedule | ~2 min | Low | Selected |
| Push notification on wallet depletion | ~0 | High | No shared state to invalidate |
| Server-side Redis cache + invalidation | ~0 | Medium | No Redis on AI Gateway |
| Precomputed `usage_allowed` flag on consumer | ~0 | Medium-High | Good long-term, needs all write paths updated |
| Return balance in headers + smart client TTL | Variable | Medium | 20+ ops with different costs makes this impractical |
| Changes/invalidation polling endpoint | 30-60s | Medium | No background poller on AI Gateway |
| Shared Redis cache (AI Gateway) | ~0 | Medium | No Redis on AI Gateway |
| `ETag`/`Last-Modified` (original proposal) | ~0 per request | Medium | AI Gateway does not support conditional requests |
## Implemented approach
Server returns `Cache-Control: max-age=N` with the TTL aligned to the consumption pipeline schedule:
| Response | Reason | max-age |
|---|---|---|
| 200 | Stable bypass (team member, proxy, feature disabled) | 3600s (1 hour) |
| 200 | Wallet-dependent (via `resolve!`) | Seconds until :12 past the next hour (60-3600s) |
| 402 | Blocked (no credits, no overage) | 300s (5 min) — user could accept overage anytime |
| 403/422 | Error | 60s |
The :12 offset comes from the consumption cron running at :10, plus a 2-minute buffer for job completion.
New PORO: `Billing::Usage::CacheTtlCalculator` encapsulates the TTL computation with YARD documentation.
Uses Rails' built-in `expires_in` from `ActionController::ConditionalGet`.
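The wallet-dependent TTL computation described above can be sketched as follows. This is an illustrative sketch, not the actual `CacheTtlCalculator` implementation: the method name `wallet_dependent_ttl` and the constants are assumptions, but the arithmetic (seconds until :12 past the next hour, clamped to 60-3600s) matches the table above.

```ruby
module Billing
  module Usage
    # Hypothetical sketch of the TTL calculation. Assumes the consumption
    # cron runs at :10 with a 2-minute buffer, so caches refresh at :12.
    class CacheTtlCalculator
      REFRESH_MINUTE = 12   # cron at :10 + 2 min buffer
      MIN_TTL = 60          # never cache for less than a minute
      MAX_TTL = 3600        # never cache past one pipeline cycle

      # Seconds until :12 past the next hour, clamped to 60..3600.
      def wallet_dependent_ttl(now = Time.now)
        next_refresh = Time.new(now.year, now.month, now.day, now.hour, REFRESH_MINUTE, 0)
        next_refresh += 3600 if next_refresh <= now # already past :12 this hour
        (next_refresh - now).to_i.clamp(MIN_TTL, MAX_TTL)
      end
    end
  end
end
```

For example, a request at 5:00 gets `max-age=720` (12 minutes until the 5:12 refresh), while a request at 5:13 gets ~3540s, since the next refresh is 6:12.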
## What this achieves
- Max staleness drops from ~55 min to ~2 min for wallet-dependent decisions
- Backward-compatible — the header is currently ignored by AI Gateway
- No new infrastructure required
- Server owns the schedule knowledge — no fragile coupling
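Once the client honors the header, all it needs to do is parse `max-age` and expire entries accordingly. A minimal sketch of such a cache, written in Ruby purely for illustration (AI Gateway is not a Ruby service, and `ResolveCache` is a hypothetical name, not an existing class):

```ruby
# Hypothetical per-instance, in-memory cache that honors
# Cache-Control: max-age=N on resolve responses.
class ResolveCache
  Entry = Struct.new(:status, :expires_at)

  def initialize
    @entries = {}
  end

  # Store a response status, using max-age from the Cache-Control header.
  # Responses without a positive max-age are not cached.
  def store(key, status, cache_control, now = Time.now)
    ttl = cache_control.to_s[/max-age=(\d+)/, 1].to_i
    @entries[key] = Entry.new(status, now + ttl) if ttl.positive?
  end

  # Return the cached status, or nil if missing or expired.
  def fetch(key, now = Time.now)
    entry = @entries[key]
    return nil unless entry && entry.expires_at > now

    entry.status
  end
end
```

A cached 402 stored with `max-age=300` expires on its own after five minutes, which is how the server-chosen TTLs above bound staleness without any invalidation channel.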