v0.7.0 — token-burn: System-wide prompt-cache discipline — stable prefix, volatile suffix, measured
## Problem
TMB does not structure its prompts to maximize CC's prompt-cache hit rate. Per Anthropic, a cached prefix is matched from the start of the assembled prompt and the match ends at the first byte that differs, so everything after that byte is a cache miss. TMB violates this cache-friendly contract in multiple places.
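
To make the failure mode concrete, here is a minimal sketch, assuming the cache can only reuse the longest byte-identical prefix of two assembled prompts. The prompt fragments and the `shared_prefix_len` helper are hypothetical illustrations, not TMB or CC code.

```python
def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Return the number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical fragments: one stable inventory block, one per-turn volatile line.
STABLE = b"top-level dirs: src tests docs\nstacks detected: python\n"
volatile_turn_1 = b"dirty files: 3\n"
volatile_turn_2 = b"dirty files: 4\n"

# Volatile first: the two prompts diverge at the digit in the first line,
# so none of the stable block that follows can be reused.
volatile_first = shared_prefix_len(volatile_turn_1 + STABLE, volatile_turn_2 + STABLE)

# Stable first: the whole stable block is shared before the prompts diverge.
stable_first = shared_prefix_len(STABLE + volatile_turn_1, STABLE + volatile_turn_2)

print(volatile_first, stable_first)
```

The content is the same in both orderings; only the position of the volatile line changes how much of the prompt is reusable.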
## Evidence (verified)
1. **SessionStart hook prints volatile counts before stable inventory** — `session-start-prescan.sh` emits branch/commit count/dirty count/open issues/pending tasks above the top-level dirs and stacks-detected lines. Covered tactically by #2908.
2. **UserPromptSubmit hooks** inject a pending-issue banner on every bro-mode turn; addressed by #2908.
3. **Skill load order is description-match-driven, not stability-sorted** — when CC composes the prompt for a bro turn, the order of skill bodies depends on which descriptions match, which depends on the user's prompt text. Different prompts → different orderings → cache breaks.
4. **Platform memory files**:
- `CLAUDE.md` (51 LOC) — actively loaded by CC
- `CODEX.md` (21 LOC), `CURSOR.md` (18 LOC), `GEMINI.md` (19 LOC) — **all marked as placeholders** ("not implemented"; `gemini-extension.json` confirms placeholder status). These do NOT load into CC sessions; they're forward-compat stubs for other platforms.
5. **MEMORY.md updates frequently** (memory edits per user feedback) — if CC loads user-memory in the prompt, edits to MEMORY.md may live in a volatile region and bust the cache below them.
## Plan
1. **Establish a documented "cache zone" structure** in `docs/contributing/ARCHITECTURE.md`:
- Zone A (top, stable): plugin identity (CLAUDE.md), MCP tool index (deferred names + descriptions — stable across sessions)
- Zone B (middle, semi-stable): skill bodies, sorted by stability tier
- Zone C (bottom, volatile): hook output (counts, banners, timestamps), user memory updates
2. **Stability-tier annotation** in each `tmb_*/SKILL.md` frontmatter: `cache_tier: stable | semi-stable | volatile`. Reorder loading by tier.
3. **Mark hooks as `cache_zone: volatile`** in `hooks/hooks.json` (or a sibling metadata file) — their output always tails.
4. **Add `tests/benchmarks/prompt-cache-stability.sh`** that runs two identical sessions, diffs the cold-prompt output, asserts ≥80% shared prefix.
5. **Document that CODEX.md / CURSOR.md / GEMINI.md are placeholders** (and not currently loaded by any harness) so contributors don't add content thinking it loads into CC.
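
Plan step 2 could be sketched as a deterministic sort, assuming the loader has access to each matched skill's `cache_tier` value. The skill names (`tmb_alpha`, `tmb_beta`, `tmb_gamma`) and the `order_skills` helper are hypothetical; how CC actually exposes load order is not confirmed here.

```python
# Lower rank loads earlier, so stable content sits nearest the top of the prompt.
TIER_RANK = {"stable": 0, "semi-stable": 1, "volatile": 2}


def order_skills(matched: list[tuple[str, str]]) -> list[str]:
    """Order (skill_name, cache_tier) pairs by tier, then by name.

    Sorting on (tier rank, name) makes the result independent of the order
    in which description matching happened to return the skills.
    Unknown tiers are treated as volatile (an assumption of this sketch).
    """
    ranked = sorted(
        matched,
        key=lambda s: (TIER_RANK.get(s[1], TIER_RANK["volatile"]), s[0]),
    )
    return [name for name, _tier in ranked]


# Two different match orders for the same (hypothetical) skill set.
match_order_a = [("tmb_gamma", "volatile"), ("tmb_alpha", "stable"), ("tmb_beta", "semi-stable")]
match_order_b = list(reversed(match_order_a))

print(order_skills(match_order_a))
print(order_skills(match_order_b))
```

This only removes ordering noise for a given set of matched skills. A different set of matches still changes the prompt, but stable-tier skills keep their position ahead of the others.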
## Acceptance criteria
- Cold-prompt diff between two identical turns has ≥80% shared prefix (measured by `tests/benchmarks/prompt-cache-stability.sh`).
- New benchmark recorded in CI; the build fails if the shared prefix drops below 70%.
- All `tmb_*` skills annotated with a `cache_tier` in their SKILL.md frontmatter.
- `docs/contributing/ARCHITECTURE.md` cache-zone section exists.
- L4 + L5 still pass.
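
The two thresholds above could be checked along these lines. This is a sketch under stated assumptions: the ratio is defined here as shared-prefix bytes divided by the length of the longer prompt (the issue does not fix the denominator), and the intermediate `"warn"` band between 70% and 80% is a hypothetical addition, not part of the criteria.

```python
TARGET_RATIO = 0.80  # acceptance target from this issue
CI_FLOOR = 0.70      # regression threshold from this issue


def shared_prefix_ratio(a: bytes, b: bytes) -> float:
    """Shared-prefix bytes divided by the longer prompt's length (assumed definition)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty prompts are trivially identical
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / longest


def classify(ratio: float) -> str:
    """Map a ratio onto the issue's thresholds; the 'warn' band is hypothetical."""
    if ratio >= TARGET_RATIO:
        return "pass"
    if ratio >= CI_FLOOR:
        return "warn"
    return "fail"


# Synthetic prompts: 90 shared bytes followed by 10 differing bytes.
prompt_1 = b"A" * 90 + b"x" * 10
prompt_2 = b"A" * 90 + b"y" * 10
print(classify(shared_prefix_ratio(prompt_1, prompt_2)))
```

Dividing by the longer prompt keeps the ratio from looking better than it is when one session's prompt is much shorter than the other's.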
## Coordination
- Tactical hook reorder lives in #2908 (this issue is the strategic, system-wide version).
- Skill slimming in #2904 reduces the cost of cache misses; this issue makes misses rarer.
- Pairs with #2916 (compact responses) — both reduce per-turn churn.
## Out of scope
- Anthropic-side cache tuning (we don't control it).
- Pre-compiling bro's prompt to a binary cache key (would require CC platform support).
## Note on source
Previous description listed CODEX.md/CURSOR.md/GEMINI.md as concrete platform memory files inflating cache footprint. Verified directly: all three files explicitly say "Status: not implemented." in their first paragraph. `gemini-extension.json` confirms `_status: placeholder`. So they are forward-compat stubs, not active load surface. Removed that claim.