Trust My Bot / plugin · Issue #2918

v0.7.0 — token-burn: System-wide prompt-cache discipline — stable prefix, volatile suffix, measured

## Problem

TMB does not structure its prompts to maximize CC's prompt-cache hit-rate. The cache breakpoints (per Anthropic) anchor from the start of the assembled prompt and break at the first byte-difference. TMB violates the cache-friendly contract in multiple places.

## Evidence (verified)

1. **SessionStart hook prints volatile counts before stable inventory**: `session-start-prescan.sh` emits branch, commit count, dirty count, open issues, and pending tasks above the top-level dirs and stacks-detected lines. Covered tactically by #2908.
2. **UserPromptSubmit hooks** inject a pending-issue banner on every bro-mode turn. Addressed by #2908.
3. **Skill load order is description-match-driven, not stability-sorted**: when CC composes the prompt for a bro turn, the order of skill bodies depends on which descriptions match, which in turn depends on the user's prompt text. Different prompts → different orderings → cache breaks.
4. **Platform memory files**:
   - `CLAUDE.md` (51 LOC): actively loaded by CC.
   - `CODEX.md` (21 LOC), `CURSOR.md` (18 LOC), `GEMINI.md` (19 LOC): **all marked as placeholders** ("not implemented"; `gemini-extension.json` confirms placeholder status). These do NOT load into CC sessions; they are forward-compat stubs for other platforms.
5. **MEMORY.md updates frequently** (memory edits per user feedback): if CC loads user-memory in the prompt, edits to MEMORY.md may live in a volatile region and bust the cache below them.

## Plan

1. **Establish a documented "cache zone" structure** in `docs/contributing/ARCHITECTURE.md`:
   - Zone A (top, stable): plugin identity (CLAUDE.md), MCP tool index (deferred names + descriptions, stable across sessions)
   - Zone B (middle, semi-stable): skill bodies, sorted by stability tier
   - Zone C (bottom, volatile): hook output (counts, banners, timestamps), user memory updates
2. **Stability-tier annotation** in each `tmb_*/SKILL.md` frontmatter: `cache_tier: stable | semi-stable | volatile`. Reorder loading by tier.
3. **Mark hooks as `cache_zone: volatile`** in `hooks/hooks.json` (or a sibling metadata file), so that their output always tails the prompt.
4. **Add `tests/benchmarks/prompt-cache-stability.sh`** that runs two identical sessions, diffs the cold-prompt output, and asserts ≥80% shared prefix.
5. **Document that CODEX.md / CURSOR.md / GEMINI.md are placeholders** (and not currently loaded by any harness) so contributors don't add content thinking it loads into CC.

## Acceptance criteria

- Cold-prompt diff between two identical turns has ≥80% shared prefix (measured by `tests/benchmarks/prompt-cache-stability.sh`).
- New benchmark recorded in CI; the run fails as a regression if shared prefix drops below 70%.
- All `tmb_*` skills annotated with a `cache_tier` in their SKILL.md frontmatter.
- `docs/contributing/ARCHITECTURE.md` cache-zone section exists.
- L4 + L5 still pass.

## Coordination

- Tactical hook reorder lives in #2908 (this issue is the strategic, system-wide version).
- Skill slimming in #2904 reduces the cost of cache misses; this issue makes misses rarer.
- Pairs with #2916 (compact responses): both reduce per-turn churn.

## Out of scope

- Anthropic-side cache tuning (we don't control it).
- Pre-compiling bro's prompt to a binary cache key (would require CC platform support).

## Note on source

The previous description listed CODEX.md/CURSOR.md/GEMINI.md as concrete platform memory files inflating cache footprint. Verified directly: all three files explicitly say "Status: not implemented." in their first paragraph, and `gemini-extension.json` confirms `_status: placeholder`. They are forward-compat stubs, not active load surface, so that claim has been removed.
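
For reference, the shared-prefix measurement named in the acceptance criteria could be sketched roughly as below. This is an illustration only, not the contents of `tests/benchmarks/prompt-cache-stability.sh`. The function names are hypothetical, and the choice to divide by the length of the longer prompt is an assumption, since the issue does not define the denominator. The label for the band between 70% and 80% is also an interpretation.

```python
# Hypothetical sketch: measure how much of two assembled prompts is a
# byte-identical prefix, then compare against the thresholds in this issue.

TARGET = 0.80            # acceptance criterion: >= 80% shared prefix
REGRESSION_FLOOR = 0.70  # CI regression threshold: below 70% fails


def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that are identical in both prompts."""
    limit = min(len(a), len(b))
    for i in range(limit):
        if a[i] != b[i]:
            return i
    return limit


def shared_prefix_ratio(a: bytes, b: bytes) -> float:
    """Shared prefix length divided by the longer prompt (assumed denominator)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty prompts are trivially identical
    return shared_prefix_len(a, b) / longest


def classify(ratio: float) -> str:
    """Map a ratio onto the issue's thresholds; 'below-target' is our own label."""
    if ratio >= TARGET:
        return "pass"
    if ratio >= REGRESSION_FLOOR:
        return "below-target"
    return "regression"


stable = b"S" * 80

# Volatile content at the tail: the stable 80 bytes stay shared.
tail_a = stable + b"x" * 20
tail_b = stable + b"y" * 20

# Volatile content at the head: the prefix breaks at the first differing byte.
head_a = b"count=3\n" + stable
head_b = b"count=4\n" + stable

print(classify(shared_prefix_ratio(tail_a, tail_b)))
print(classify(shared_prefix_ratio(head_a, head_b)))
```

The two toy prompts have nearly the same content, but placing the differing count line first leaves only the six bytes of `count=` shared, which is the behavior the zone ordering in the plan is meant to avoid.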
