v0.7.0 — token-burn: Slim bro's always-on skill set (description-match scoping)
## Problem
Bro auto-loads multiple TMB skills on every Human turn via CC description-matching. Verified count: **8 skills in `plugin/skills/`**, totaling 1054 lines of SKILL.md content:
| Skill | Lines | Loads when (per description) |
|---|---|---|
| **tmb_planning** | **292** ⚠️ | First code-touching ask of a session |
| **tmb_review** | **274** ⚠️ | Push gate, PR-comment triage, review-before-push asks |
| tmb_agent-creator | 128 | Named-role consult ("get architect's read on X") |
| tmb_recovery | 127 | First failure of a session |
| tmb_skill-creator | 104 | "Create a skill that codifies <X>" |
| tmb_docs-conventions | 52 | Editing prompt files / docs-update expectation |
| tmb_concerns-protocol | 51 | When bro disagrees with the Human's plan |
| tmb_swe-checklist | 26 | SWE about to atomic-close |
Bro is **not** a file-defined agent — there is no `agents/bro.md`. Bro is the main CC session itself, shaped by `CLAUDE.md` (51 LOC) + skills above (loaded by CC description-matching).
**Two skills exceed the 200 LOC industry-standard ceiling for a single prompt unit** (per `docs/architecture/DETERMINISM.md` authoring checklist + general prompt-engineering practice). They are also the broadest-firing — `tmb_planning` fires on any code-touching ask; `tmb_review` fires on 4 distinct triggers.
## Evidence (verified by reading `plugin/skills/*/SKILL.md`)
- `skills/tmb_planning/SKILL.md` description ends "Self-contained — everything bro needs is here" — designed to fire on any code-touching ask, but its 292-line body lands in context unconditionally on match.
- `skills/tmb_review/SKILL.md` description fires for **four** distinct triggers: pr-reviewer scoring (subagent), bro on push-hook block, bro on "review-before-push" asks, bro on PR-comment monitoring. 274 lines per match.
- Well-scoped narrow descriptions (use as templates):
- `tmb_concerns-protocol`: "Loaded when bro genuinely disagrees with a request" — narrow + verb-specific.
- `tmb_swe-checklist`: "load only when about to atomic-close" — explicit gate.
Per `docs/architecture/DETERMINISM.md`'s math table: a 5-step procedural skill at p=0.95 per-step adherence = 77% workflow success. A 7-step skill = 70%. Both 292-LOC skills almost certainly contain >5 procedural steps that could be migrated to mechanisms 1-6 (server defaults, atomic composites, PreToolUse/PostToolUse/UserPromptSubmit hooks, requireRoles).
## Plan
Apply the **DETERMINISM doctrine** (`docs/architecture/DETERMINISM.md`) to every skill. Mechanism 7 (skill prose) is the fallback, not the default — every line that maps to mechanisms 1–6 migrates out.
### 1. Boundary-test audit of each `tmb_*/SKILL.md` (✓ approved)
For each procedural sentence in each skill, apply the 3-part boundary test:
- Is the verb a *judgment* on novel input (classify / decide / draft / weigh / synthesize)? → KEEP in skill prose.
- Is it a *fact about the world* (file exists, role has access, value is X) or a *sequence* that the LLM might drop? → MIGRATE to mechanisms 1–6.
- Cite the migration target in the audit doc: "step X → server-default (mech 1)", "step Y → composite (mech 2)", etc.
Output: `docs/architecture/skill-boundary-audit-v0.7.0.md` — per-skill table of sentences with classification + migration target.
### 2. Skill-frontmatter `loading-cost` annotation (research before adopting)
**Defer pending industry-standard research.** Before adding any metadata field to SKILL.md frontmatter:
- Survey: Anthropic prompt-engineering docs, OpenAI Cookbook, AutoGen / LangGraph / CrewAI conventions for skill/prompt metadata.
- Criteria: only add if (a) it's industry-standard AND (b) it does not change agent behavior (purely informational for tooling).
- If research surfaces a different convention (e.g. `stability_tier`, `eviction_priority`), adopt that instead.
- If no industry convention exists, do NOT invent one — use `docs/architecture/skill-boundary-audit-v0.7.0.md` as the out-of-band cost reference.
### 3. Split `tmb_planning` (292 → <200) and `tmb_review` (274 → <200) per Efficiency-of-JUDGEMENT
Per the user's `Efficiency of JUDGEMENT` doctrine — classify each JUDGEMENT block by per-run usage rate:
| Tier | Per-run usage | Goes into | Example from tmb_planning |
|---|---|---|---|
| **Always-on** | ~100% — every spawn needs it | Agent prompt body (CLAUDE.md / bro persona) | Cold-start judgment (lazy-fill vs deep-scan), spec-authoring framing |
| **On-demand** | <100% — depends on context | Skill body (loaded by description match) | ADR-required branching, branch-id confirmation flow, retry-rationale composition |
| **Migrate-out** | N/A — DETERMINISM not JUDGEMENT | Mechanism 1–6 | "Then call audit_log after task_create" → atomic composite (mech 2) |
Concrete split sketch (verify-then-execute):
**tmb_planning** (292 LOC) decomposes into:
- ~30 LOC always-on judgment → CLAUDE.md routing addendum
- ~120 LOC on-demand → `tmb_planning_spec` (spec authoring + scope-gate decisions)
- ~80 LOC on-demand → `tmb_planning_verify` (V1/V2/V3 verification judgment)
- ~60 LOC migrate-out → composite tools + hooks (e.g. `planning_complete` audit emission)
**tmb_review** (274 LOC) decomposes into:
- ~20 LOC always-on → pr-reviewer agent body (already there; review what's redundant)
- ~150 LOC on-demand → `tmb_review_score` (correctness/design/pattern phase reasoning)
- ~70 LOC on-demand → `tmb_review_triage` (PR/MR comment triage — only loaded by /monitor or push-block)
- ~30 LOC migrate-out → composite (`pr_review_runs` row writes can be one composite call)
### 4. Update placeholder file descriptions to current state (replaces previous "audit CLAUDE.md + parallel files")
Per Human review: only `CLAUDE.md` is loaded by CC. `CODEX.md` / `CURSOR.md` / `GEMINI.md` / `gemini-extension.json` are placeholders. Action:
- Verify each placeholder file's stated status matches current v0.7.0 reality
- Update `_status` / version / "what goes here when ready" sections so they don't lie about what TMB ships today
- No load-surface audit needed — placeholders aren't loaded
## Acceptance criteria
- **Both** `tmb_planning` and `tmb_review` reach **<200 LOC** in their primary SKILL.md (split children acceptable).
- `docs/architecture/skill-boundary-audit-v0.7.0.md` exists, listing every procedural sentence with classification (JUDGMENT-keep | DETERMINISM-migrate) + migration target.
- At least 3 procedural sequences from each over-200 skill migrate to mechanisms 1–6 (server defaults, composites, hooks, or requireRoles).
- Industry research for skill metadata documented in `docs/architecture/skill-metadata-research.md`; either adopt an industry convention OR document why none applies.
- `CODEX.md`, `CURSOR.md`, `GEMINI.md`, `gemini-extension.json` placeholder descriptions match current v0.7.0 plugin state.
- L4 + L5 still pass (skill discovery still triggers correctly).
- Measurement (gated by #2921): tokens-per-bro-turn for status-only turn drops by ≥25%; tokens-per-bro-turn for planning turn drops by ≥15% (smaller because planning intrinsically needs more context).
## Out of scope
- Restructuring `CLAUDE.md` itself (separate issue if needed).
- Custom skill-author lint for description specificity (could be a follow-up — depends on research outcome of item 2).
- Splitting other skills (focus only on >200 LOC; others are within budget).
## Coordination
- Depends on #2921 (measurement harness) to validate the 25% / 15% acceptance criteria.
- Pairs with #2919 (P1 hallucination) — skill boundary-test will surface "verify file_registry before claiming" as a load-bearing migration.
- Pairs with #2914 (composites extension) — many of the "migrate-out" steps will land as new composites.
## Note on source
The original description claimed `tmb_default_repo` and `tmb_scan` as skills, listed 6 skills, gave wrong line counts. Confabulations from prior-session impression. Corrected via direct read of `plugin/skills/*/SKILL.md` and `plugin/agents/`. The plan was then revised against Human review (2026-05-17): adopted 200 LOC limit + DETERMINISM doctrine; deferred frontmatter-annotation pending industry-standard research; detailed the split via Efficiency-of-JUDGEMENT; dropped parallel-platform-file audit (placeholders only); added placeholder-description-refresh task.
issue