v0.7.0 — token-burn: Slim bro's always-on skill set (description-match scoping) (#2904) · Issues · Trust My Bot / plugin

## Problem

Bro auto-loads multiple TMB skills on every Human turn via CC description-matching. Verified count: **8 skills in `plugin/skills/`**, totaling 1054 lines of SKILL.md content:

| Skill | Lines | Loads when (per description) |
|---|---|---|
| **tmb_planning** | **292** ⚠️ | First code-touching ask of a session |
| **tmb_review** | **274** ⚠️ | Push gate, PR-comment triage, review-before-push asks |
| tmb_agent-creator | 128 | Named-role consult ("get architect's read on X") |
| tmb_recovery | 127 | First failure of a session |
| tmb_skill-creator | 104 | "Create a skill that codifies <X>" |
| tmb_docs-conventions | 52 | Editing prompt files / docs-update expectation |
| tmb_concerns-protocol | 51 | When bro disagrees with the Human's plan |
| tmb_swe-checklist | 26 | SWE about to atomic-close |

Bro is **not** a file-defined agent: there is no `agents/bro.md`. Bro is the main CC session itself, shaped by `CLAUDE.md` (51 LOC) + skills above (loaded by CC description-matching).

**Two skills exceed the 200 LOC industry-standard ceiling for a single prompt unit** (per `docs/architecture/DETERMINISM.md` authoring checklist + general prompt-engineering practice). They are also the broadest-firing: `tmb_planning` fires on any code-touching ask; `tmb_review` fires on 4 distinct triggers.

## Evidence (verified by reading `plugin/skills/*/SKILL.md`)

- `skills/tmb_planning/SKILL.md` description ends "Self-contained — everything bro needs is here". It is designed to fire on any code-touching ask, but its 292-line body lands in context unconditionally on match.
- `skills/tmb_review/SKILL.md` description fires for **four** distinct triggers: pr-reviewer scoring (subagent), bro on push-hook block, bro on "review-before-push" asks, bro on PR-comment monitoring. 274 lines per match.
- Well-scoped narrow descriptions (use as templates):
  - `tmb_concerns-protocol`: "Loaded when bro genuinely disagrees with a request" (narrow + verb-specific).
  - `tmb_swe-checklist`: "load only when about to atomic-close" (explicit gate).

Per the math table in `docs/architecture/DETERMINISM.md`: a 5-step procedural skill at p=0.95 per-step adherence yields 77% workflow success; a 7-step skill yields 70%. Both over-200-LOC skills (292 and 274 lines) almost certainly contain more than 5 procedural steps that could be migrated to mechanisms 1–6 (server defaults, atomic composites, PreToolUse/PostToolUse/UserPromptSubmit hooks, requireRoles).

## Plan

Apply the **DETERMINISM doctrine** (`docs/architecture/DETERMINISM.md`) to every skill. Mechanism 7 (skill prose) is the fallback, not the default: every line that maps to mechanisms 1–6 migrates out.

### 1. Boundary-test audit of each `tmb_*/SKILL.md` (✓ approved)

For each procedural sentence in each skill, apply the 3-part boundary test:

- Is the verb a *judgment* on novel input (classify / decide / draft / weigh / synthesize)? → KEEP in skill prose.
- Is it a *fact about the world* (file exists, role has access, value is X) or a *sequence* that the LLM might drop? → MIGRATE to mechanisms 1–6.
- Cite the migration target in the audit doc: "step X → server-default (mech 1)", "step Y → composite (mech 2)", etc.

Output: `docs/architecture/skill-boundary-audit-v0.7.0.md`, a per-skill table of sentences with classification + migration target.

### 2. Skill-frontmatter `loading-cost` annotation (research before adopting)

**Defer pending industry-standard research.** Before adding any metadata field to SKILL.md frontmatter:

- Survey: Anthropic prompt-engineering docs, OpenAI Cookbook, AutoGen / LangGraph / CrewAI conventions for skill/prompt metadata.
- Criteria: only add if (a) it's industry-standard AND (b) it does not change agent behavior (purely informational for tooling).
- If research surfaces a different convention (e.g. `stability_tier`, `eviction_priority`), adopt that instead.
- If no industry convention exists, do NOT invent one; use `docs/architecture/skill-boundary-audit-v0.7.0.md` as the out-of-band cost reference.
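As a sanity check on the adherence figures quoted from `DETERMINISM.md` under Evidence, the sketch below reproduces them under one assumption that is ours, not the doc's: each step is followed independently with the same probability, so workflow success is `p ** steps`. The function name `workflow_success` is hypothetical.

```python
def workflow_success(p_step: float, steps: int) -> float:
    """Chance that every step of a procedural skill is followed.

    Assumes steps are independent and share the same per-step
    adherence probability (an illustrative assumption).
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return p_step ** steps


if __name__ == "__main__":
    # Step counts taken from the figures cited in the issue.
    for n in (5, 7):
        print(f"{n} steps at p=0.95 -> {workflow_success(0.95, n):.0%}")
```

Under this model every procedural step moved to mechanisms 1–6 removes one factor of `p` from the product, which is the motivation for migrating sequences out of skill prose.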
### 3. Split `tmb_planning` (292 → <200) and `tmb_review` (274 → <200) per Efficiency-of-JUDGEMENT

Per the user's `Efficiency of JUDGEMENT` doctrine, classify each JUDGEMENT block by per-run usage rate:

| Tier | Per-run usage | Goes into | Example from tmb_planning |
|---|---|---|---|
| **Always-on** | ~100% (every spawn needs it) | Agent prompt body (CLAUDE.md / bro persona) | Cold-start judgment (lazy-fill vs deep-scan), spec-authoring framing |
| **On-demand** | <100% (depends on context) | Skill body (loaded by description match) | ADR-required branching, branch-id confirmation flow, retry-rationale composition |
| **Migrate-out** | N/A (DETERMINISM, not JUDGEMENT) | Mechanism 1–6 | "Then call audit_log after task_create" → atomic composite (mech 2) |

Concrete split sketch (verify-then-execute):

**tmb_planning** (292 LOC) decomposes into:

- ~30 LOC always-on judgment → CLAUDE.md routing addendum
- ~120 LOC on-demand → `tmb_planning_spec` (spec authoring + scope-gate decisions)
- ~80 LOC on-demand → `tmb_planning_verify` (V1/V2/V3 verification judgment)
- ~60 LOC migrate-out → composite tools + hooks (e.g. `planning_complete` audit emission)

**tmb_review** (274 LOC) decomposes into:

- ~20 LOC always-on → pr-reviewer agent body (already there; review what's redundant)
- ~150 LOC on-demand → `tmb_review_score` (correctness/design/pattern phase reasoning)
- ~70 LOC on-demand → `tmb_review_triage` (PR/MR comment triage, only loaded by /monitor or push-block)
- ~30 LOC migrate-out → composite (`pr_review_runs` row writes can be one composite call)

### 4. Update placeholder file descriptions to current state (replaces previous "audit CLAUDE.md + parallel files")

Per Human review: only `CLAUDE.md` is loaded by CC. `CODEX.md` / `CURSOR.md` / `GEMINI.md` / `gemini-extension.json` are placeholders.

Action:

- Verify each placeholder file's stated status matches current v0.7.0 reality
- Update `_status` / version / "what goes here when ready" sections so they don't lie about what TMB ships today
- No load-surface audit needed, since placeholders aren't loaded

## Acceptance criteria

- **Both** `tmb_planning` and `tmb_review` reach **<200 LOC** in their primary SKILL.md (split children acceptable).
- `docs/architecture/skill-boundary-audit-v0.7.0.md` exists, listing every procedural sentence with classification (JUDGMENT-keep | DETERMINISM-migrate) + migration target.
- At least 3 procedural sequences from each over-200 skill migrate to mechanisms 1–6 (server defaults, composites, hooks, or requireRoles).
- Industry research for skill metadata documented in `docs/architecture/skill-metadata-research.md`; either adopt an industry convention OR document why none applies.
- `CODEX.md`, `CURSOR.md`, `GEMINI.md`, `gemini-extension.json` placeholder descriptions match current v0.7.0 plugin state.
- L4 + L5 still pass (skill discovery still triggers correctly).
- Measurement (gated by #2921): tokens-per-bro-turn for status-only turn drops by ≥25%; tokens-per-bro-turn for planning turn drops by ≥15% (smaller because planning intrinsically needs more context).

## Out of scope

- Restructuring `CLAUDE.md` itself (separate issue if needed).
- Custom skill-author lint for description specificity (could be a follow-up; depends on research outcome of item 2).
- Splitting other skills (focus only on >200 LOC; others are within budget).

## Coordination

- Depends on #2921 (measurement harness) to validate the 25% / 15% acceptance criteria.
- Pairs with #2919 (P1 hallucination): skill boundary-test will surface "verify file_registry before claiming" as a load-bearing migration.
- Pairs with #2914 (composites extension): many of the "migrate-out" steps will land as new composites.
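
The <200 LOC acceptance criterion could be checked mechanically instead of by eye. The sketch below is one possible approach, not an existing TMB tool: the helper names (`skill_line_counts`, `over_budget`) are hypothetical, and it assumes "LOC" means every physical line of `SKILL.md`, frontmatter and blank lines included.

```python
from pathlib import Path

# Ceiling taken from the acceptance criteria: a skill passes only if it is
# strictly under this many lines.
LOC_CEILING = 200


def skill_line_counts(skills_dir: Path) -> dict[str, int]:
    """Map each skill directory name to the line count of its SKILL.md."""
    counts: dict[str, int] = {}
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        with skill_md.open(encoding="utf-8") as fh:
            counts[skill_md.parent.name] = sum(1 for _ in fh)
    return counts


def over_budget(counts: dict[str, int], ceiling: int = LOC_CEILING) -> dict[str, int]:
    """Return only the skills at or above the ceiling."""
    return {name: n for name, n in counts.items() if n >= ceiling}


if __name__ == "__main__":
    # Path as named in the issue; run from the repository root.
    counts = skill_line_counts(Path("plugin/skills"))
    for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        status = "OVER" if n >= LOC_CEILING else "ok"
        print(f"{name:<28}{n:>5}  {status}")
    print(f"total: {sum(counts.values())} lines in {len(counts)} skills")
```

Run before and after the split, a check like this would also guard against the miscounted skills and line totals described in the note on source.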

## Note on source

The original description listed `tmb_default_repo` and `tmb_scan` as skills, counted 6 skills in total, and gave wrong line counts. These were confabulations from a prior-session impression, corrected via direct read of `plugin/skills/*/SKILL.md` and `plugin/agents/`.

The plan was then revised against Human review (2026-05-17):

- adopted the 200 LOC limit + DETERMINISM doctrine;
- deferred the frontmatter annotation pending industry-standard research;
- detailed the split via Efficiency-of-JUDGEMENT;
- dropped the parallel-platform-file audit (placeholders only);
- added the placeholder-description-refresh task.
