v0.7.0 — token-burn: Main session over-accumulation — summary tool messages for sub-tool results

Problem

CC's main loop accumulates every Bash result, Read excerpt, Edit, MCP response, and subagent final message verbatim into bro's context. Bro does not get a summary of a tool result — bro gets the full payload. The trajectory DB encourages "ask the DB for context" but the DB returns it as uncompressed text into the active window.

Largest empirical offenders (verified 2026-05-17):

  1. file_registry_list — 925 summaries × 155 chars avg = 143 KB / ~36 K tokens for a full dump. Dominates other tool responses by 3–10×.
  2. issue_get_with_discussions on a hot issue — joins all discussion rows; can hit 20+ KB on issues with long histories
  3. *_report_mdissue_report_md + branch_report_md materialize all sections; observed 8–15 KB in v0.6.0 sessions
  4. audit_log_list — smaller (67 rows × 142 char avg content_json = ~9.5 KB) but still uncapped

Combined with no auto-compaction within the main session, context grows linearly until CC's hard limit forces compaction (which itself burns tokens).

Evidence (verified against live DB)

Repeat measurements from 2026-05-17:

  • file_registry: 927 rows, 925 with summaries, total summary chars = 143,460
  • audit: 67 rows, total content_json chars = ~9,500
  • discussions: 25 rows, total body chars = ~24,375

Plan

Two complementary modes — pagination (#2905 (closed)) controls how many rows return; compact-mode controls how dense each row is.

  1. compact: true default on all read-heavy tools. Compact mode returns:
    • For list tools: {id, status, key_field} projection — not full rows
    • For join tools: counts + last-N + totals — not full joins
    • For markdown reports: top-level metadata + last 5 events (~500 tokens)
  2. include_full: bool opt-in — explicit, never silent
  3. fields: string[] projection for JSON-array responses — bro often needs only id, status, objective, not the full row
  4. Server-side markdown report modes: summary | detail; default summary
  5. Document the "DB-as-context UI" tradeoff in docs/contributing/ARCHITECTURE.md so contributors understand the tax and design around it.

Acceptance criteria

  • L2 tests confirm default compact=true mode for all read-heavy tools returns ≤2 KB on the live populated DB.
  • New benchmark in tests/benchmarks/tokens-per-bro-turn.sh tracks average tokens-per-bro-turn over a representative session; baseline + post-fix recorded.
  • No bro skill code paths break (they receive compacted form by default; explicit include_full where the body is needed).

Out of scope

  • LLM-based summarization (separate concern; not free).
  • Streaming responses (MCP protocol concern).

Coordination

  • Pairs with #2905 (closed) (pagination) — together they bound both row-count and row-density.
  • Pairs with #2919 (P1 hallucination) — compact mode + enriched file_registry (kind, line_count) lets bro answer "is X a real skill?" cheaply.
  • Pairs with #2906 (caps) — capped fields are deterministic for compact-mode shaping.

Note on source

Previous description estimated file_registry at "50–100 KB tokens." Verified empirically: 143 KB chars ≈ 36 K tokens. Updated.

Edited by Zax Shen