v0.7.0 — token-burn: Paginate DB-read responses (issue_get_with_discussions, audit_log_list, file_registry_list) (#2905) · Issues · Trust My Bot / plugin

v0.7.0 — token-burn: Paginate DB-read responses (issue_get_with_discussions, audit_log_list, file_registry_list)

## Problem (rewritten per Human review 2026-05-18) The original plan added a hardcoded `limit: 10` default to DB-read tools. Human review pushed back: **pagination is a band-aid, not a fix.** The DB exists to (a) save tokens vs grepping the whole codebase, (b) provide anti-hallucination over MEMORY.md. If queries cost more than `grep` or are less reliable than memory, the DB has no reason to exist. The real fix is **search-first architecture**: bro stops dumping rows; bro asks the DB for the top-K most relevant snippets. Two retrieval modes are needed: 1. **Keyword / exact / phrase** — when bro knows the term (e.g., "what audit events did `task_create_batch` emit yesterday?") → FTS5 + BM25 ranking. 2. **Semantic / fuzzy** — when bro doesn't know the exact phrasing (e.g., "what did we decide about authentication?" against a body that says "login flow security") → lightweight RAG via embedding-based vector search. ## Stack decisions (locked, per Human review) | Layer | Choice | Why | |---|---|---| | Keyword search | **FTS5 (built into `node:sqlite` 3.51.3)** | Zero install; Node is CC's prerequisite; BM25 + phrase + prefix + snippets work today | | Embedding runtime | **`@huggingface/transformers` with `onnxruntime-node`** | Prebuilt native binaries (darwin x64/arm64, linux x64/arm64, win x64); 5–10× faster than WASM-only; ~50ms warm query | | Embedding model | **bge-small-en-v1.5** (33 MB, 384-dim) | +5–8% English quality over MiniLM-L6 at same speed bucket; lazy-downloaded on first run | | Vector storage | **`embedding BLOB` (Float32 packed) in dedicated per-content tables** | No sqlite-vec native binary (avoids second cross-platform binary); brute-force cosine in JS over ~1k vectors is 5–10ms | | Vector search | **Brute-force JS cosine** | Trivial code; at TMB scale (dozens-to-thousands of rows) fast enough; can add sqlite-vec later if scale grows past 10K rows | | Hybrid ranking | **Reciprocal-Rank-Fusion (RRF)** over FTS5 + cosine + recency-decay weight | Standard hybrid; deterministic; tunable α | ## Schema strategy Per-content embedding tables (NOT a column on the source table): ```sql CREATE TABLE discussions_embeddings ( discussion_id INTEGER PRIMARY KEY REFERENCES discussions(id) ON DELETE CASCADE, embedding BLOB NOT NULL, -- 384 × Float32 = 1536 bytes model_id TEXT NOT NULL, -- 'bge-small-en-v1.5' embedded_at TEXT NOT NULL ); -- Same shape for audit_embeddings, file_registry_embeddings ``` Why separate tables (Option A vs adding a BLOB column to each content table): - Clean separation of embedding lifecycle from content lifecycle - Easy to drop+rebuild per-table when changing model - No NULL columns when embeddings missing / not-yet-generated - ON DELETE CASCADE keeps consistency automatically - Migration v3 adds 3 new tables; doesn't touch existing schemas' shapes ## New MCP tools (replace/supplement existing dump-style tools) | New tool | Replaces | Default response | |---|---|---| | `discussion_search(query, k=5, last_n=20, recency_alpha=0.3)` | `issue_get_with_discussions` (which currently joins ALL) | Top-K snippets with relevance + recency score; `~500 tokens` per response | | `audit_search(query, event_types?, k=5, ...)` | `audit_log_list` (which has no limit) | Same shape | | `file_registry_search(query, kind?, path_prefix?, k=10)` | `file_registry_list` (925 rows = 36 KB tokens today) | Same shape; path_prefix is exact-match shortcut | Existing `*_list` / `*_get_with_discussions` tools STAY for explicit "I need the full set" cases but gain `limit` + `cursor` for pagination — search is the default path bro uses. ## Existential validation (per Human's constraint) 1. **Does search cost less than grep?** Yes: 1 search call = ~500 tokens for top-5 snippets. `grep -r` over a comparably-sized repo + reading hits = thousands of tokens for the read alone. Verified against TMB scale (143 KB file_registry, 25 KB discussions). 2. **Is DB more reliable than MEMORY.md?** Yes: SQL queries return canonical current state. MEMORY.md drifts. Embeddings get auto-recomputed on write via triggers. ## Execution shape — TWO sequential SWE tasks ### SWE 1 — FTS5 infrastructure + search MCP tools (no embeddings yet) Branch: `feat/search-fts5-infrastructure`. Schema migration v3 adds FTS5 virtual tables + triggers for auto-sync. New MCP tools (`discussion_search`, `audit_search`, `file_registry_search`) ship with FTS5-only ranking (BM25 + recency-decay). Existing `*_list` tools gain `limit` + `cursor` for the "full data" path. **Acceptance**: search returns ≤1 KB top-5 snippets for representative queries; FTS5 keyword ranking works; existing tools still work with new optional pagination; L0+L1+L2+L3+L4 all green. ### SWE 2 — ONNX embedding pipeline + hybrid ranking + skill updates Branch: `feat/search-rag-onnx-embeddings`, off SWE 1's branch (stacked). Adds `@huggingface/transformers` + bge-small-en-v1.5; per-content `*_embeddings` tables; sync triggers on write (server generates embedding inline); brute-force cosine search; RRF hybrid ranking with FTS5. Updates `tmb_planning` / `tmb_review` / `tmb_recovery` skills to use `*_search` tools as the default. Backfills existing rows on migration. **Acceptance**: semantic queries (e.g. "what did we decide about authentication" against bodies that say "login flow security") return relevant results in top-3; query latency p95 < 200ms on warm cache; backfill of current DB (25 discussions + 67 audit + 925 file_registry) completes in <30 seconds. ## Acceptance criteria (overall) - All 3 new search tools (`discussion_search`, `audit_search`, `file_registry_search`) ship and pass L2+L3 tests - Embedding pipeline runs on every write (no manual backfill required for new data) - v2→v3 migration includes embedding backfill for existing rows - Bro's reads-per-turn token cost drops by ≥80% on representative sessions (gated by #2921 measurement harness if available, else subjective benchmark) - No regression in any existing test suite layer - Lightweight: native deps stay at one (`onnxruntime-node`); no sqlite-vec; no vector DB ## Out of scope - Switching off `node:sqlite` to `better-sqlite3` or another driver - Adding sqlite-vec (defer unless brute-force cosine becomes a hot spot at >10K rows) - Re-embedding on model change (handled by future model-bump issue) - Removing the existing `*_list` tools entirely (kept with pagination for explicit-full-dump needs) - Server-side query embedding cache (could be a follow-up if bro re-searches same query within a session) ## Coordination - Pairs with #2916 (compact responses) — search IS the compact-response mechanism for read-heavy tools - Unblocks downstream parts of #2904 (slim-skills) — skills that currently say "ask the DB for X" can switch to `*_search` and get the right shape automatically - The measurement harness from #2921 (deferred) would quantify the ≥80% token-burn reduction; without it we use a subjective benchmark - Resolves the Human's existential challenge ("if DB costs more than grep or is less reliable than memory.md, no point") — search makes both metrics win ## Note on source Original plan was hardcoded `limit: 10` pagination. Pivoted per Human review (!2905 note 2026-05-18): pagination treats the symptom; search treats the cause. The DB exists to save tokens and prevent hallucination; querying must be cheap and authoritative. Stack chosen for "lightweight RAG with no user dep issues": FTS5 (zero deps) + ONNX runtime (one prebuilt native binary covering common platforms) + bge-small-en-v1.5 (33MB lazy-downloaded model) + brute-force cosine over Float32 BLOBs (no second native binary).

issue