v0.7.0 — token-burn: Honor-system sqlite3 fallback — replace with a single bro-side recovery wrapper
Problem
When the trajectory-server MCP becomes unresponsive, bro and pr-reviewer fall back to direct sqlite3 Bash calls per the tmb_recovery skill's "trajectory-server unreachable" section. This is the correct safety design, but the recovery flow involves multi-tier retries, and each tier's stdout lands in bro's context.
Evidence (verified against live source)
- skills/tmb_recovery/SKILL.md (127 LOC) §C "trajectory-server unreachable": exists, documents the degraded-readonly fallback path via scripts/bro-sqlite-readonly.sh (referenced).
- tmb_recovery allowed-tools frontmatter: Bash(skills/tmb_recovery/scripts/bro-sqlite-readonly.sh:*), mcp__plugin_tmb_trajectory-server__audit_log, mcp__plugin_tmb_trajectory-server__discussion_append. Narrowly scoped to invoking the helper script (good) plus minimal MCP writes.
- scripts/hooks/mcp-health-check.sh registered in both SessionStart and UserPromptSubmit hook arrays; emits status that bro consumes.
- Multi-tier retries: today bro often tries the MCP call, fails, reads feedback_mcp_recovery.md (user memory) to decide tier, then retries. Each step's output enters context.
Plan
The partial wrapper already exists (bro-sqlite-readonly.sh). The work is to make recovery transparent and reduce context spend on tier-decisions.
- Promote the helper to a tool-call shape: wrap the sqlite3 fallback as an MCP-like helper that bro invokes uniformly:
  - For read tools: mcp_safe_read(tool, args) tries MCP once, falls back to bro-sqlite-readonly.sh on failure, returns a unified JSON result
  - For write tools: explicit mcp_safe_write(tool, args) tries MCP, HALTs on failure with a clear escalation prompt (no honor-system writes per !2900 Cause E lesson)
- Update tmb_recovery skill to point at the helper directly, replacing prose tier-instructions with a 5-line invocation pattern.
- Audit event mcp_recovery_path_taken with content_json: {tier, latency_ms, tool}; this quantifies recovery cost over a release cycle.
- Reduce per-block context: mcp-health-check.sh should emit a single 1-line status, not multi-line diagnostic. Diagnostic goes to a separate file readable on-demand.
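
The read/write split and the audit event in the plan can be sketched as follows. This is a minimal illustration, not the helper's actual implementation. The MCP call and the sqlite3 fallback are passed in as plain callables because this issue does not specify how either is invoked. The tier names ("sqlite_readonly", "halt"), the McpUnavailable exception, the result keys, and the in-memory audit_events list are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, List

# Hypothetical signatures: (tool, args) -> result. The real MCP transport and
# the bro-sqlite-readonly.sh invocation are not specified in this issue.
ToolCall = Callable[[str, Dict[str, Any]], Any]


class McpUnavailable(RuntimeError):
    """Raised by mcp_safe_write so the caller HALTs and escalates."""


# Stand-in for the real audit log (assumption: events are buffered locally).
audit_events: List[Dict[str, Any]] = []


def _audit(tier: str, tool: str, started: float) -> None:
    """Record one mcp_recovery_path_taken event with the fields from the plan."""
    latency_ms = int((time.monotonic() - started) * 1000)
    audit_events.append({
        "event": "mcp_recovery_path_taken",
        "content_json": {"tier": tier, "latency_ms": latency_ms, "tool": tool},
    })


def mcp_safe_read(tool: str, args: Dict[str, Any],
                  mcp_call: ToolCall, fallback: ToolCall) -> Dict[str, Any]:
    """Try MCP once; on any failure, use the read-only fallback once."""
    started = time.monotonic()
    try:
        return {"ok": True, "source": "mcp", "data": mcp_call(tool, args)}
    except Exception:
        # A fallback failure propagates to the caller; there is no further tier.
        data = fallback(tool, args)
        _audit("sqlite_readonly", tool, started)
        return {"ok": True, "source": "sqlite_readonly", "data": data}


def mcp_safe_write(tool: str, args: Dict[str, Any],
                   mcp_call: ToolCall) -> Dict[str, Any]:
    """Try MCP once; on failure, HALT. Never attempts a sqlite3 write."""
    started = time.monotonic()
    try:
        return {"ok": True, "source": "mcp", "data": mcp_call(tool, args)}
    except Exception as exc:
        _audit("halt", tool, started)
        raise McpUnavailable("MCP unavailable; HALT") from exc
```

In this sketch an audit event is recorded only when a recovery path is actually taken, so the healthy path adds nothing. Since the audit_log tool is itself served by MCP, a real helper would likely need to buffer these events until the server is reachable again; that detail is left open here.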
Acceptance criteria
- New L3 test: simulate MCP unavailability; verify a read-tool call returns successfully via fallback in ≤1 Bash + ≤500 tokens of context.
- Write-tool fallback explicitly returns "MCP unavailable; HALT" without attempting sqlite3 (preserves !2900 safety).
- feedback_mcp_recovery.md (user memory) updated to point at the new helper rather than describing tiers manually.
- tmb_recovery skill body shrinks by ≥20 LOC (replaces prose with helper invocation).
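
The first criterion could be checked along these lines. This is a rough sketch under stated assumptions: the read wrapper is taken to accept (tool, args, mcp_call, fallback), which is a hypothetical signature; each fallback invocation is counted as one Bash call; and tokens are estimated at roughly four characters each, which is a heuristic rather than the real tokenizer. The stand_in_read function exists only so the example is self-contained.

```python
import json
from typing import Any, Callable, Dict


def approx_tokens(text: str) -> int:
    """Rough estimate (assumption): about 4 characters per token, rounded up."""
    return (len(text) + 3) // 4


def check_read_fallback(read_fn: Callable[..., Dict[str, Any]],
                        max_bash: int = 1, max_tokens: int = 500) -> Dict[str, int]:
    """Simulate an MCP outage and check the read path against the budget."""
    bash_calls = 0

    def mcp_down(tool: str, args: Dict[str, Any]) -> Any:
        raise ConnectionError("simulated MCP outage")

    def fake_fallback(tool: str, args: Dict[str, Any]) -> Any:
        # Each fallback invocation stands for one Bash call to the helper script.
        nonlocal bash_calls
        bash_calls += 1
        return {"rows": [{"id": args["id"], "status": "open"}]}

    result = read_fn("example_read_tool", {"id": 1}, mcp_down, fake_fallback)
    tokens = approx_tokens(json.dumps(result))
    assert bash_calls <= max_bash, f"{bash_calls} Bash calls, budget is {max_bash}"
    assert tokens <= max_tokens, f"~{tokens} tokens, budget is {max_tokens}"
    return {"bash_calls": bash_calls, "approx_tokens": tokens}


def stand_in_read(tool: str, args: Dict[str, Any],
                  mcp_call: Callable[..., Any],
                  fallback: Callable[..., Any]) -> Dict[str, Any]:
    """Placeholder wrapper for the demo: MCP once, then the fallback once."""
    try:
        return {"source": "mcp", "data": mcp_call(tool, args)}
    except Exception:
        return {"source": "sqlite_readonly", "data": fallback(tool, args)}
```

Calling check_read_fallback(stand_in_read) returns the observed counts so the test can log them alongside the pass/fail result.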
Out of scope
- Replacing the MCP transport itself.
- Auto-restart of MCP server.
Coordination
- Pairs with #2917 (block-recovery cost) — both reduce context cost of failure paths.
Note on source
Previous description claimed feedback_mcp_recovery.md "describes 3 recovery tiers" without verifying. Verified tmb_recovery skill exists with the helper already partially in place. Adjusted plan from "build the wrapper" to "promote the existing helper to a transparent shape."
Edited by Zax Shen