v0.7.0 — token-burn: Cap server-side field sizes for content_json, discussion body, file_registry summary

Problem

Several DB columns are TEXT with no size cap in the schema, and only the last has any cap in write-path validation:

  • audit.content_json
  • discussions.body
  • file_registry.summary
  • tasks.spec_body (capped in code by the SPEC_BODY_MAX_BYTES constant, which has a default but is configurable via env; no cap in the schema)

Without enforced caps, a single rogue write (a pasted giant log, an accidentally large JSON payload, a runaway summary) can bloat the DB, and that cost is then paid again on every subsequent read that joins or returns the row.

This is a preventative issue, not a remedial one. Live DB on 2026-05-17 shows current values are well-bounded (see Evidence). But the lack of structural caps means a future bad actor (or rogue subagent) can poison the DB with arbitrarily large blobs.

Evidence (verified against live DB + source)

Current state — well-bounded:

  • audit.content_json: 67 rows, avg = 142 chars, max = 726 chars (nowhere near the 1 MB limit cited previously, which is advisory only; see Note on source)
  • discussions.body: 25 rows, avg = 975 chars, max = 2,551 chars
  • file_registry.summary: 925 rows with summaries, avg = 155 chars

Schema state — no LENGTH CHECK present:

  • Verified mcp/trajectory-server/src/schema.sql: grep -nE "LENGTH\(" schema.sql returns nothing
  • TARGET_SCHEMA_VERSION = 2 in db.ts:9; v1→v2 migration exists; v2→v3 needed for caps
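
The grep above only establishes that LENGTH( appears nowhere in the file. A per-column version of the same check, which is roughly what the lint in the Plan would need, could be sketched as follows. This is an illustration only: findUncappedTextColumns and its line-based parsing are hypothetical, and it assumes one column definition per line and no semicolons inside string literals.

```typescript
// Hypothetical helper: report TEXT columns that have no LENGTH(...) <= N check.
// Assumes one column definition per line inside each CREATE TABLE statement;
// a real lint would want a proper SQL parser.

interface UncappedColumn {
  table: string;
  column: string;
}

function findUncappedTextColumns(
  schemaSql: string,
  allowlist: string[] = [] // entries look like "table.column"
): UncappedColumn[] {
  const findings: UncappedColumn[] = [];

  for (const statement of schemaSql.split(";")) {
    const header = /CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?(\w+)\s*\(/i.exec(statement);
    if (header === null) continue;
    const table = header[1];
    const body = statement.slice(header.index + header[0].length);

    for (const line of body.split("\n")) {
      const definition = /^\s*(\w+)\s+TEXT\b/i.exec(line);
      if (definition === null) continue;
      const column = definition[1];

      // Accept either a column-level or a table-level CHECK in the same statement.
      const capPattern = new RegExp("LENGTH\\(\\s*" + column + "\\s*\\)\\s*<=", "i");
      const capped = capPattern.test(statement);
      const allowed = allowlist.indexOf(table + "." + column) !== -1;

      if (!capped && !allowed) findings.push({ table, column });
    }
  }
  return findings;
}
```

Run against a schema with no LENGTH checks at all, this would list every TEXT column; passing an allowlist such as ["tasks.spec_body"] would exempt columns that are capped in code instead.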

Already-capped (in code, not schema):

  • tasks.spec_body: SPEC_BODY_MAX_BYTES constant in tools/tasks.ts, used by composites.ts for task_retry_batch
  • roundtable_create.retry_rationale — "≤200 chars" per schema description (advisory, not enforced server-side)

Plan

Targets are deliberately conservative — set caps generous enough that normal writes don't trip them, strict enough to prevent runaway bloat.

  1. Schema v3: add CHECK constraints: LENGTH(body) <= 4096 on discussions.body, LENGTH(summary) <= 1024 on file_registry.summary, and LENGTH(content_json) <= 8192 on audit.content_json.
  2. Server-level validation: reject writes exceeding a cap with {"error": "field_too_large", "field": "body", "current_size": 4521, "cap": 4096}. Wire through requireRoles middleware so the error path is uniform.
  3. Migration v3 backfill: for existing rows over cap (currently zero, but possible in deployed instances), truncate to cap with a trailing […truncated N chars] marker and log to audit as field_truncated_in_migration with the original length.
  4. Lint: add tests/lint/no-uncapped-text-columns.sh to prevent regressions; any new TEXT column without CHECK(LENGTH(...) <= ...) or an explicit allowlist entry must fail lint.
  5. Document caps in mcp/trajectory-server/docs/SCHEMA.md (or equivalent) so contributors know the field-size contract.
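
Steps 1 and 2 need to agree on what "size" means, otherwise a write could pass the server check and then fail the CHECK constraint, or the reverse. A minimal sketch of the server-side check follows, assuming the database counts characters the way SQLite's LENGTH() does for TEXT values (the engine is not named here, so that is an assumption). FIELD_CAPS, checkFieldCap, and codePointLength are hypothetical names, not existing code in the repo; the cap values and the error shape are the ones proposed above.

```typescript
// Hypothetical cap table; values are the ones proposed in the plan.
const FIELD_CAPS = {
  "discussions.body": { field: "body", cap: 4096 },
  "file_registry.summary": { field: "summary", cap: 1024 },
  "audit.content_json": { field: "content_json", cap: 8192 },
} as const;

type CappedColumn = keyof typeof FIELD_CAPS;

interface FieldTooLargeError {
  error: "field_too_large";
  field: string;
  current_size: number;
  cap: number;
}

// Count Unicode code points rather than UTF-16 code units, on the assumption
// that the schema CHECK counts characters. String.length would count an emoji
// as 2 and reject some writes the database would accept.
function codePointLength(s: string): number {
  let count = 0;
  for (let i = 0; i < s.length; i++) {
    const hi = s.charCodeAt(i);
    const lo = s.charCodeAt(i + 1); // NaN past the end, which fails the range test
    if (hi >= 0xd800 && hi <= 0xdbff && lo >= 0xdc00 && lo <= 0xdfff) i++;
    count++;
  }
  return count;
}

// Returns null when the value fits, or the structured error to send back.
function checkFieldCap(column: CappedColumn, value: string): FieldTooLargeError | null {
  const { field, cap } = FIELD_CAPS[column];
  const size = codePointLength(value);
  if (size <= cap) return null;
  return { error: "field_too_large", field, current_size: size, cap };
}
```

With these caps, a 4,521-character discussions.body produces the example error from step 2. If the caps are meant to be byte limits instead, both the CHECK expression and this counter would need to change together.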

Acceptance criteria

  • New L2 tests verify each capped column rejects oversize writes with a structured error.
  • Migration v3 backfill test verifies that existing rows over the cap are truncated with the marker and logged to audit.
  • Live DB migration on this repo completes without truncating any current rows (all current rows are within proposed caps — sanity check).
  • New lint test verifies the rule catches uncapped new columns.
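
For the backfill criterion, "truncated correctly" is worth pinning down: if the marker is appended after cutting to the cap, the row ends up over the cap again and the new CHECK would reject it. The sketch below assumes the intended behaviour is that the kept text plus the marker fits within the cap, and that N reports how many characters were dropped. truncateToCap and its helpers are hypothetical names, and lengths are counted in code points on the assumption that the CHECK counts characters.

```typescript
interface TruncationResult {
  value: string;          // text to store, within the cap
  originalLength: number; // for the field_truncated_in_migration audit entry
  truncated: boolean;
}

function isSurrogatePairAt(s: string, i: number): boolean {
  const hi = s.charCodeAt(i);
  const lo = s.charCodeAt(i + 1); // NaN past the end, which fails the range test
  return hi >= 0xd800 && hi <= 0xdbff && lo >= 0xdc00 && lo <= 0xdfff;
}

function codePointLength(s: string): number {
  let count = 0;
  for (let i = 0; i < s.length; i += isSurrogatePairAt(s, i) ? 2 : 1) count++;
  return count;
}

// First n code points of s, never splitting a surrogate pair.
function takeCodePoints(s: string, n: number): string {
  let end = 0;
  for (let taken = 0; taken < n && end < s.length; taken++) {
    end += isSurrogatePairAt(s, end) ? 2 : 1;
  }
  return s.slice(0, end);
}

function truncateToCap(value: string, cap: number): TruncationResult {
  const originalLength = codePointLength(value);
  if (originalLength <= cap) {
    return { value, originalLength, truncated: false };
  }
  let keep = cap;
  while (true) {
    // The marker is all single-code-unit characters, so .length is its size.
    const marker = `[…truncated ${originalLength - keep} chars]`;
    const room = cap - marker.length;
    if (room < 0) throw new Error("cap is too small to hold the truncation marker");
    if (keep <= room) {
      return { value: takeCodePoints(value, keep) + marker, originalLength, truncated: true };
    }
    // Shrink and retry: dropping more characters can add a digit to the marker.
    keep = room;
  }
}
```

For a 5,000-character body and a cap of 4,096, this keeps 4,074 characters and appends […truncated 926 chars], for a stored length of exactly 4,096. A backfill test could assert both that stored length and the originalLength passed to the audit entry.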

Out of scope

  • Compaction of normal-size historical rows.
  • Different caps per event_type for audit (could be a v0.7+ refinement).

Coordination

  • Pairs with #2919 (P1 hallucination) — both want v2→v3 schema migration; should ship as one migration.
  • Pairs with #2918 (cache discipline) — capped fields are cache-friendlier (predictable size).

Note on source

Previous description claimed "1 MB allowed per content_json" without verifying. Source check (tools/audit.ts registration) shows no enforced cap in schema or middleware — only advisory in the tool description. Verified live max sizes are tiny. Reframed as preventative architecture, not remediation.

Edited by Zax Shen