explicit-visual-acceptance-gate (#40) · Issues · Jani Päijänen / agent-governance

explicit-visual-acceptance-gate

## Learning Record

**Project codename:** blue-marlin
**Date:** 2026-04-26
**Category:** gap
**Severity:** significant
**Framework version:** 0.34.0

### What happened

After a previous iteration added two output-level pytest invariants following a related learning (see C1 / "output-level invariants"), the project team thought the geometric output was now adequately covered. The user (Governor-Reviewer) opened the drawings AGAIN and reported two more pre-existing structural defects:

1. The X-cross brace appeared flat (1.78 deg angle, essentially horizontal in side view).
2. The longitudinal joists were hanging in the air with no support member underneath them: an entire structural component class (transverse cap beams) was missing from the model.

Both defects pre-existed in the original source code adopted by the project; neither was a regression. The follow-up iteration fixed both by moving the cross brace attachment points to waling centerlines (yielding an 8 deg visible angle) and adding a new component class: 8 transverse cap beams, one per pile pair, on which the longitudinal joists rest.

The framework-relevant observation: output-level pytest invariants catch only what someone thought to assert. The previously added tests covered "cross brace fits below deck" but did not assert "cross brace has a visible angle" or "joist rests on something". The advisory pattern (timber + structural advisor) caught planning-time concerns but did NOT catch either of these. Visual user review remained the irreplaceable last mile.

This is corroborating evidence for a recurring pattern across multiple iterations of the same project: automated checks pass while the human reviewer sees obvious defects in the produced artefacts.
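The two assertions that were missing ("cross brace has a visible angle", "joist rests on something") could be expressed as output-level invariants roughly as follows. This is a minimal sketch, not the project's test code: the `Box` type, the helper names, the 5 deg threshold, the 5 mm tolerance and all coordinates are assumptions chosen for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

Point = tuple[float, float]  # (x, z) in side view, metres

# Assumed threshold for "visibly inclined"; not a value taken from the project.
MIN_VISIBLE_ANGLE_DEG = 5.0


def side_view_angle_deg(start: Point, end: Point) -> float:
    """Acute angle between a member and the horizontal, in degrees."""
    dx = abs(end[0] - start[0])
    dz = abs(end[1] - start[1])
    return math.degrees(math.atan2(dz, dx))


def has_visible_angle(start: Point, end: Point,
                      min_deg: float = MIN_VISIBLE_ANGLE_DEG) -> bool:
    """True if the member is inclined by at least min_deg in side view."""
    return side_view_angle_deg(start, end) >= min_deg


@dataclass(frozen=True)
class Box:
    """Hypothetical side-view bounding box of a structural member."""
    name: str
    x_min: float
    x_max: float
    z_bottom: float
    z_top: float


def is_supported(member: Box, supports: list[Box], tol: float = 0.005) -> bool:
    """True if some support's top face touches the member's underside
    (within tol) and overlaps it along x."""
    return any(
        abs(s.z_top - member.z_bottom) <= tol
        and s.x_min < member.x_max
        and s.x_max > member.x_min
        for s in supports
    )


def test_cross_brace_has_visible_angle():
    # Illustrative coordinates only, not the project's geometry.
    assert has_visible_angle((0.0, 0.0), (3.0, 0.5))


def test_joist_rests_on_something():
    joist = Box("joist", 0.0, 21.0, 2.0, 2.2)
    cap_beam = Box("cap_beam", 3.0, 3.2, 1.8, 2.0)
    assert is_supported(joist, [cap_beam])
```

Tests like these still only encode defects someone has already noticed, which is the point of this record: they narrow the gap but do not replace looking at the drawing.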
### Agents involved

coder-tester, structural-advisor (advisory, informal), governor-reviewer

### Framework section

- `framework.md` Pattern: Two Cadences of Cross-Cutting Review
- `rules-template/workflow/tdd-rule.md`

### Business impact

`prevented_loss`: caught a missing structural component (transverse cap beams) that would have made the pier physically non-functional on first construction. The unsupported 21 m joists would have failed at first use under any pedestrian load.

### Recommendation

Add an explicit checklist item to Gate 2 for projects that produce visual or structural output (drawings, renderings, UI screenshots, charts, generated documents):

> **"Did a human open the output and visually inspect it this iteration?"**

Pytest assertions and code review are necessary but **not sufficient** substitutes. In this project, three iterations of automated checks did not catch what the user saw in 30 seconds of looking at the rendering.

Suggested implementation paths in framework:

1. Add a per-change gate item under "Cross-cutting checklists" in `framework.md` Section 4 (Two Cadences of Cross-Cutting Review): "If this change touches code that produces visual or structural output: a human must open the output and confirm it looks correct."
2. Or add as a new line item in `rules-template/workflow/tdd-rule.md`.
3. Mention in `ADOPTING.md` Step 4 (first task retrospective): "If your task produces visual output, did you actually look at it, not just check that the file was created?"

The blue-marlin project addressed this in a later iteration by adding a `vie_png()` function that renders DXF drawings to PNG via ezdxf's matplotlib backend, enabling an LLM (multimodal) to view and verify drawings programmatically, closing the visual-inspection gap with no human-loop overhead. This may be a useful pattern to mention in the framework: programmatic visual verification via image-capable LLMs as a complement to (not replacement for) human inspection.
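For readers who want to try the DXF-to-PNG step, a rendering helper along these lines is one possible shape. It is a sketch under stated assumptions, not the project's actual function: `render_dxf_to_png`, `png_path_for`, the output naming rule and the default DPI are hypothetical, and it assumes `ezdxf` and `matplotlib` are installed. The calls used (`RenderContext`, `Frontend.draw_layout`, `MatplotlibBackend`) follow the pattern shown in ezdxf's drawing add-on documentation.

```python
from pathlib import Path


def png_path_for(dxf_path: str) -> Path:
    """Hypothetical naming rule: write the PNG next to the DXF, same stem."""
    return Path(dxf_path).with_suffix(".png")


def render_dxf_to_png(dxf_path: str, dpi: int = 200) -> Path:
    """Render the modelspace of a DXF file to a PNG and return the PNG path."""
    # Imported lazily so this module still loads where the optional
    # drawing dependencies (ezdxf, matplotlib) are not installed.
    import matplotlib
    matplotlib.use("Agg")  # headless backend, no display required
    import matplotlib.pyplot as plt
    import ezdxf
    from ezdxf.addons.drawing import Frontend, RenderContext
    from ezdxf.addons.drawing.matplotlib import MatplotlibBackend

    doc = ezdxf.readfile(dxf_path)
    fig = plt.figure()
    ax = fig.add_axes([0, 0, 1, 1])  # axes fill the whole figure
    Frontend(RenderContext(doc), MatplotlibBackend(ax)).draw_layout(
        doc.modelspace(), finalize=True
    )
    out = png_path_for(dxf_path)
    fig.savefig(out, dpi=dpi)
    plt.close(fig)  # release the figure's memory
    return out
```

The resulting PNG can then be handed to a human reviewer or an image-capable model. What gets asked about the image, and how the answer is recorded against Gate 2, is left open here.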
---

**Anonymization checklist:**

- [x] No real project, product, or organisation names
- [x] No internal URLs, hostnames, or infrastructure identifiers
- [x] No customer or partner names
