explicit-visual-acceptance-gate
<!-- SPDX-License-Identifier: CC-BY-SA-4.0 -->
## Learning Record
**Project codename:** blue-marlin
**Date:** 2026-04-26
**Category:** gap
**Severity:** significant
**Framework version:** 0.34.0
### What happened
After a previous iteration added two output-level pytest invariants
following a related learning (see C1 / "output-level invariants"),
the project team considered the geometric output adequately
covered. The user (Governor-Reviewer) opened the drawings again and
reported two further structural defects, both pre-existing:
1. The X-cross brace appeared flat (1.78 deg angle, essentially
horizontal in side view).
2. The longitudinal joists were hanging in the air with no support
member underneath them — an entire structural component class
(transverse cap beams) was missing from the model.
Both defects pre-existed in the original source code adopted by the
project; neither was a regression. The follow-up iteration fixed both
by moving the cross brace attachment points to waling centerlines
(yielding an 8 deg visible angle) and adding a new component class —
8 transverse cap beams, one per pile pair, on which the longitudinal
joists rest.
The framework-relevant observation: output-level pytest invariants
catch only what someone thought to assert. The previously added tests
covered "cross brace fits below deck" but did not assert "cross brace
has a visible angle" or "joist rests on something". The advisory
pattern (timber + structural advisor) caught planning-time concerns
but did NOT catch either of these. Visual user review remained the
irreplaceable last mile.
This is corroborating evidence for a recurring pattern across
multiple iterations of the same project: automated checks pass while
the human reviewer sees obvious defects in the produced artefacts.
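
To make the gap concrete, the two assertions that nobody had thought to write might look like the sketch below. It is illustrative only: the `Member` type, the helper names, the coordinates, the 5 degree threshold and the deck elevation are all assumptions, not taken from the project's code. The coordinates were chosen so the two braces come out at roughly the angles reported above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """Hypothetical straight member in side view; endpoints are (x, z) in metres."""
    name: str
    x1: float
    z1: float
    x2: float
    z2: float


def side_view_angle_deg(m: Member) -> float:
    """Acute angle between the member and the horizontal, in degrees."""
    return math.degrees(math.atan2(abs(m.z2 - m.z1), abs(m.x2 - m.x1)))


def is_supported(underside_z: float, support_top_zs: list[float], tol: float = 1e-3) -> bool:
    """True if at least one support's top face meets the member's underside."""
    return any(abs(underside_z - z) <= tol for z in support_top_zs)


# Assumed threshold: below this a brace reads as horizontal in side view.
MIN_VISIBLE_ANGLE_DEG = 5.0


def test_cross_brace_has_visible_angle():
    # Assumed geometry: 0.42 m rise over a 3.0 m run, roughly 8 degrees.
    brace = Member("x-brace", 0.0, 0.0, 3.0, 0.42)
    assert side_view_angle_deg(brace) >= MIN_VISIBLE_ANGLE_DEG


def test_joist_rests_on_something():
    # Assumed elevations: eight cap beams whose tops meet the joist underside.
    cap_beam_tops = [2.40] * 8
    assert is_supported(2.40, cap_beam_tops)
```

Both tests are easy to write once the defect is known. The point of the learning is that neither was obvious beforehand, which is why they complement visual review rather than replace it.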
### Agents involved
coder-tester, structural-advisor (advisory, informal),
governor-reviewer
### Framework section
- `framework.md`, Pattern: Two Cadences of Cross-Cutting Review
- `rules-template/workflow/tdd-rule.md`
### Business impact
`prevented_loss` — caught a missing structural component
(transverse cap beams) that would have made the pier physically
non-functional on first construction. The unsupported 21 m joists
would have failed at first use under any pedestrian load.
### Recommendation
Add an explicit checklist item to Gate 2 for projects that produce
visual or structural output (drawings, renderings, UI screenshots,
charts, generated documents):
> **"Did a human open the output and visually inspect it this
> iteration?"**
Pytest assertions and code review are necessary but **not sufficient**;
they do not substitute for looking at the output. In this project,
three iterations of automated checks did not catch what the user saw
in 30 seconds of looking at the rendering.
Suggested implementation paths in the framework:
1. Add a per-change gate item under "Cross-cutting checklists" in
`framework.md` Section 4 (Two Cadences of Cross-Cutting Review):
"If this change touches code that produces visual or structural
output: a human must open the output and confirm it looks correct."
2. Or add it as a new line item in `rules-template/workflow/tdd-rule.md`.
3. Mention in `ADOPTING.md` Step 4 (first task retrospective):
"If your task produces visual output, did you actually look at it,
not just check that the file was created?"
The blue-marlin project addressed this in a later iteration by adding
a `vie_png()` function that renders DXF drawings to PNG via ezdxf's
matplotlib backend, so that a multimodal LLM can view and check the
drawings programmatically. This narrows the visual-inspection gap
without adding human-loop overhead to every iteration. The pattern may
be worth mentioning in the framework: programmatic visual verification
via image-capable LLMs as a complement to (not a replacement for)
human inspection.
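
For readers unfamiliar with the rendering step, a minimal sketch of DXF-to-PNG conversion with ezdxf's drawing add-on follows. It is not the project's actual function: the names `render_dxf_to_png` and `default_png_path`, the default DPI and the choice of the headless `Agg` backend are assumptions made for illustration. It requires `ezdxf` and `matplotlib` to be installed.

```python
from pathlib import Path


def default_png_path(dxf_path) -> Path:
    """Hypothetical helper: place the PNG next to the DXF with the same stem."""
    return Path(dxf_path).with_suffix(".png")


def render_dxf_to_png(dxf_path, png_path=None, dpi: int = 200) -> Path:
    """Render the modelspace of a DXF file to a PNG and return the PNG path."""
    # Imported lazily so this module still imports where the optional
    # rendering dependencies are not installed.
    import matplotlib
    matplotlib.use("Agg")  # headless backend, no display needed (assumed CI use)
    import matplotlib.pyplot as plt
    import ezdxf
    from ezdxf.addons.drawing import Frontend, RenderContext
    from ezdxf.addons.drawing.matplotlib import MatplotlibBackend

    out = Path(png_path) if png_path else default_png_path(dxf_path)
    doc = ezdxf.readfile(str(dxf_path))

    fig = plt.figure()
    ax = fig.add_axes([0, 0, 1, 1])  # axes fill the whole figure
    try:
        frontend = Frontend(RenderContext(doc), MatplotlibBackend(ax))
        frontend.draw_layout(doc.modelspace(), finalize=True)
        fig.savefig(out, dpi=dpi)
    finally:
        plt.close(fig)  # release the figure even if rendering fails
    return out
```

The resulting PNG can then be attached to a review request or passed to an image-capable model. How the image is judged, and by whom, is outside the scope of this sketch.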
---
**Anonymization checklist:**
- [x] No real project, product, or organisation names
- [x] No internal URLs, hostnames, or infrastructure identifiers
- [x] No customer or partner names