perf(db): batch required_matching node-prop writes (final per-flow DB roundtrip)

The last un-batched per-flow DB roundtrip. MatchProductNameGFM.run()
issued one UPDATE per unmatched flow via
update_node_prop(required_matching, append=True): ~7997 calls @2000ingr
stress. Now buffered in NodeService keyed by root_node_uid and flushed
in one bulk UPDATE at scheduler quiescence via the update_node_prop_bulk
DB layer shipped in v0.5.432. In-memory PropListMutation stays
immediate.

This is the required_matching half reverted in v0.5.432. That revert was
judged on an N=10 A/B of the already-flaky
test_matching_and_cache_invalidation_complete_workflow. Re-verified
independently at N=20: baseline 13/20 pass, this change 14/20 pass.
Statistically identical; the change does NOT worsen the flake.

required_matching is safe to defer: consumed only by FUTURE calculations
(graph reload) and the cleanup CLI, never within the same request;
/apply and /update-automatching invalidate by explicit node_uid, not by
reading required_matching.

Validation:
- two_origins CO2=1.2568, subrecipe CO2=0.0930 invariants hold
- 120/120 broad gauntlet
- 41/44 legacy_recipe_router (3 pre-existing batch flakes only)
- N=20 flake A/B independently re-run: 13/20 baseline vs 14/20 change

NOTE ON CLUSTER MEASUREMENT: the dagster cluster benchmark is too noisy
for single-run per-tag comparison. Two runs of the identical v0.5.432
build 5h apart differed 2-4x (combined_40: 5.34s vs 9.60s; develop
baseline itself shifted 4.60s -> 3.64s). All perf claims in the
v0.5.426-v0.5.433 series rest on deterministic local cProfile call-count
reductions + stable correctness invariants, NOT on single cluster
wall-time runs. A rigorous cluster A/B needs N>=10 interleaved runs per
build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>