perf(rete): foundation + Site #4 UWC conversion

Foundation fixes that unblock cancel+refire on DuplicateNodeMutation
runtime nodes, plus conversion of UWC Site #4 (FlowNode sub_node wait)
from imperative reschedule to cancel+final_pass.

Three orchestrator-level fixes:
1. DuplicateNodeMutation strips transient gfm_state entries from
   duplicates for an allow-list of GFMs (currently {UWC}).
2. Quiescence fire-order: execute_final_pass_refires runs BEFORE
   _execute_final_gfms.
3. execute_final_pass_refires gained cancel/finished rehab via
   clear_gfm_state_entry + reactivate_gfm_alphas.

UWC Sites #1+#2 stay imperative: the bidirectional FPA/FPF aggregation
cycle (with AddClientNodes-created LinkingActivityNode wrappers)
cannot be expressed as one-way refire triggers. Site #5 stays
imperative: its fall-through-to-ready timeout has no equivalent in
cancel+refire.

Cluster wins from v0.5.426 (Origin/IAE conversions) are preserved.

Validation:
- test_benchmark_two_origins CO2=1.2568.
- test_calculation_with_subrecipe CO2=0.0930.
- 87/87 pass in core orchestrator + benchmark + GFM + isolation tests.
- 41/44 pass in legacy_recipe_router (the 3 failures are pre-existing
  batch flakes).

See docs/uwc-fpa-boundary-conversion-design.md.