perf(orchestrator): shallow clone in AddNodeMutation (-6.3% wall)

AddNodeMutation.apply() previously used self.new_node.model_copy(deep=True)
to materialize a node into the calc graph. cProfile on stress_scaling@2000ingr
showed copy.deepcopy at 414k calls / 0.68s self-time = 4.2% of wall, plus
a long tail of internal deepcopy work pushing total deepcopy-related cost
to ~6% of wall.
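
For reference, a minimal sketch of how ncalls and self-time for
copy.deepcopy can be read out of a cProfile run. The workload() function
is a hypothetical stand-in, not the stress_scaling benchmark, and the
numbers it yields are unrelated to the figures above.

```python
import cProfile
import copy
import pstats


def workload():
    # Hypothetical stand-in for the benchmark: deep-copy a nested dict.
    payload = {"worker_states": {"a": [1, 2, 3]}}
    for _ in range(100):
        copy.deepcopy(payload)


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

stats = pstats.Stats(profiler)
# stats.stats maps (filename, lineno, funcname) to
# (primitive calls, total calls, tottime, cumtime, callers).
deepcopy_rows = [v for k, v in stats.stats.items() if k[2] == "deepcopy"]
total_calls = sum(row[1] for row in deepcopy_rows)  # ncalls, incl. recursion
self_time = sum(row[2] for row in deepcopy_rows)    # tottime (self-time)
print(f"deepcopy ncalls={total_calls} tottime={self_time:.4f}s")
```

Recursive deepcopy calls are counted in ncalls, which is why one
top-level copy of a nested structure shows up as many calls.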

Replaced with a _shallow_clone_node helper. Safe because:
1. Pydantic Props are immutable by contract: PropMutation replaces slots
   via super().__setattr__ and never mutates them in place, so inner
   Prop data (e.g. gfm_state.worker_states dict) can be shared.
2. The new node gets a fresh __dict__, so private slots like
   _calculation / _parent_nodes don't leak back to the caller.
3. Each Prop slot gets a fresh Prop instance, so
   set_owner_node_for_props doesn't rebind the source's Props.
4. The helper iterates __dict__ (not model_fields_set). This is required
   because inventory_importer/bw_importer trims model_fields_set on
   cached ElementaryResourceEmissionNodes; iterating model_fields_set
   would silently drop uid and break downstream add_edge.
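
The four points above can be illustrated with a minimal sketch. This is
not the committed _shallow_clone_node: Prop, Node, the owner attribute
and shallow_clone_node are hypothetical plain-Python stand-ins, with no
Pydantic involved, meant only to show the shape of the technique.

```python
import copy


class Prop:
    """Hypothetical stand-in for a Prop: immutable by contract."""

    def __init__(self, data):
        self.data = data   # inner payload, never mutated in place
        self.owner = None  # back-reference to the owning node


class Node:
    """Hypothetical stand-in for a calc-graph node."""

    def __init__(self, uid, **props):
        self.uid = uid
        self._calculation = None  # private slot, bound later by the graph
        for name, prop in props.items():
            setattr(self, name, prop)
            prop.owner = self


def shallow_clone_node(src):
    """Clone src with a fresh __dict__ and fresh Prop wrappers."""
    clone = object.__new__(type(src))  # skip __init__; fresh empty __dict__
    # Walk __dict__ so every attribute, including uid, is carried over.
    for name, value in src.__dict__.items():
        if isinstance(value, Prop):
            fresh = copy.copy(value)   # new Prop object, shared inner data
            fresh.owner = clone        # rebinds the copy, not the source Prop
            clone.__dict__[name] = fresh
        else:
            clone.__dict__[name] = value
    return clone


src = Node("node-1", gfm_state=Prop({"worker_states": {"w1": "done"}}))
dst = shallow_clone_node(src)
dst._calculation = "calc-B"  # assignment lands in the clone's __dict__ only
```

In this sketch the inner dict is shared between src and dst, which is
only acceptable under the immutability contract described in point 1.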

Measurements (stress_scaling@2000ingr, 3 runs):
- copy.deepcopy ncalls: 414,310 → 12,025  (-97%)
- copy.deepcopy tottime: 0.676s → 0.147s  (-78%)
- Total wall time: 14.218s → 13.32s mean  (-6.3%)
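
The percentages above follow from (after - before) / before. A short
check using only the figures listed (pct_change is an illustrative
helper name):

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


measurements = {
    "copy.deepcopy ncalls": (414_310, 12_025),
    "copy.deepcopy tottime (s)": (0.676, 0.147),
    "total wall time (s)": (14.218, 13.32),
}

changes = {
    name: round(pct_change(before, after), 1)
    for name, (before, after) in measurements.items()
}
for name, delta in changes.items():
    print(f"{name}: {delta:+.1f}%")
```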

Investigation findings from parallel attempts (not committed):
- Path D (asyncio yield frequency): the kqueue 17.4% figure was stale;
  current HEAD already runs kqueue at 1.5-2%. No further win available.
- Path F (cache is_node_in_affected_subtree): function short-circuits
  in healthy workloads (0.029% wall, not 1.4%). Not worth caching.

Validation:
- test_benchmark_two_origins CO2=1.2568 invariant holds
- test_calculation_with_subrecipe CO2=0.0930 invariant holds
- broad gauntlet: 96/96 pass
- legacy_recipe_router: 41/44 pass (the 3 failures are pre-existing
  batch flakes)