RETE working-memory hot-path tweaks (cluster v0.5.421 cProfile follow-up)

v0.5.421 cluster cProfile (combined_40_ings, 22.1s recipe wall) showed
two new top hot spots now that earlier optimization rounds have removed
the costs that were masking them:

- working_memory.get_parents: 196K calls / 1.03s tottime (#4 hot spot)
- get_cached_walk: 21K calls/recipe, each with a redundant generation check

Two fixes in this release:

- working_memory.get_parents: return the parent set by reference
  All callers iterate or use .update()/.extend(); none mutate the
  result. The list(parents) copy on every call was the dominant
  cost. Return the underlying set by reference instead (read-only
  contract documented). The empty case returns a shared frozenset.
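
Minimal sketch of the by-reference return, for illustration only. The
class layout, the _parents dict, add_parent and _NO_PARENTS are
assumptions, not the actual working_memory code:

```python
from typing import AbstractSet, Dict, Set

# Hypothetical shared sentinel: one immutable empty set for every miss.
_NO_PARENTS: AbstractSet[str] = frozenset()


class WorkingMemory:
    """Illustrative stand-in; the real class has more state."""

    def __init__(self) -> None:
        # Assumed storage shape: child id -> set of parent ids.
        self._parents: Dict[str, Set[str]] = {}

    def add_parent(self, child: str, parent: str) -> None:
        self._parents.setdefault(child, set()).add(parent)

    def get_parents(self, child: str) -> AbstractSet[str]:
        # Read-only contract: callers may iterate the result or pass it
        # to .update()/.extend(), but must not mutate it. No copy is made.
        return self._parents.get(child, _NO_PARENTS)


wm = WorkingMemory()
wm.add_parent("fact_b", "fact_a")
seen = set()
seen.update(wm.get_parents("fact_b"))   # typical caller pattern
```

Because the empty case is a frozenset, an accidental .add() on a miss
raises AttributeError instead of silently corrupting shared state.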

- get_cached_walk / cache_walk: drop the per-call generation check
  bump_cache_generation() always calls _walk_cache.clear(), so any
  entry that survives is valid by construction. The per-call
  generation comparison in get_cached_walk / cache_walk was dead
  defensive code. Removed; a contract test pins the invariant.
  Local: -2.4% to -3.3% wall on two_origins.
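
Sketch of the invariant and the kind of contract test that pins it.
Only bump_cache_generation, get_cached_walk, cache_walk and _walk_cache
are names from this change; the WalkCache class, the _generation field
and the key/value shapes are assumptions:

```python
from typing import Dict, Hashable, List, Optional


class WalkCache:
    """Illustrative stand-in for the object that owns _walk_cache."""

    def __init__(self) -> None:
        self._generation = 0
        self._walk_cache: Dict[Hashable, List[str]] = {}

    def bump_cache_generation(self) -> None:
        # Invariant: every bump clears the cache, so no entry from an
        # older generation can ever be observed by a later lookup.
        self._generation += 1
        self._walk_cache.clear()

    def cache_walk(self, key: Hashable, walk: List[str]) -> None:
        # No generation stamp stored alongside the entry.
        self._walk_cache[key] = walk

    def get_cached_walk(self, key: Hashable) -> Optional[List[str]]:
        # No per-call generation comparison; a hit is valid by construction.
        return self._walk_cache.get(key)


def test_bump_clears_walk_cache() -> None:
    cache = WalkCache()
    cache.cache_walk("node_1", ["node_1", "root"])
    assert cache.get_cached_walk("node_1") == ["node_1", "root"]
    cache.bump_cache_generation()
    assert cache.get_cached_walk("node_1") is None


test_bump_clears_walk_cache()
```

If a future change makes bump_cache_generation() stop clearing the
cache, a test of this shape fails and the check has to come back.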

Tests: 514/514 green (orchestrator + benchmark).
Skipped: flow-node-flag-bitmask experiment (perf/flow-node-flag-bitmask
branch, parked). The agent measured -0.0% wall, not worth the indirection.

Expected cluster gain: 1-2s on 22.1s combined_40_ings baseline. The
get_parents win in particular is well-targeted at v0.5.421's #4 hot
spot.

After this tag, the next round of optimization needs to tackle a
redesign of working-memory storage (5.85M dict.get calls / 1.26s).
That's a multi-day project, not a one-shot agent task.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>