Tags

Tags give the ability to mark specific points in history as being important.
  • v0.5.423-rete-plus-develop

    protected
    RETE working memory redesign — typed property indexes (the hard risk)
    
    Cluster cProfile (combined_40_ings, v0.5.421) flagged dict.get on
    _facts as the #1 hot spot: 5.85M calls / 1.26s tottime. The cost
    was per-call 3-tuple allocation + hashing in the canonical
    `(uid, fact_type, key) -> WorkingMemoryFact` store, plus Fact
    wrapper unwrapping at every read.
    
    This release adds typed sub-indexes that mirror the property_value
    and property_type slices of _facts:
    
      _property_values: dict[node_uid, dict[name, value]]
      _property_types : dict[node_uid, dict[name, type_name]]
    
    Hot-path reads skip tuple construction and Fact wrapping:
    
      has_property(uid, name)        — 2x dict.get + `in`
      get_property_value(uid, name)  — 2x dict.get
      get_property_type(uid, name)   — 2x dict.get
    
    Maintained alongside _facts in assert_fact, retract_fact,
    retract_all_for_node, and clear. Contract test (18 cases) pins:
    - read-side semantics (incl. falsy-value preservation: 0/''/False)
    - mirror consistency across assert / retract / retract_all / clear
    - has_property MUST distinguish property_value from property_type
    - get_fact() compat for non-property fact_types
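
A minimal sketch of the mirrored-index idea described above, assuming a much simpler WorkingMemory than the real one. The class body, the default argument, and the fact layout are illustrative assumptions; only the names _facts, _property_values, _property_types, assert_fact, retract_fact, has_property, get_property_value, and get_fact come from the message.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class WorkingMemoryFact:
    uid: str
    fact_type: str
    key: str
    value: Any


class WorkingMemory:
    """Hypothetical sketch: canonical tuple-keyed store plus typed mirrors."""

    def __init__(self) -> None:
        self._facts: dict[tuple[str, str, str], WorkingMemoryFact] = {}
        self._property_values: dict[str, dict[str, Any]] = {}
        self._property_types: dict[str, dict[str, str]] = {}

    def _mirror_for(self, fact_type: str):
        # Only the two property slices are mirrored; other fact types are not.
        if fact_type == "property_value":
            return self._property_values
        if fact_type == "property_type":
            return self._property_types
        return None

    def assert_fact(self, uid: str, fact_type: str, key: str, value: Any) -> None:
        self._facts[(uid, fact_type, key)] = WorkingMemoryFact(uid, fact_type, key, value)
        mirror = self._mirror_for(fact_type)
        if mirror is not None:
            mirror.setdefault(uid, {})[key] = value

    def retract_fact(self, uid: str, fact_type: str, key: str) -> None:
        self._facts.pop((uid, fact_type, key), None)
        mirror = self._mirror_for(fact_type)
        if mirror is not None:
            props = mirror.get(uid)
            if props is not None:
                props.pop(key, None)
                if not props:
                    del mirror[uid]

    def has_property(self, uid: str, name: str) -> bool:
        # No tuple allocation and no Fact unwrapping on the read path.
        props = self._property_values.get(uid)
        return props is not None and name in props

    def get_property_value(self, uid: str, name: str, default: Any = None) -> Any:
        props = self._property_values.get(uid)
        return default if props is None else props.get(name, default)

    def get_fact(self, uid: str, fact_type: str, key: str):
        # Compatibility path for non-property fact types.
        return self._facts.get((uid, fact_type, key))
```

In this sketch a falsy value such as 0 still counts as present, and a property_type fact alone does not make has_property true, which mirrors two of the pinned contract points.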
    
    This is the "hard risk" optimization step — touches the core
    storage contract of WM. Validated by 532/532 green tests including
    CO2 invariant on test_benchmark_two_origins.
    
    Local impact invisible (two_origins volume too low). Expected
    cluster impact 0.4-0.8s on the 22.4s combined_40_ings baseline.
    Cluster cProfile after deploy will tell us whether this also
    reduces the cumtime for upstream callers (alpha_network.evaluate
    3.46s, conditions.check 1.82s).
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  • v0.5.422-rete-plus-develop

    protected
    RETE working-memory hot-path tweaks (cluster v0.5.421 cProfile follow-up)
    
    v0.5.421 cluster cProfile (combined_40_ings, 22.1s recipe wall) showed
    two new top hot spots after the prior optimization rounds removed
    the obscuring costs:
    
    - working_memory.get_parents: 196K calls / 1.03s tottime (#4 hot spot)
    - get_cached_walk: 21K calls/recipe with redundant per-call gen check
    
    Two fixes in this release:
    
    1. get_parents: return the parent set by reference
      All callers iterate or use .update()/.extend(); none mutate the
      returned list. The list(parents) copy on every call was the
      dominant cost. Return the underlying set by reference (read-only
      contract documented). The empty case shares a single frozenset.
    
    2. get_cached_walk / cache_walk: drop the per-call generation check
      bump_cache_generation() always calls _walk_cache.clear(), so
      surviving entries are valid by construction. The per-call
      generation comparison in get_cached_walk / cache_walk was dead
      defensive code. Removed; contract test pins the invariant.
      Local: -2.4% to -3.3% wall on two_origins.
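
Both fixes can be sketched together as follows. This is an illustrative reduction, not the real working_memory module: the class name, add_parent, and the choice to bump the cache on every mutation are assumptions. The method names get_parents, bump_cache_generation, cache_walk, and get_cached_walk follow the message.

```python
from __future__ import annotations

from collections.abc import Set

# Shared immutable result for nodes without parents (no per-call allocation).
_NO_PARENTS: frozenset[str] = frozenset()


class WorkingMemorySketch:
    def __init__(self) -> None:
        self._parents: dict[str, set[str]] = {}
        self._walk_cache: dict[tuple[str, int], frozenset[str]] = {}

    def add_parent(self, child: str, parent: str) -> None:
        self._parents.setdefault(child, set()).add(parent)
        # Assumption: any graph mutation invalidates cached walks.
        self.bump_cache_generation()

    def get_parents(self, uid: str) -> Set[str]:
        # Returned by reference; callers must treat the result as read-only.
        return self._parents.get(uid, _NO_PARENTS)

    def bump_cache_generation(self) -> None:
        # Clearing on every bump means surviving entries are always current,
        # so readers need no per-entry generation comparison.
        self._walk_cache.clear()

    def cache_walk(self, key: tuple[str, int], result: frozenset[str]) -> None:
        self._walk_cache[key] = result

    def get_cached_walk(self, key: tuple[str, int]):
        return self._walk_cache.get(key)
```

The read-only contract is by convention only here; a caller that mutates the returned set would corrupt the index, which is why the message notes that the contract is documented.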
    
    Tests: 514/514 green (orchestrator + benchmark).
    Skipped: flow-node-flag-bitmask experiment (perf/flow-node-flag-bitmask
    branch, parked) — agent measured -0.0% wall, not worth the indirection.
    
    Expected cluster gain: 1-2s on 22.1s combined_40_ings baseline. The
    get_parents win in particular is well-targeted at v0.5.421's #4 hot
    spot.
    
    After this tag, the next round of optimization needs to attack
    working memory storage redesign (5.85M dict.get calls / 1.26s).
    That's a multi-day project, not a one-shot agent task.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  • v0.5.421-rete-plus-develop

    protected
    RETE second-wave optimizations (cluster-cProfile-driven)
    
    Builds on v0.5.420 baseline (combined_40_ings recipe wall 22.4s, dev cluster).
    This release ships 4 additional optimizations targeting v0.5.420's top
    remaining hot spots:
    
    H1: Prop dataclass plan caching (83eaa9767)
      unvalidated_construct + __post_init__ iterate __dataclass_fields__.items()
      on every Prop construction. Hoist to per-class plan cached at first touch.
      Local: -8.3% median two_origins, -62% items() count.
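
One way to picture the per-class plan cache, as a sketch under assumptions: the Prop fields, the field_plan helper, and the module-level cache dict are hypothetical, and the real unvalidated_construct likely handles defaults and validation hooks that are omitted here.

```python
from __future__ import annotations

from dataclasses import dataclass, fields

# Hypothetical cache: one tuple of field names per dataclass type.
_PLAN_CACHE: dict[type, tuple[str, ...]] = {}


def field_plan(cls: type) -> tuple[str, ...]:
    """Compute the field-name plan once per class, then reuse it."""
    plan = _PLAN_CACHE.get(cls)
    if plan is None:
        plan = tuple(f.name for f in fields(cls))
        _PLAN_CACHE[cls] = plan
    return plan


@dataclass
class Prop:
    name: str
    value: float

    @classmethod
    def unvalidated_construct(cls, **data):
        # Bypass __init__ and walk the cached plan instead of
        # iterating __dataclass_fields__.items() on every construction.
        obj = object.__new__(cls)
        for field_name in field_plan(cls):
            setattr(obj, field_name, data[field_name])
        return obj
```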
    
    H2: pydantic-class isinstance bypass on RETE hot path (7153e5887)
      39.7% of isinstance calls on RETE hot path were pydantic Node/Condition
      checks (slow __instancecheck__ via _abc_instancecheck). Replace with
      type-name MRO frozenset cached per class. Class-level _or_filter_attr
      dispatch in alpha_network.py:1445 replaces 5-way isinstance branch.
      Local: -39.7% isinstance count on two_origins.
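
The type-name check can be sketched like this, using plain classes as stand-ins for the pydantic models. The helper names mro_names and is_kind and the example classes are assumptions made for illustration.

```python
from __future__ import annotations

# Hypothetical cache: class -> frozenset of class names along its MRO.
_MRO_NAMES: dict[type, frozenset[str]] = {}


def mro_names(cls: type) -> frozenset[str]:
    names = _MRO_NAMES.get(cls)
    if names is None:
        names = frozenset(base.__name__ for base in cls.__mro__)
        _MRO_NAMES[cls] = names
    return names


def is_kind(obj: object, type_name: str) -> bool:
    """Name-based stand-in for isinstance that never calls __instancecheck__."""
    return type_name in mro_names(type(obj))


class Node:
    pass


class FlowNode(Node):
    pass


class Condition:
    pass
```

The trade-off in this sketch is that matching is by class name only: two unrelated classes with the same name would collide, and virtual subclasses registered through ABCs are not seen.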
    
    H3: per-GFM alpha index for reset_gfm_activation (a01bfdee5)
      Each of the 6284 calls/recipe on combined_40 iterated all alphas with a
      prefix string compare. Build _alphas_by_gfm: dict[str, list[AlphaNode]]
      once at register_gfm time. Reset reads the precomputed list instead of
      scanning.
      Local: -67% function-level cumtime; cluster impact on the order of
      6284 × 50µs (about 0.3s per recipe).
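
A small sketch of the index, assuming a simplified AlphaNode with a single active flag. The AlphaNode fields, the register_gfm signature, and the return value of reset_gfm_activation are illustrative assumptions; _alphas_by_gfm is the name given in the message.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AlphaNode:
    name: str
    active: bool = True


class AlphaNetwork:
    def __init__(self) -> None:
        # Built at registration time, read on every reset.
        self._alphas_by_gfm: dict[str, list[AlphaNode]] = {}

    def register_gfm(self, gfm_name: str, alphas: list[AlphaNode]) -> None:
        self._alphas_by_gfm.setdefault(gfm_name, []).extend(alphas)

    def reset_gfm_activation(self, gfm_name: str) -> int:
        """Deactivate only this GFM's alphas; returns how many were touched."""
        alphas = self._alphas_by_gfm.get(gfm_name, ())
        for alpha in alphas:
            alpha.active = False
        return len(alphas)
```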
    
    H4: pydantic __getattr__ bypass in NodeAttribute(Condition|Alpha) (d32ceccc4)
      290K calls/recipe on combined_40 to pydantic main.__getattr__ at 1.58µs
      each (1.07s cumtime). NodeAttribute(Condition|Alpha) check `is_*` markers
      set via object.__setattr__; replace getattr with node.__dict__.get(name).
      Local: -42.3% __getattr__ count, -11.5% median wall on two_origins.
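
The effect of the bypass can be shown with a plain-Python stand-in for a class that defines __getattr__. SlowNode, mark, and the two has_marker helpers are hypothetical names; pydantic itself is not used in this sketch.

```python
class SlowNode:
    """Stand-in for a model whose __getattr__ fallback is expensive."""

    getattr_calls = 0

    def __getattr__(self, name):
        # Python calls this only after normal attribute lookup has failed.
        type(self).getattr_calls += 1
        raise AttributeError(name)


def mark(node, name):
    # Markers are written straight into the instance __dict__.
    object.__setattr__(node, name, True)


def has_marker_slow(node, name):
    # A missing marker falls through to __getattr__ before the default applies.
    return getattr(node, name, False)


def has_marker_fast(node, name):
    # Reads the instance dict directly and never reaches __getattr__.
    return node.__dict__.get(name, False)
```

The direct dict read only sees instance attributes, so it is a fit for markers set via object.__setattr__ but would miss class-level attributes or properties.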
    
    Tests: 510/510 green (orchestrator + benchmark suites + 7 new contract tests
    for reset_gfm_activation). CO2 stability preserved (test_benchmark_two_origins
    still asserts 1.2568).
    
    Expected cluster gain (combined_40_ings): 3-5s off the 22.4s v0.5.420
    baseline. cProfile decoded from the next dagster run will confirm.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  • v0.5.420-rete-plus-develop

    protected
    RETE hot-path optimizations driven by cluster cProfile evidence
    
    Cluster cProfile on combined_40_ings (recipe wall 23.6s on dev,
    v0.5.419) showed orchestration at 82% of wall time, with these
    top hot spots:
    
      ncalls  tottime  function
      897K    1.23s    alpha_network.activate
      2.6M    0.74s    isinstance
      1.76M   0.79s    hasattr
      974K    0.60s    getattr
      403K    0.56s    alpha_network._get_related_at_depth
      1.39M   0.44s    _abc_instancecheck   (driven by isinstance)
    
    This release ships three changes targeting these:
    
    1. _kind/_is_compound flags (8c8923aca)
       Replace per-callback isinstance + set-membership with class-level
       bit flags set at network-build time. Targets the 2.6M isinstance
       calls on combined_40; the selective_evaluator hot path now makes none.
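
A rough sketch of flag-based dispatch. The KIND_* constants and the example classes are hypothetical, and the flags are set in the class bodies here for brevity, whereas the message says the real ones are set at network-build time. Only the attribute names _kind and _is_compound come from the message.

```python
# Hypothetical bit positions for node kinds.
KIND_ALPHA = 1 << 0
KIND_CONDITION = 1 << 1
KIND_COMPOUND = 1 << 2


class Node:
    _kind = 0
    _is_compound = False


class SimpleCondition(Node):
    _kind = KIND_CONDITION


class AndCondition(Node):
    _kind = KIND_CONDITION | KIND_COMPOUND
    _is_compound = True


def is_condition(node: Node) -> bool:
    # One attribute read and a bitwise AND instead of an isinstance call.
    return bool(node._kind & KIND_CONDITION)


def count_compound(nodes) -> int:
    return sum(1 for node in nodes if node._is_compound)
```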
    
    2. Per-batch BFS walk cache (53fd00f86)
       Cache RelatedNodeAlpha._get_related_at_depth results within a
       single on_facts_changed sweep. Measured 91.3% hit rate on the
       two_origins fixture; on combined_40 the BFS cumtime is 2.4s and
       should drop to ~0.2s.
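
The per-sweep cache can be sketched as below. RelatedWalker, begin_sweep, the adjacency-dict graph, and the misses counter are assumptions for illustration; the real walk presumably reads working memory rather than a plain dict.

```python
from __future__ import annotations


class RelatedWalker:
    def __init__(self, edges: dict[str, list[str]]) -> None:
        self._edges = edges
        self._walk_cache: dict[tuple[str, int], frozenset[str]] = {}
        self.misses = 0

    def begin_sweep(self) -> None:
        # Assumed hook: called once at the start of each facts-changed sweep.
        self._walk_cache.clear()

    def get_related_at_depth(self, start: str, depth: int) -> frozenset[str]:
        key = (start, depth)
        cached = self._walk_cache.get(key)
        if cached is not None:
            return cached
        self.misses += 1
        # Level-by-level BFS; `frontier` holds nodes first reached at each level.
        seen = {start}
        frontier = {start}
        for _ in range(depth):
            next_frontier = set()
            for uid in frontier:
                for neighbour in self._edges.get(uid, ()):
                    if neighbour not in seen:
                        seen.add(neighbour)
                        next_frontier.add(neighbour)
            frontier = next_frontier
        result = frozenset(frontier)
        self._walk_cache[key] = result
        return result
```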
    
    3. Drop hasattr/getattr probes on alpha hot path (06b932bc7)
       Hoist condition fields into PerRelatedNodeAlpha.__init__ (frozen
       dataclass — sound). Drop dead hasattr guards for methods always
       defined on Node base class. Measured -29.9% hasattr, -21.4%
       getattr on two_origins; -2.6% wall time.
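
Hoisting can be illustrated with the sketch below. It assumes the condition is the frozen dataclass, which is one reading of the note above, and the Condition fields and matches method are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Condition:
    attribute: str
    expected: Any


class PerRelatedNodeAlpha:
    __slots__ = ("_attribute", "_expected")

    def __init__(self, condition: Condition) -> None:
        # Copy the fields once; the condition is frozen, so the copies
        # cannot go stale.
        self._attribute = condition.attribute
        self._expected = condition.expected

    def matches(self, properties: dict[str, Any]) -> bool:
        # Hot path: no hasattr/getattr probes on the condition object.
        return properties.get(self._attribute) == self._expected
```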
    
    Tests: 503/503 green (orchestrator + benchmark suites).
    Local two_origins: 3.05-3.20s (within run-to-run noise).
    Expected cluster gain: 3-5s off the 23.6s combined_40_ings baseline,
    biggest impact on the 18.6s outside-GFM time (selective_evaluator,
    relational fan-out, alpha activation).
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  • v0.5.419-rete-plus-develop

    protected
    Performance + diagnostics for combined recipe regression
    
    Already-committed wins on this branch since v0.5.418:
    - per-node alpha activation in on_facts_changed (a5eaffef1)
    - skip asyncpg reset query on pool release (2a3fefff6)
    - process-lifetime cache for find_uid_by_xid + find_access_group (36348c871, ee87a8769)
    - inline cProfile + per-GFM CPU/wall + throttle + pool stats (f02fc1611)
    - gate cProfile on enable_cprofile_inline, off by default (be65d76b2)
    - cluster cProfile capture script with prod denylist + dev allowlist (73617a4be, 53d2fd2ef)
    - contract test pinning DFU per-product dedup on Origin-split FPFs (158518767)
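
The cProfile gating item above can be sketched with the standard library alone. The wrapper function and its return shape are assumptions; only the flag name enable_cprofile_inline comes from the message.

```python
import cProfile
import io
import pstats


def run_with_optional_profile(fn, *, enable_cprofile_inline=False):
    """Run fn(); profile it only when the flag is set (off by default)."""
    if not enable_cprofile_inline:
        # No profiler is created, so the default path adds no tracing overhead.
        return fn(), None
    profiler = cProfile.Profile()
    result = profiler.runcall(fn)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("tottime").print_stats(10)
    return result, buffer.getvalue()
```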
    
    cProfile activation requires workflows MR 1054 + tag v0.1.474.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  • v0.5.418-rete-plus-develop

    protected
    Phase 5: gate cProfile on enable_cprofile_inline (defaults False — fixes overhead leak in dagster benchmarks)
  • v0.5.417-rete-plus-develop

    protected
    Phase 4: cache find_or_create_uid_by_xid (saves 5 more queries/recipe)
  • v0.5.416-rete-plus-develop

    protected
    Phase 3: cache xid→uid and node→access_group_uid (saves 10 queries/recipe)
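
A process-lifetime lookup cache of this kind might look like the following sketch. UidCache and the injected loader are hypothetical; the real code presumably queries the database through asyncpg, which is replaced here by any async callable.

```python
import asyncio


class UidCache:
    """Hypothetical xid -> uid cache that lives as long as the process."""

    def __init__(self, loader):
        self._loader = loader  # async callable: xid -> uid or None
        self._uids = {}

    async def find_uid_by_xid(self, xid):
        uid = self._uids.get(xid)
        if uid is None:
            uid = await self._loader(xid)
            if uid is not None:
                # Misses are not cached, so a later insert is still found.
                self._uids[xid] = uid
        return uid
```

A cache like this assumes the xid to uid mapping never changes during the process lifetime; if it can, an invalidation hook would be needed.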
  • v0.5.415-rete-plus-develop

    protected
    Phase 2: skip asyncpg reset query — 50% fewer DB roundtrips per recipe
  • v0.5.414-rete-plus-develop

    protected
    Phase 1: per-node alpha activation (5% local wall reduction at 8/16-ing scale)
  • v0.5.413-rete-plus-develop

    protected
    Add perf instrumentation: cProfile + per-GFM CPU/wall + throttle samples + pg pool stats inline
  • v0.5.412-rete-plus-develop

    protected
    Revert v0.5.411 IAE skip-list (slower in dagster benchmark)
  • v0.5.411-rete-plus-develop

    protected
    Skip IAE on structural FPAs (TMD/WS/NSD/Processing/Greenhouse) - fixes speed regression cascade
  • v0.5.410-rete-plus-develop

    protected
    Narrow Rainforest+Greenhouse skip to Origin/WS-duplicate FPFs (full core suite green)
  • v0.5.409-rete-plus-develop

    protected
    Idempotent Rainforest + narrow Greenhouse skip on duplicate FPFs
  • v0.5.408-rete-plus-develop

    protected
    Restore v0.5.404's working RETE gates