Suggestions from RETE parallel analysis for can_run_now refactor (improve_can_run_conditions_v5) (#14) · Issues · eaternity / eos

Suggestions from RETE parallel analysis for can_run_now refactor (improve_can_run_conditions_v5)

## Context @yannick — This issue summarizes findings from the RETE parallel execution branch (`feature/rete-parallel-gfm-execution`, MR !1659) that are directly relevant to your `improve_can_run_conditions_v5` work. Both branches solve the same fundamental problem — replacing ad-hoc GFM scheduling with declarative property dependencies — but from opposite directions: yours improves the polling model bottom-up, ours replaced it top-down with RETE. Here's what we learned that might help. ## 1. Missing writers in PROP_WRITERS Your registry has some incomplete writer lists. These cause subtle bugs where a GFM thinks a property is settled when another writer is still pending. | Property | Your writers | Missing writers | Impact | |----------|-------------|-----------------|--------| | `product_name` | MatchProductName | **AttachFoodTags**, **LinkTermToActivityNode** | AFT overwrites product_name (list→NamesProp); LTAN also writes it. If a downstream GFM checks `get_prop_status("product_name")` and only MPN is tracked, it'll return `settled` too early. | | `environmental_flows` | MatrixCalculation | **InventoryConnector** | IC writes environmental_flows on activity nodes first; MC overwrites on root. Missing IC means downstream GFMs may see IC's output as "settled" while MC is still pending. | | `production_amount` | UnitWeightConversion | **InventoryConnector** | IC also writes `production_amount` on activity nodes. | | `flow_location` | Location, Origin | **TransportModeDistance** | TMD writes `flow_location` on transport nodes it creates. | | `transport` | TransportModeDistance | **TransportDecision** | TDec also writes the `transport` property. | | `nutrient_values` | AttachFoodTags | **NutrientSubdivision**, **Aggregation** | NS and Agg also write nutrient_values. | | `amount` | UWC, IAE | **MatrixCalculation** | MC writes `amount` on root during final phase. | **Recommendation**: Cross-reference with the complete writer map we built for MR !1659 (in the "Properties with multiple producers" table). The canonical source is `GFMOutput.sets_properties` on each worker class — run the script from the MR to regenerate. ## 2. Glossary sub-keys are a good idea — port to RETE? Your `glossary_tags:food_tags` / `glossary_tags:perishability` / `glossary_tags:conservation` virtual sub-keys are clever. Our RETE conditions treat `glossary_tags` as a single property, which means we can't express "wait for perishability tags specifically" without a PropertyTypeCondition checking the value. Your approach is more precise. However, the sub-key pattern has a fragility: it requires manual sync between the registry and the actual GFM code. If someone adds a new glossary_tags writer without adding a sub-key, downstream GFMs silently stop waiting for it. **Suggestion**: Consider adding a startup validation test that verifies `PROP_WRITERS` entries match `writes_properties` declarations on each GFM (you have this in your plan but it's not yet implemented on the branch). ## 3. WriterScheduling captures information our RETE conditions encode differently Your `WriterScheduling` enum (`same_node`, `parent_node`, `ancestor_node`, `child_node`, `descendant_node`, `root_node`) is a clean way to express where to look for scheduling state. Our RETE equivalent is: | Your enum | Our RETE equivalent | |-----------|-------------------| | `same_node` | Default (no `on_related`) | | `parent_node` | `PropertyTypeCondition(on_related=RelatedNodeCondition(relationship='parent', depth=1))` | | `ancestor_node` | `PropertyTypeCondition(on_related=RelatedNodeCondition(relationship='parent'))` with unbounded depth | | `child_node` | `RelatedNodeCondition(relationship='sub_node', depth=1)` | | `descendant_node` | `RelatedNodeCondition(relationship='sub_node')` | | `root_node` | `GFMCondition(run_when_queue_empty=True)` | Your representation is more explicit and easier to reason about. If you migrate to RETE later (per your `rete_migration_plan.md`), this enum could inform the `PropertySettledCondition` design. ## 4. Dependency edges we discovered that aren't in your branch Our `GFMDependencyGraph.build_dependencies()` computed 35 edges from 4 rule types. Some edges that may not be covered by your current `can_run_now()` refactoring: | Edge | Via | Why it matters | |------|-----|----------------| | NutrientSubdivision → Processing | explicit `depends_on_gfms` | Processing must wait for NS to finish adding subdivision children | | AttachFoodTags → Greenhouse | `product_name` | GH reads parent product_name via PTC | | AttachFoodTags → LinkFoodCategories | `product_name` | LFC uses CrossNodePropertyEqualityCondition on grandparent product_name | | InventoryConnector → TransportModeDistance | `production_amount` | TMD reads sub_node production_amount via PTC | | LocationGFM → Greenhouse | `flow_location` | GH reads parent flow_location via PTC | These are implicit edges inferred from property_type_conditions — your `PROP_WRITERS` registry might miss them because they involve cross-node reads (not just writes). ## 5. The "negative future" problem and our solution Your `rete_migration_plan.md` correctly identifies the hard problem: RETE can't detect when a property will **never** be set (sad path). Our RETE branch handles this differently: - **Happy path**: `PropertyTypeCondition` fires reactively when property reaches the right type - **Sad path**: `run_when_queue_empty=True` GFMs (Aggregation, MatrixCalculation, Perishability, etc.) run only when the reactive queue is fully drained, guaranteeing all producers have either written or been skipped - **Structure gates**: `may_add_children`/`may_remove_edges` flags on GFMOutput force sequential execution for structure-modifying GFMs This means we don't need `reschedule_counter` timeouts — the final-phase mechanism handles it. If you're keeping the polling model, consider adopting a similar "phases" approach: reactive GFMs run first, final-phase GFMs run only when no reactive GFMs are pending. ## 6. Performance insight: can_run_now is not the bottleneck From our benchmark profiling (MR !1659): ``` Wall clock: 5.26s (93-node fixture) GFM execution: 0.57s (10.8%) RETE/orchestration: 4.69s (89.2%) ← this is the bottleneck ``` At 2,000 ingredients: 34.5s wall clock, 0 reschedules, 4,010 GFM runs = 4,010 iterations. The `can_run_now()` optimization will improve code quality and maintainability (which is valuable), but the performance bottleneck is the orchestration loop itself — not the scheduling checks. If you're also targeting performance, the higher-leverage targets are: 1. **Reduce orchestrator iterations** — batch multiple independent activations per iteration 2. **RETE evaluation cost** — delta-based re-evaluation instead of full working memory scan 3. **TMD per-call cost** — 32.7% of GFM time, 0.186s for 11 calls See issue #13 for the full analysis. ## 7. Your branch as a stepping stone to RETE Your `writes_properties` + `PROP_WRITERS` + `FIRST_SUB_NODE_ADDERS` infrastructure is almost exactly what our RETE `GFMOutput.sets_properties` provides. The migration path from your branch to full RETE would be: 1. **Your `PROP_WRITERS`** → `GFMOutput(sets_properties=frozenset({...}))` 2. **Your `get_prop_status()` calls in `can_run_now()`** → `GFMCondition.property_type_conditions` 3. **Your `WriterScheduling` enum** → `PropertyTypeCondition(on_related=RelatedNodeCondition(...))` 4. **Your `FIRST_SUB_NODE_ADDERS`** → `RelatedNodeCondition(relationship='sub_node', quantifier='none')` The data is the same — only the execution model changes (polling → reactive). Your branch is excellent groundwork for this migration. ## References - MR !1659: Same-GFM batching + cross-GFM analysis + topological groups + speedup estimate - Issue #13: Higher-leverage performance targets - `core/core/orchestrator/rete/dependency_graph.py`: 35-edge dependency graph - `core/core/orchestrator/rete/conditions.py`: Declarative condition types

issue