Suggestions from RETE parallel analysis for can_run_now refactor (improve_can_run_conditions_v5)
## Context
@yannick — This issue summarizes findings from the RETE parallel execution branch (`feature/rete-parallel-gfm-execution`, MR !1659) that are directly relevant to your `improve_can_run_conditions_v5` work. Both branches solve the same fundamental problem — replacing ad-hoc GFM scheduling with declarative property dependencies — but from opposite directions: yours improves the polling model bottom-up, ours replaced it top-down with RETE. Here's what we learned that might help.
## 1. Missing writers in PROP_WRITERS
Your registry has some incomplete writer lists. These cause subtle bugs where a GFM thinks a property is settled when another writer is still pending.
| Property | Your writers | Missing writers | Impact |
|----------|-------------|-----------------|--------|
| `product_name` | MatchProductName | **AttachFoodTags**, **LinkTermToActivityNode** | AFT overwrites product_name (list→NamesProp); LTAN also writes it. If a downstream GFM checks `get_prop_status("product_name")` and only MPN is tracked, it'll return `settled` too early. |
| `environmental_flows` | MatrixCalculation | **InventoryConnector** | IC writes environmental_flows on activity nodes first; MC overwrites on root. Missing IC means downstream GFMs may see IC's output as "settled" while MC is still pending. |
| `production_amount` | UnitWeightConversion | **InventoryConnector** | IC also writes `production_amount` on activity nodes. |
| `flow_location` | Location, Origin | **TransportModeDistance** | TMD writes `flow_location` on transport nodes it creates. |
| `transport` | TransportModeDistance | **TransportDecision** | TDec also writes the `transport` property. |
| `nutrient_values` | AttachFoodTags | **NutrientSubdivision**, **Aggregation** | NS and Agg also write nutrient_values. |
| `amount` | UWC, IAE | **MatrixCalculation** | MC writes `amount` on root during final phase. |
**Recommendation**: Cross-reference with the complete writer map we built for MR !1659 (in the "Properties with multiple producers" table). The canonical source is `GFMOutput.sets_properties` on each worker class — run the script from the MR to regenerate.
## 2. Glossary sub-keys are a good idea — port to RETE?
Your `glossary_tags:food_tags` / `glossary_tags:perishability` / `glossary_tags:conservation` virtual sub-keys are clever. Our RETE conditions treat `glossary_tags` as a single property, which means we can't express "wait for perishability tags specifically" without a PropertyTypeCondition checking the value. Your approach is more precise.
However, the sub-key pattern has a fragility: it requires manual sync between the registry and the actual GFM code. If someone adds a new glossary_tags writer without adding a sub-key, downstream GFMs silently stop waiting for it.
**Suggestion**: Consider adding a startup validation test that verifies `PROP_WRITERS` entries match `writes_properties` declarations on each GFM (you have this in your plan but it's not yet implemented on the branch).
## 3. WriterScheduling captures information our RETE conditions encode differently
Your `WriterScheduling` enum (`same_node`, `parent_node`, `ancestor_node`, `child_node`, `descendant_node`, `root_node`) is a clean way to express where to look for scheduling state. Our RETE equivalent is:
| Your enum | Our RETE equivalent |
|-----------|-------------------|
| `same_node` | Default (no `on_related`) |
| `parent_node` | `PropertyTypeCondition(on_related=RelatedNodeCondition(relationship='parent', depth=1))` |
| `ancestor_node` | `PropertyTypeCondition(on_related=RelatedNodeCondition(relationship='parent'))` with unbounded depth |
| `child_node` | `RelatedNodeCondition(relationship='sub_node', depth=1)` |
| `descendant_node` | `RelatedNodeCondition(relationship='sub_node')` |
| `root_node` | `GFMCondition(run_when_queue_empty=True)` |
Your representation is more explicit and easier to reason about. If you migrate to RETE later (per your `rete_migration_plan.md`), this enum could inform the `PropertySettledCondition` design.
## 4. Dependency edges we discovered that aren't in your branch
Our `GFMDependencyGraph.build_dependencies()` computed 35 edges from 4 rule types. Some edges that may not be covered by your current `can_run_now()` refactoring:
| Edge | Via | Why it matters |
|------|-----|----------------|
| NutrientSubdivision → Processing | explicit `depends_on_gfms` | Processing must wait for NS to finish adding subdivision children |
| AttachFoodTags → Greenhouse | `product_name` | GH reads parent product_name via PTC |
| AttachFoodTags → LinkFoodCategories | `product_name` | LFC uses CrossNodePropertyEqualityCondition on grandparent product_name |
| InventoryConnector → TransportModeDistance | `production_amount` | TMD reads sub_node production_amount via PTC |
| LocationGFM → Greenhouse | `flow_location` | GH reads parent flow_location via PTC |
These are implicit edges inferred from property_type_conditions — your `PROP_WRITERS` registry might miss them because they involve cross-node reads (not just writes).
## 5. The "negative future" problem and our solution
Your `rete_migration_plan.md` correctly identifies the hard problem: RETE can't detect when a property will **never** be set (sad path). Our RETE branch handles this differently:
- **Happy path**: `PropertyTypeCondition` fires reactively when property reaches the right type
- **Sad path**: `run_when_queue_empty=True` GFMs (Aggregation, MatrixCalculation, Perishability, etc.) run only when the reactive queue is fully drained, guaranteeing all producers have either written or been skipped
- **Structure gates**: `may_add_children`/`may_remove_edges` flags on GFMOutput force sequential execution for structure-modifying GFMs
This means we don't need `reschedule_counter` timeouts — the final-phase mechanism handles it. If you're keeping the polling model, consider adopting a similar "phases" approach: reactive GFMs run first, final-phase GFMs run only when no reactive GFMs are pending.
## 6. Performance insight: can_run_now is not the bottleneck
From our benchmark profiling (MR !1659):
```
Wall clock: 5.26s (93-node fixture)
GFM execution: 0.57s (10.8%)
RETE/orchestration: 4.69s (89.2%) ← this is the bottleneck
```
At 2,000 ingredients: 34.5s wall clock, 0 reschedules, 4,010 GFM runs = 4,010 iterations.
The `can_run_now()` optimization will improve code quality and maintainability (which is valuable), but the performance bottleneck is the orchestration loop itself — not the scheduling checks. If you're also targeting performance, the higher-leverage targets are:
1. **Reduce orchestrator iterations** — batch multiple independent activations per iteration
2. **RETE evaluation cost** — delta-based re-evaluation instead of full working memory scan
3. **TMD per-call cost** — 32.7% of GFM time, 0.186s for 11 calls
See issue #13 for the full analysis.
## 7. Your branch as a stepping stone to RETE
Your `writes_properties` + `PROP_WRITERS` + `FIRST_SUB_NODE_ADDERS` infrastructure is almost exactly what our RETE `GFMOutput.sets_properties` provides. The migration path from your branch to full RETE would be:
1. **Your `PROP_WRITERS`** → `GFMOutput(sets_properties=frozenset({...}))`
2. **Your `get_prop_status()` calls in `can_run_now()`** → `GFMCondition.property_type_conditions`
3. **Your `WriterScheduling` enum** → `PropertyTypeCondition(on_related=RelatedNodeCondition(...))`
4. **Your `FIRST_SUB_NODE_ADDERS`** → `RelatedNodeCondition(relationship='sub_node', quantifier='none')`
The data is the same — only the execution model changes (polling → reactive). Your branch is excellent groundwork for this migration.
## References
- MR !1659: Same-GFM batching + cross-GFM analysis + topological groups + speedup estimate
- Issue #13: Higher-leverage performance targets
- `core/core/orchestrator/rete/dependency_graph.py`: 35-edge dependency graph
- `core/core/orchestrator/rete/conditions.py`: Declarative condition types
issue