[proposal] Spike: Resolving Definition FQNs with a ORM-like rules engine

Background

Since starting phase 1 of the Knowledge Graph project, we've already built 4 parsers (Ruby, JS, Python, Kotlin) that more or less, use the same pattern to resolve FQNs - but go about it via different implementation details.

As the One Parser project matures, and more languages are supported, a lot of code will end up duplicated for no good reason: increasing the probability of error(s), performance and best practices will drift between implementations, and a lot of subtle details will be created for future package maintainers to sift through.

Given the tree structure of ASTs, the correct way to construct FQNs is via DFS. You visit a particular node in the tree, like class A { function b(){} }, create a new scope on class A, and then visit all that scope's children function b(){}, before terminating the parent scope. You'd end up with two FQN's in this case file.A, and file.A.b.

The logic for traversal is language-agnostic, and the rules for deciding whether or not a node in the tree creates a new scope either requires looking at the current node e.g class_definition -> class A {}, or at an ancestor node const foo = () => {} -> parent_of("arrow_function") == "variable_declarator". These rules require language-specific inputs, but are themselves language-agnostic, as their "language" is the language of tree traversal.

Proposal

A DSL-like developer experience that substantially reduces the time it takes to resolve FQNs for Definitions by abstracting away most tree-traversal details, and leaving the core question of "Does this node create a new scope?" to a set of rules in a rule engine that always resolves to True or False.

For any given language, there would be two steps necessary to implement this experience:

A corpus of scope creating node kinds e.g {class_definition, arrow_function, ...}
A list of rules, e.g ScopeRule[]

In addition, there would be some "hidden" assumptions baked into the execution of list of scope rules:

If the list of scope rules is empty, then by default, any node kind in the corpus automatically creates a new scope.
If a node kind is in the corpus, but doesn't have a corresponding ScopeRule, it will automatically create a scope.
If a scope rule deeper in the rules list contradicts another, e.g rules[0] is contradicted by rules[5] because they both apply a rule to arrow_function, then rules[5] automatically takes precedence.
If a rule evaluates to True, then a "name extraction" callback will be automatically applied to yield the name portion of for that FQNPart.
If a rule evaluates to True, then a "range extraction" callback will be automatically applied to yield the Range portion for that FQNPart. This is important in edge cases like including the decorator line in a function FQN for languages like Python.

For each scope rule the interior logic would look something like this:

Simple example:

ScopeRule {
    predicate: Box::new(
        any(vec!["class_declaration", "method_definition", "function_declaration", "generator_function_declaration"])
    ),
    // name_extractor doesn't need to be included (applied by default)
    // range_extractor doesn't need to be included (applied by default)
}

More advanced example:

ScopeRule {
    predicate: Box::new(
        any(vec!["class", "arrow_function", "function_expression", "generator_function"])
            .and(parent_field_any(vec![
                ("variable_declarator", "name"),
                ("field_definition", "property"),
            ]))
    ),
    name_extractor: Box::new(
        extract_from_parent_fields(vec![
            ("variable_declarator", "name"),
            ("field_definition", "property"),
        ])
    ),
}

These two examples cover 100% of the test cases for the merged JS definitions parser. In fact, you wouldn't even need the simple example at all, because by default, node kinds in the corpus will create new scopes.

Implications

When considering the future of the One Parser project - our team has already discussed unifying the "boring stuff" like YAML loading, etc for language-specific parser config, and I believe this is the final step in the direction that makes new language implementations a single file.

As discussed earlier in the issue, the vast majority of scoping cases will be covered by the logical progression of:

Evaluate a predicate, and determine whether or not a new scope should be created.
If a new scope is created, extract the name for the FQNPart.
If a new scope is created, extract the Range for the FQNPart.

All 3 parts are configurable, and for the 2) and 3) - you'd be able to set a default "evaluation" to cover simple name and range assignments by default.

There are, of course, lingering questions (in no particular order) about:

Performance
Expanding to other FQN flavors like Imports
Removing the current DefinitionExtractor pattern in favor of the ScopeRule[] pattern
Removing ast-grep all together
The "right" standard library (helpers like parent_field_any, extract_from_parent_fields, etc)
Any edge cases where this fails, so we can iterate on the spec
Usage of this design pattern in the linking phase of the KG Indexer

Next Steps

I'd like to collect some feedback on this, but I'm going to whip up a very crude exploratory MR for this shortly. We'll quickly know it's a good pattern if we can replace multiple definition parsers in one go with this engine.

Edited Jun 18, 2025 by Michael Usachenko