feat(ts): basic intra-file reference resolution + infra for scope-aware resolution


ts-intra-file-references-2 into main Aug 13, 2025


What does this MR do and why?

This merge request introduces a foundational framework for scope-aware symbol analysis in the TypeScript parser. It builds directly on the recently introduced TypeScriptExpression representation, providing the necessary infrastructure to understand the lexical context of expressions. This is the critical next step towards implementing high-fidelity reference resolution that can handle complex, real-world code patterns like variable shadowing and data-flow-dependent method calls.

The Problem: Expressions Without Context

The previous work introduced TypeScriptExpression to model what an expression does. However, the parser lacked a robust mechanism to understand where that expression lives. To resolve a symbol like myService.getData(), the parser must answer questions like:

What is myService? Where was it defined?
What is the current lexical scope? Is myService shadowed by a more local definition?
If myService was assigned the result of another expression, what was that expression's return type?

The existing system could not answer these questions reliably, making accurate reference resolution impossible.

The Solution: A Unified Symbol and Scope Analysis Framework

This MR introduces a new architectural layer to provide this missing context. The solution is composed of three key, interacting components:

A Unified TypeScriptSymbol Enum (crates/parser-core/src/typescript/references/types.rs): A new enum was created to serve as a single, canonical representation for all key symbols in the code: definitions, imports, and references. This abstraction allows all symbols, regardless of their type, to be stored and indexed together in a uniform way.
A Scope-Aware ByteRangeSpatialIndex (crates/parser-core/src/typescript/references/lookup.rs): The interval tree, previously used only for expressions, now indexes all TypeScriptSymbols, creating a spatial map of a file's lexical structure. New methods on the index support hierarchical queries; together they turn the flat list of symbols into a navigable tree that mirrors the scope hierarchy of the code:
- find_all_containing(start, end): Finds all scopes that contain a given byte range, ordered from smallest to largest.
- find_immediate_parent(start, end): Finds the single, smallest scope that contains a given range.
- find_immediate_children(start, end): Finds all symbols defined directly within a given scope range.
A High-Level Scope Navigation Toolkit (crates/parser-core/src/typescript/references/utils.rs): A suite of new utilities was built on top of the symbol index to provide powerful analysis capabilities. While not all of these are fully integrated into the resolver yet, they form the core of the new framework and are heavily tested.
- crawl_scope_enhanced: Allows a traversal to start at a specific expression and walk outwards through the scope hierarchy (e.g., from a method to its class, and then to the module level), providing access to the parent and children of each scope.
- find_shared_scope_expressions: This utility identifies data-flow patterns within a single scope. It groups expressions by their containing scope and then links variable assignments to their subsequent uses. For example, it can programmatically link x = new Database() to later uses like x.connect().
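
The three index queries described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in lookup.rs: `Symbol`, `Entry`, `NaiveIndex`, and `sample_index` are hypothetical names, ranges are assumed to be half-open and properly nested, and a linear scan stands in for the interval tree.

```rust
/// Hypothetical, simplified stand-in for the MR's `TypeScriptSymbol` enum.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Symbol {
    Definition(String),
    Import(String),
    Reference(String),
}

/// A symbol plus the half-open byte range `[start, end)` it covers.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    start: usize,
    end: usize,
    symbol: Symbol,
}

/// Linear-scan stand-in for the interval-tree-backed index (assumption).
struct NaiveIndex {
    entries: Vec<Entry>,
}

impl NaiveIndex {
    /// Entries whose range encloses `[start, end)`, smallest range first.
    /// An entry with exactly the queried range is not its own container.
    fn find_all_containing(&self, start: usize, end: usize) -> Vec<&Entry> {
        let mut hits: Vec<&Entry> = self
            .entries
            .iter()
            .filter(|e| e.start <= start && end <= e.end && (e.start, e.end) != (start, end))
            .collect();
        hits.sort_by_key(|e| e.end - e.start);
        hits
    }

    /// The smallest enclosing entry, if any.
    fn find_immediate_parent(&self, start: usize, end: usize) -> Option<&Entry> {
        self.find_all_containing(start, end).into_iter().next()
    }

    /// Entries whose immediate parent covers exactly `[start, end)`.
    fn find_immediate_children(&self, start: usize, end: usize) -> Vec<&Entry> {
        self.entries
            .iter()
            .filter(|e| {
                self.find_immediate_parent(e.start, e.end)
                    .map_or(false, |p| (p.start, p.end) == (start, end))
            })
            .collect()
    }
}

/// Made-up layout: a class containing two methods, one of which holds a reference.
fn sample_index() -> NaiveIndex {
    let def = |name: &str| Symbol::Definition(name.to_string());
    NaiveIndex {
        entries: vec![
            Entry { start: 0, end: 100, symbol: def("Service") },
            Entry { start: 20, end: 80, symbol: def("getData") },
            Entry { start: 40, end: 45, symbol: Symbol::Reference("fetch".to_string()) },
            Entry { start: 82, end: 95, symbol: def("other") },
        ],
    }
}

fn main() {
    let index = sample_index();
    println!("containing: {:?}", index.find_all_containing(40, 45));
    println!("parent:     {:?}", index.find_immediate_parent(40, 45));
    println!("children:   {:?}", index.find_immediate_children(0, 100));
}
```

In this sketch the reference at bytes 40..45 is a child of `getData`, not of `Service`, which is the distinction between "contained by" and "defined directly within" that the hierarchical queries rely on.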

Consequences and Implementation Details

Integration into the Analyzer (analyzer.rs): The main TypeScriptAnalyzer has been updated to call the new resolve_references function, formally integrating reference resolution into its analysis pipeline. The result, a Vec<TypeScriptReferenceInfo>, is now part of the final TypeScriptAnalysisResult.
A Placeholder Resolver (resolve.rs): A new module, resolve.rs, now houses the main resolution logic. The initial implementation is intentionally naive, using a simple HashMap to perform a global, name-based lookup (resolve_simple_call). This serves as a working placeholder that allows the overall system to be wired together, while deferring the use of the more advanced scope-crawling tools.
Enhanced Assignment Parsing (expression.rs): The expression parser has been made more sophisticated. It now distinguishes between simple assignments (e.g., x = y, const z = 10) and assignments that involve function or constructor calls. This separation allows for more direct analysis of simple data transfers.
Extensive Foundational Testing: A significant portion of this MR is dedicated to adding tests for the new scope navigation infrastructure. The tests in utils.rs and resolve.rs exercise crawl_scope_enhanced and find_shared_scope_expressions against the complex.ts fixture, so the foundation is covered by tests even though these utilities are not yet fully consumed by the main resolver.
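
To illustrate why a global, name-based lookup is only a placeholder, here is a minimal sketch of that style of resolver. The `Definition` struct, `build_table`, and the signature given to `resolve_simple_call` are assumptions made for this example; the real function in resolve.rs may look quite different.

```rust
use std::collections::HashMap;

/// Hypothetical definition record: a name and the byte range it covers.
#[derive(Debug, Clone, PartialEq)]
struct Definition {
    name: String,
    start: usize,
    end: usize,
}

/// Builds a global name -> definition table.
/// When two definitions share a name, the later one overwrites the earlier one.
fn build_table(defs: &[Definition]) -> HashMap<&str, &Definition> {
    defs.iter().map(|d| (d.name.as_str(), d)).collect()
}

/// Name-only lookup: the position of the call site is never consulted.
fn resolve_simple_call<'a>(
    table: &HashMap<&str, &'a Definition>,
    callee: &str,
) -> Option<&'a Definition> {
    table.get(callee).copied()
}

fn main() {
    // Two functions named `connect` in different scopes of the same file.
    let defs = vec![
        Definition { name: "connect".to_string(), start: 0, end: 30 },
        Definition { name: "connect".to_string(), start: 50, end: 70 },
    ];
    let table = build_table(&defs);

    // Every call to `connect` resolves to the same entry, wherever the call occurs.
    println!("{:?}", resolve_simple_call(&table, "connect"));
    println!("{:?}", resolve_simple_call(&table, "missing"));
}
```

Because the table is keyed by name alone, a shadowed definition is indistinguishable from the one it shadows, which is the gap the scope-crawling utilities are meant to close.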

What's Next: The Path to Advanced Reference Resolution

This MR lays the architectural groundwork for high-fidelity reference resolution. The current resolver is a simple placeholder; the next steps replace it with logic that uses the new infrastructure.

Handling Shadowing: The naive HashMap lookup will be replaced with a call to crawl_scope_enhanced. To resolve a symbol, the engine will start at the expression's immediate scope and check for a definition. If not found, it will walk up to the parent scope and repeat the process, correctly handling variable shadowing by design.
Implementing Data-Flow Analysis: The find_shared_scope_expressions utility will be used to track variable types. When resolving x.connect(), the engine will first find the expression that assigned to x (e.g., x = new Database()). By resolving the RHS of the assignment, it will determine the type of x and can then correctly resolve .connect() as a method on the Database class.
Resolving this and Class Members: The scope-crawling tools will be used to find the containing class for a given expression. This will enable the correct resolution of this and provide the set of available members for resolving property and method accesses.
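
The planned behaviour for shadowing and data-flow-based method resolution could look roughly like the sketch below. It is not the MR's crawl_scope_enhanced or find_shared_scope_expressions: the `Scope` struct, the parent-pointer layout, the `classes` table, and both function names are hypothetical, and each binding is assumed to already carry a type name inferred from its assignment (for example `x = new Database()`).

```rust
use std::collections::HashMap;

/// Hypothetical scope record: the enclosing scope (None at module level)
/// and the bindings declared directly in this scope.
struct Scope {
    parent: Option<usize>,
    /// name -> type name inferred from the assignment's right-hand side.
    bindings: HashMap<String, String>,
}

/// Walks outward from `scope` and returns the first scope that declares `name`,
/// together with the binding's type. Inner declarations win, so shadowing is
/// handled by the order of the walk.
fn resolve_name<'a>(scopes: &'a [Scope], mut scope: usize, name: &str) -> Option<(usize, &'a str)> {
    loop {
        let s = &scopes[scope];
        if let Some(ty) = s.bindings.get(name) {
            return Some((scope, ty.as_str()));
        }
        // No binding here: continue in the parent, or give up at the top level.
        scope = s.parent?;
    }
}

/// Resolves `receiver.method()` by resolving the receiver's type first and
/// then checking that the class declares the method.
fn resolve_method_call(
    scopes: &[Scope],
    classes: &HashMap<String, Vec<String>>,
    scope: usize,
    receiver: &str,
    method: &str,
) -> Option<String> {
    let (_, ty) = resolve_name(scopes, scope, receiver)?;
    let methods = classes.get(ty)?;
    methods
        .iter()
        .find(|m| m.as_str() == method)
        .map(|m| format!("{}.{}", ty, m))
}

/// Made-up data: module-level `x` is a Cache, a function shadows it with a
/// Database, and a nested block declares nothing of its own.
fn sample() -> (Vec<Scope>, HashMap<String, Vec<String>>) {
    let bind = |name: &str, ty: &str| HashMap::from([(name.to_string(), ty.to_string())]);
    let scopes = vec![
        Scope { parent: None, bindings: bind("x", "Cache") },
        Scope { parent: Some(0), bindings: bind("x", "Database") },
        Scope { parent: Some(1), bindings: HashMap::new() },
    ];
    let classes = HashMap::from([
        ("Database".to_string(), vec!["connect".to_string()]),
        ("Cache".to_string(), vec!["get".to_string()]),
    ]);
    (scopes, classes)
}

fn main() {
    let (scopes, classes) = sample();
    // Inside the nested block, `x` is the shadowing Database binding.
    println!("{:?}", resolve_method_call(&scopes, &classes, 2, "x", "connect"));
    // At module level, `x` is a Cache, which has no `connect` method.
    println!("{:?}", resolve_method_call(&scopes, &classes, 0, "x", "connect"));
}
```

The same outward walk would also serve the third item: stopping at the first enclosing class scope gives the target of this and the member set to search.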

Related Issues

#11 (closed)

Testing

Performance Analysis

Performance Checklist

Have you reviewed your memory allocations to ensure you're optimizing correctly? Are you cloning or copying unnecessary data?
Have you profiled with cargo bench or criterion to measure performance impact?
Are you using zero-copy operations where possible (e.g., &str instead of String, slice references)?
Have you considered using Cow<'_, T> for conditional ownership to avoid unnecessary clones?
Are iterator chains and lazy evaluation being used effectively instead of intermediate collections?
Are you reusing allocations where possible (e.g., Vec::clear() and reuse vs new allocation)?
Have you considered using SmallVec or similar for small, stack-allocated collections?
Are async operations properly structured to avoid blocking the executor?
Have you reviewed unsafe code blocks for both safety and performance implications?
Are you using appropriate data structures (e.g., HashMap vs BTreeMap vs IndexMap)?
Have you considered compile-time optimizations (e.g., const fn, generics instead of trait objects)?
Are debug assertions (debug_assert!) used instead of runtime checks where appropriate?

Edited Aug 15, 2025 by Michael Usachenko
