Draft: feat: ruby parser
Example Ruby Parser
This MR outlines how ast-grep
can be utilized to extract code structure and relationships from Ruby source code.
Workflow
The analysis process follows these steps:
-
Initialization:
- A
RubyAnalyzer
instance is created, holding the source code, file path, the parsed Abstract Syntax Tree (AST) as anAstGrep
object, and preloadedast-grep
rules.
- A
-
Rule Execution:
- The analyzer executes the loaded
ast-grep
rules against the code's AST viarun_rules
(defined inrules.rs
). - This generates a raw list of
MatchInfo
structs. EachMatchInfo
represents a location where a rule pattern matched the code, containing details like the matched text, location (Range
), the ID of the rule that matched, and any captured meta-variables (EnvNode
).
- The analyzer executes the loaded
-
Indexing:
- The
build_fqn_and_node_indices
function (inindexing.rs
) performs a single traversal of the AST to construct two essential indices:-
node_index_map
: AHashMap<ByteRange, Node>
enabling fast (O(1)) lookup of AST nodes by their byte range. This is encapsulated within aNodeIndex
struct for ease of use. -
node_fqn_map
: AHashMap<ByteRange, (Node, Arc<Vec<String>>)>
mapping the byte range of definition name nodes (like classes, modules, methods) to both the node itself and its precomputed Fully Qualified Name (FQN).
-
- The
-
Definition Analysis:
-
find_definitions
(indefinitions.rs
) processes theMatchInfo
list. - For matches classified as definitions (
RubyMatchKind::Definition
), it leverages thenode_fqn_map
andNodeIndex
to efficiently fetch the definition's name node and its precomputed FQN. - The results are structured into
DefinitionMatch
structs.
-
-
Reference Analysis (Graph-Based):
-
Graph Construction:
FlowGraph::build
(inflow_graph.rs
) builds a data flow graph:- It utilizes
build_assignment_map
(fromassignment_tracer.rs
) to identify all assignment matches (RubyMatchKind::Assignment
). - It employs
LineMatchIndex::build
(inflow_graph.rs
) to index matches by line number, facilitating spatial lookups. - It relies on the
NodeIndex
for accessing AST nodes. - Graph nodes correspond to
MatchInfo
instances. Edges signify potential data flow paths (e.g., a reference flows to its preceding assignment; an assignment flows to the match representing its assigned value). -
Crucially, during construction, it precomputes and caches the FQN of the origin of the value assigned in assignment nodes. This is done using
get_assignment_value_origin_fqn
(fromassignment_tracer.rs
) and theNodeIndex
.
- It utilizes
-
Tracing:
find_references
(incompute_references.rs
) iterates through reference matches (RubyMatchKind::Reference
):- For each reference, it initiates a trace starting from the reference's corresponding node within the
FlowGraph
usingtrace_from_graph
. - The trace traverses the graph edges (e.g., from a reference node to an assignment node, then from the assignment node to its value node).
- If the trace encounters an assignment node, it retrieves the cached FQN of the assignment's value origin directly from the
FlowGraph
's cache (get_cached_assignment_fqn
). - If the original reference was part of a method call (e.g.,
variable.method_name
), the method name is appended to the resolved origin FQN. -
Fallbacks: If graph tracing fails to resolve an FQN, alternative strategies are employed: using
extract_fqn_from_call_node
orextract_ruby_fqn
(fromfqn.rs
) on the reference node itself, or parsing constant paths directly (e.g.,Module::Class
). - The outcome is packaged into a
ReferenceMatch
struct, associating the originalMatchInfo
with the resolvedorigin_fqn
.
- For each reference, it initiates a trace starting from the reference's corresponding node within the
-
Graph Construction:
-
Result Aggregation:
- The
RubyAnalyzer
gathers theDefinitionMatch
andReferenceMatch
results. - These are converted into final
DefinitionInfo
andReferenceInfo
structs (using string representations for FQNs). - The aggregated results are returned within an
AnalysisResult
object.
- The
graph TD
A[Start: Code + Rules] --> B["RubyAnalyzer"];
B --> C{Run ast-grep Rules};
C --> D[Raw MatchInfo List];
D --> E{"Build Indices (NodeIndex, FQN Map)"};
E --> F[Indexed Data];
F --> G{"Find Definitions (using FQN Map)"};
G --> H[DefinitionInfo List];
F --> I{"Build FlowGraph (using Assignments, LineIndex, NodeIndex)"};
I --> J[FlowGraph + Cached FQNs];
J --> K{"Find References (Trace on FlowGraph)"};
K --> L[ReferenceInfo List];
M[AnalysisResult] --> N["End"];
H --> M;
L --> M;
Key Concepts
-
RubyAnalyzer
(analyzer.rs
): The central struct that orchestrates the analysis. It holds the source code, AST (AstGrep
), rules, and exposes methods likematches
(for raw matches) andanalyze
(for the full analysis). -
Rules (
.yaml
files): Define what to find usingast-grep
patterns. They are categorized (e.g.,definitions.yaml
,references.yaml
,assignment.yaml
) and combined dynamically. Rules determine theRubyMatchKind
(e.g.,Assignment
,Definition
,Reference
,Other
) for each match. -
MatchInfo
(rules.rs
): Represents a single, raw match found byast-grep
. Contains the matched text, location (Range
), rule ID, captured meta-variables (EnvNode
), and its assignedRubyMatchKind
. -
NodeIndex
(indexing.rs
): A wrapper around aHashMap<ByteRange, Node>
, providing efficient O(1) lookup of AST nodes using their byte offsets (ByteRange
). Built once during indexing. -
Fully Qualified Name (FQN):
-
Representation: Defined by the
Fqn
struct (inassignment_tracer.rs
), holding anArc<Vec<String>>
representing the name parts (e.g.,["Module", "Class", "method"]
). -
Extraction (
fqn.rs
): Contains logic to determine FQNs:-
extract_ruby_fqn
: Determines the FQN by walking up the AST from a given node to identify enclosing module, class, and method scopes. -
extract_fqn_from_call_node
: Specifically handles method call structures (e.g.,a.b.c
,A::B.c
) to resolve the receiver path and method name. -
extract_enclosing_module_or_class_fqn
: Retrieves only the module/class scope FQN for a node.
-
-
Precomputation (
indexing.rs
): FQNs for definition nodes are computed during the initialbuild_fqn_and_node_indices
traversal and stored in thenode_fqn_map
. -
Assignment Value Origin FQN (
assignment_tracer.rs
):get_assignment_value_origin_fqn
uses theNodeIndex
and FQN extraction functions to find the FQN of the source of the value being assigned. For example, inx = MyClass.new
, it resolves the FQN to["MyClass", "new"]
. This is vital for tracing references through assignments.
-
Representation: Defined by the
-
FlowGraph
(flow_graph.rs
):- Represents data flow relationships between
MatchInfo
instances. - Nodes correspond to matches; edges indicate flow direction (e.g., reference -> preceding assignment, assignment -> assigned value).
- Built using assignment matches (via
build_assignment_map
), theLineMatchIndex
, and theNodeIndex
. -
Crucially, it caches the precomputed FQN of the origin of assigned values (
get_cached_assignment_fqn
) associated with assignment nodes. This caching significantly speeds up reference tracing by avoiding redundant FQN computations. - Used by
compute_references.rs
(specificallytrace_from_graph
) to follow data flow paths from a reference match back to its potential origin(s).
- Represents data flow relationships between
-
LineMatchIndex
(flow_graph.rs
): An auxiliary index mapping line numbers to theMatchInfo
instances spanning those lines. Used for potentially faster spatial lookups duringFlowGraph
construction, although directByteRange
-based lookups viaNodeIndex
are often sufficient. -
DefinitionMatch
/DefinitionInfo
(definitions.rs
,analyzer.rs
): Represents a successfully identified definition. Links the originalMatchInfo
with its resolved FQN (obtained efficiently from the precomputednode_fqn_map
).DefinitionInfo
is the final, serializable version. -
ReferenceMatch
/ReferenceInfo
(compute_references.rs
,analyzer.rs
): Represents a successfully resolved reference. Links the originalMatchInfo
with the best-effortorigin_fqn
determined primarily throughFlowGraph
tracing (leveraging cached assignment value FQNs) and fallback mechanisms.ReferenceInfo
is the final, serializable version. -
Assignment Helpers (
assignment_tracer.rs
): This module provides helper functions related to assignments, primarily used during graph building and tracing:-
build_assignment_map
: Creates a map where keys are variable names and values are sorted lists of their corresponding assignmentMatchInfo
s. -
find_assignment_before_pos
: Given a variable name and a position, efficiently finds the last assignmentMatchInfo
for that variable occurring before the position (using binary search on the map). -
get_assignment_value_origin_fqn
: Determines the FQN of the value being assigned (detailed in the FQN section). - Defines shared types like
Fqn
,TraceStep
, andTraceReason
used in the tracing process.
-
graph LR
A["RubyAnalyzer"] --> B["Indexing (build_fqn_and_node_indices)"]
A --> C["Definitions (find_definitions)"]
A --> D["References (find_references)"]
D --> E["FlowGraph (build, trace)"]
E --> F["Assignment Helpers"]
E --> G["FQN Helpers"]
B --> C
B --> D
F --> E
G --> E
C --> H["DefinitionInfo"]
D --> I["ReferenceInfo"]
H --> J["AnalysisResult"]
I --> J
Edited by Michael Angelo Rivera