feat(ruby): implement ruby definitions
What does this MR do?
This MR builds directly on feat(parser): setup initial library (!14 - merged) to implement definition extraction for Ruby. Please read !14 in full before reviewing this one.
Most importantly, this merge request sets up a general code path for other parsers to follow.
Related issues
- closes [KG] (Ruby) Parse Definitions (#13 - closed)
- [design] Spike: Expected FQN Data Structure (#19 - closed)
- [KG] Create Initial Library Code (#16 - closed)
- Proposal: 5+ phases of work for the knowledge g... (knowledge-graph#1 - closed)
Ruby Parser Documentation (Copy)
This document provides a high-level overview of the Ruby parser implementation, including the parsing flow, data structures, and how definitions are extracted with Fully Qualified Names (FQNs).
Table of Contents
- High-Level Overview
- Coverage
- Architecture Components
- Data Flow
- FQN Map and Node Index Building
- Definition Extraction
- Variable Captures
High-Level Overview
The Ruby parser extracts structured information about Ruby code definitions (classes, modules, methods, etc.) and computes their Fully Qualified Names (FQNs) with rich metadata. It combines AST parsing, rule-based pattern matching, and semantic analysis.
Coverage
See COVERAGE.md for what is and is not covered.
Files and Organization
```text
src/ruby/
├── README.md            # This documentation
├── mod.rs               # Module exports
├── ruby_ast.rs          # YAML rules loading and configuration
├── fqn.rs               # FQN computation and metadata
├── definitions.rs       # Definition extraction and classification
├── analyzer.rs          # Main API and result aggregation
├── COVERAGE.md          # Feature coverage documentation
├── rules/
│   └── definitions.yaml # YAML rule definitions
└── fixtures/            # Test fixtures
    ├── comprehensive_definitions.rb  # Complete definition type coverage
    ├── sample.rb                     # Basic authentication service example
    ├── monolith_sample_1.rb          # Real-world Rails controller
    ├── references_test_rails.rb      # Rails patterns
    └── references_test_tracing.rb    # Reference tracing examples
```
Architecture Components
```mermaid
graph TB
    A[Ruby Source Code] --> B[Tree-sitter Parser]
    B --> C[AST]
    C --> D[YAML Rules Engine]
    C --> E[FQN Map Builder]
    D --> F[Rule Matches]
    E --> G[Node Index Map]
    E --> H[Ruby FQN Map]
    F --> I[Definition Extractor]
    G --> I
    H --> I
    I --> J[Ruby Analysis Result]

    subgraph "Core Components"
        K[ruby_ast.rs - Rules & Config]
        L[fqn.rs - FQN Computation]
        M[definitions.rs - Definition Extraction]
        N[analyzer.rs - Main API]
    end

    K -.-> D
    L -.-> E
    M -.-> I
    N -.-> J
```
How It Works (High Level)
The Ruby parser operates in two tracks that eventually merge:
Track 1: Structural Analysis (FQN Building)
- AST Traversal - Walk through every node in the parsed Ruby code
- Scope Tracking - Maintain a stack of current scope (modules, classes, methods)
- FQN Generation - For each definition node, build its fully qualified name from the scope stack
- Metadata Collection - Capture AST node types, byte ranges, and scope creation info
- Index Building - Create maps linking byte positions to FQN data
Track 2: Pattern Matching (Rule Processing)
- YAML Rule Application - Run predefined patterns against the AST
- Capture Variable Extraction - Extract matched node text and positions
- Match Classification - Identify which definition type each match represents
Convergence: Definition Extraction
- Match Processing - For each rule match, determine the definition type
- FQN Lookup - Use the match's byte position to find the corresponding FQN from Track 1
- Definition Assembly - Combine rule match data with FQN metadata
- Result Aggregation - Organize all definitions into a structured analysis result
The FQN system provides semantic context (where am I in the codebase?) while the rule system provides syntactic identification (what kind of definition is this?). Together they create a complete picture of Ruby code structure.
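The convergence step can be pictured as a join on byte ranges. The sketch below is a simplified, hypothetical model, not the crate's code: `RuleMatch`, `Definition` and `converge` are illustrative names, the FQN is flattened to a `String`, and lookup is an exact match on the captured name node's byte range. Matches with no FQN entry are simply dropped here.

```rust
use std::collections::HashMap;

/// Byte range (start, end) of a captured name node.
type ByteRange = (usize, usize);

/// Hypothetical Track 2 output: what the rule matched and where.
#[derive(Debug)]
struct RuleMatch {
    kind: &'static str, // e.g. "Class", decided by the rule that fired
    name: String,       // text of the captured name node
    range: ByteRange,   // byte range of the captured name node
}

/// Hypothetical assembled definition.
#[derive(Debug, PartialEq)]
struct Definition {
    kind: &'static str,
    name: String,
    fqn: String,
}

/// Joins Track 1 (byte range -> FQN) with Track 2 (rule matches).
fn converge(fqn_map: &HashMap<ByteRange, String>, matches: Vec<RuleMatch>) -> Vec<Definition> {
    matches
        .into_iter()
        .filter_map(|m| {
            // FQN lookup by the match's byte position; skip matches without one.
            let fqn = fqn_map.get(&m.range)?.clone();
            Some(Definition { kind: m.kind, name: m.name, fqn })
        })
        .collect()
}

fn main() {
    let mut fqn_map = HashMap::new();
    fqn_map.insert((7, 21), "Authentication".to_string());
    fqn_map.insert((28, 32), "Authentication::User".to_string());

    let matches = vec![
        RuleMatch { kind: "Module", name: "Authentication".into(), range: (7, 21) },
        RuleMatch { kind: "Class", name: "User".into(), range: (28, 32) },
    ];
    for def in converge(&fqn_map, matches) {
        println!("{:?}", def);
    }
}
```

Keying both sides on the same byte range keeps the two tracks independent: neither needs to know how the other was computed.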
Component Responsibilities
| Component | Responsibility | Key Types |
|---|---|---|
| ruby_ast.rs | YAML rule loading, rule-to-kind mapping | RubyMatchKind, RULES_CONFIG |
| fqn.rs | FQN computation, AST traversal, node indexing | RubyFqn, RubyFqnMetadata, RubyNodeFqnMap |
| definitions.rs | Definition extraction, type classification | DefinitionInfo, DefinitionType, DefinitionExtractor |
| analyzer.rs | Main API, result aggregation | RubyAnalyzer, RubyAnalysisResult |
Data Flow
The Ruby parser follows a multi-stage pipeline:
```mermaid
sequenceDiagram
    participant Client
    participant Analyzer
    participant Parser
    participant Rules
    participant FQN
    participant Definitions

    Client->>Analyzer: analyze(matches, parse_result)
    Analyzer->>FQN: build_fqn_and_node_indices(ast)
    Note over FQN: AST Traversal Phase
    FQN->>FQN: compute_fqns_and_index_recursive()
    FQN-->>Analyzer: (ruby_node_fqn_map, node_index_map)

    Analyzer->>Definitions: find_definitions(matches, ruby_node_fqn_map)
    Note over Definitions: Definition Extraction Phase
    Definitions->>Definitions: extract_definition_info()
    Definitions->>FQN: find_ruby_fqn_for_node()
    FQN-->>Definitions: RubyFqn with metadata
    Definitions-->>Analyzer: Vec<DefinitionInfo>

    Analyzer-->>Client: RubyAnalysisResult
```
FQN Map and Node Index Building
See fqn.md for what an FQN is and for more details.
Data Structures
RubyFqnMetadata
Contains metadata about each FQN part:
```rust
pub struct RubyFqnMetadata {
    pub ast_node_kind: String, // "class", "module", "method", etc.
    pub byte_range: ByteRange, // Position in source code
}
```
RubyFqnPart
A single component of an FQN with metadata:
```rust
pub type RubyFqnPart = FQNPart<String, RubyFqnMetadata>;

// Example: Class part for "User" class
RubyFqnPart {
    node_type: "Class".to_string(),
    node_name: "User".to_string(),
    metadata: Some(RubyFqnMetadata {
        ast_node_kind: "class".to_string(),
        byte_range: (45, 49),
    })
}
```
RubyFqn
Complete FQN with all parts:
```rust
pub type RubyFqn = Fqn<RubyFqnPart>;

// Example: "Authentication::User::initialize"
RubyFqn {
    parts: Arc::new(vec![
        RubyFqnPart { node_type: "Module", node_name: "Authentication", metadata: /* ... */ },
        RubyFqnPart { node_type: "Class", node_name: "User", metadata: /* ... */ },
        RubyFqnPart { node_type: "Method", node_name: "initialize", metadata: /* ... */ },
    ]),
}
```
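To make the shape of these types concrete, here is a self-contained sketch. `FqnPart` and `Fqn` are simplified, hypothetical stand-ins for `FQNPart<String, RubyFqnMetadata>` and `Fqn<RubyFqnPart>`: the metadata is reduced to an optional byte range, and the `Display` impl (joining names with `::`) is an assumption chosen to reproduce the example string above, not the crate's formatting code.

```rust
use std::fmt;
use std::sync::Arc;

/// Simplified stand-in for FQNPart<String, RubyFqnMetadata>.
#[derive(Debug, Clone)]
struct FqnPart {
    node_type: String,                  // "Module", "Class", "Method", ...
    node_name: String,                  // "Authentication", "User", ...
    byte_range: Option<(usize, usize)>, // stands in for the metadata
}

/// Simplified stand-in for Fqn<RubyFqnPart>. The parts live behind an Arc,
/// so cloning an Fqn shares the parts instead of copying them.
#[derive(Debug, Clone)]
struct Fqn {
    parts: Arc<Vec<FqnPart>>,
}

impl fmt::Display for Fqn {
    /// Renders the FQN as part names joined by "::".
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let names: Vec<&str> = self.parts.iter().map(|p| p.node_name.as_str()).collect();
        write!(f, "{}", names.join("::"))
    }
}

fn part(node_type: &str, node_name: &str) -> FqnPart {
    FqnPart { node_type: node_type.to_string(), node_name: node_name.to_string(), byte_range: None }
}

fn main() {
    let fqn = Fqn {
        parts: Arc::new(vec![
            part("Module", "Authentication"),
            part("Class", "User"),
            part("Method", "initialize"),
        ]),
    };
    let shared = fqn.clone();
    // Both handles point at the same Vec of parts.
    assert!(Arc::ptr_eq(&fqn.parts, &shared.parts));
    println!("{}", fqn);
}
```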
FQN Building Algorithm
The build_fqn_and_node_indices() function traverses the AST and builds two data structures, returned together as (ruby_node_fqn_map, node_index_map):
```mermaid
graph TD
    A[Start: Root AST Node] --> B[Initialize Empty Scope Stack]
    B --> C[Traverse Node Recursively]
    C --> D{Is Definition Node?}
    D -->|Yes| E[Extract Name & Create Metadata]
    D -->|No| F[Index Node by Byte Range]
    E --> G{Creates New Scope?}
    G -->|Yes| H[Push to Scope Stack]
    G -->|No| I[Build FQN from Current Scope]
    H --> I
    I --> J[Store in RubyNodeFqnMap]
    J --> K[Process Children]
    F --> K
    K --> L{More Children?}
    L -->|Yes| C
    L -->|No| M{Was New Scope?}
    M -->|Yes| N[Pop from Scope Stack]
    M -->|No| O[Done with Node]
    N --> O
    O --> P{More Nodes?}
    P -->|Yes| C
    P -->|No| Q[Return Maps]
```
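The scope-stack traversal in the flowchart can be sketched as a depth-first walk over a toy AST. This is an illustration under stated assumptions, not the fqn.rs implementation. `Node`, `walk`, `render` and `example_ast` are hypothetical names. The toy tree stands in for tree-sitter nodes, and the byte ranges are copied from the Resulting FQN Map Entries table. The sketch also collapses the "Creates New Scope?" branch, because all four definition kinds used here open a scope, and it omits the node index map.

```rust
use std::collections::HashMap;

/// Byte range (start, end) of a definition's name node.
type ByteRange = (usize, usize);
/// One FQN part: (node_type, node_name), e.g. ("Class", "User").
type Part = (String, String);

/// Toy AST node standing in for a tree-sitter node (hypothetical).
struct Node {
    kind: &'static str,
    /// Name text and byte range; only set on definition nodes.
    name: Option<(&'static str, ByteRange)>,
    children: Vec<Node>,
}

/// Maps an AST kind to its FQN node type; None means "not a definition".
fn fqn_node_type(kind: &str) -> Option<&'static str> {
    match kind {
        "module" => Some("Module"),
        "class" => Some("Class"),
        "method" => Some("Method"),
        "singleton_method" => Some("SingletonMethod"),
        _ => None,
    }
}

/// Push a scope on entering a definition, record the current scope as that
/// definition's FQN, walk the children, then pop the scope again.
fn walk(node: &Node, scope: &mut Vec<Part>, fqns: &mut HashMap<ByteRange, Vec<Part>>) {
    let mut pushed = false;
    if let (Some(ty), Some((name, range))) = (fqn_node_type(node.kind), node.name) {
        scope.push((ty.to_string(), name.to_string()));
        fqns.insert(range, scope.clone());
        pushed = true;
    }
    for child in &node.children {
        walk(child, scope, fqns);
    }
    if pushed {
        scope.pop();
    }
}

/// Joins part names with "::", e.g. "Authentication::User".
fn render(parts: &[Part]) -> String {
    parts.iter().map(|(_, name)| name.as_str()).collect::<Vec<_>>().join("::")
}

fn def(kind: &'static str, name: &'static str, range: ByteRange, children: Vec<Node>) -> Node {
    Node { kind, name: Some((name, range)), children }
}

/// The Authentication example from this README as a toy tree.
fn example_ast() -> Node {
    Node {
        kind: "program",
        name: None,
        children: vec![def("module", "Authentication", (7, 21), vec![
            def("class", "User", (28, 32), vec![
                def("method", "initialize", (43, 53), vec![]),
                def("singleton_method", "find_by_email", (89, 102), vec![]),
            ]),
        ])],
    }
}

fn main() {
    let mut scope = Vec::new();
    let mut fqns = HashMap::new();
    walk(&example_ast(), &mut scope, &mut fqns);
    let mut ranges: Vec<_> = fqns.keys().copied().collect();
    ranges.sort();
    for range in ranges {
        println!("{:?} => {}", range, render(&fqns[&range]));
    }
}
```

Each FQN is a snapshot (`scope.clone()`) of the stack at the moment the definition is entered. This is why sibling methods share the `[Authentication, User]` prefix but never see each other.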
Example: FQN Building Process
Given this Ruby code:
```ruby
module Authentication
  class User
    def initialize(name)
      @name = name
    end
    def self.find_by_email(email)
      # implementation
    end
  end
end
```
The FQN building process works as follows:
Step-by-Step Traversal
```mermaid
graph LR
    subgraph "Scope Stack Evolution"
        A["[]<br/>Empty"] --> B["[Authentication]<br/>Enter Module"]
        B --> C["[Authentication, User]<br/>Enter Class"]
        C --> D["[Authentication, User, initialize]<br/>Enter Method"]
        D --> E["[Authentication, User]<br/>Exit Method"]
        E --> F["[Authentication, User, find_by_email]<br/>Enter Singleton Method"]
        F --> G["[Authentication, User]<br/>Exit Singleton Method"]
        G --> H["[Authentication]<br/>Exit Class"]
        H --> I["[]<br/>Exit Module"]
    end
```
Resulting FQN Map Entries
| Byte Range | Definition | FQN Parts | Metadata |
|---|---|---|---|
| (7, 21) | Authentication | [Module: Authentication] | {ast_node_kind: "module"} |
| (28, 32) | User | [Module: Authentication, Class: User] | {ast_node_kind: "class"} |
| (43, 53) | initialize | [Module: Authentication, Class: User, Method: initialize] | {ast_node_kind: "method"} |
| (89, 102) | find_by_email | [Module: Authentication, Class: User, SingletonMethod: find_by_email] | {ast_node_kind: "singleton_method"} |
Data Structure Output
```rust
// RubyNodeFqnMap entries
ruby_node_fqn_map = HashMap {
    (7, 21) => (
        name_node, // "Authentication" node
        Arc<Vec<RubyFqnPart>>[
            RubyFqnPart {
                node_type: "Module",
                node_name: "Authentication",
                metadata: RubyFqnMetadata {
                    ast_node_kind: "module",
                    byte_range: (7, 21),
                }
            }
        ]
    ),
    (28, 32) => (
        name_node, // "User" node
        Arc<Vec<RubyFqnPart>>[
            RubyFqnPart { /* Authentication module */ },
            RubyFqnPart {
                node_type: "Class",
                node_name: "User",
                metadata: RubyFqnMetadata {
                    ast_node_kind: "class",
                    byte_range: (28, 32),
                }
            }
        ]
    ),
    // ... more entries
}
```
Definition Extraction
Definition extraction happens after FQN building and uses YAML rule matches to identify and classify definitions.
YAML Rules System
The definitions.yaml file defines patterns for each definition type:
```yaml
rule:
  any:
    # Class definition
    - kind: class
      has:
        field: name
        kind: constant
        pattern: $CLASS_DEF_NAME

    # Lambda assignment
    - all:
        - kind: assignment
        - has:
            field: left
            any:
              - kind: constant
                pattern: $LAMBDA_CONSTANT_NAME
        - has:
            field: right
            kind: call
            has:
              field: method
              pattern: "lambda"
        - pattern: $LAMBDA_DEF
```
DefinitionExtractor Configuration
Each definition type has a dedicated extractor:
```rust
DefinitionExtractor {
    definition_type: DefinitionType::Class,
    extractor: |env| env.get("CLASS_DEF_NAME").map(|node| node.text.clone()),
    meta_vars: vec!["CLASS_DEF_NAME"],
},

DefinitionExtractor {
    definition_type: DefinitionType::Lambda,
    extractor: |env| {
        if env.get("LAMBDA_DEF").is_some() {
            // Extract assignment target name
            if let Some(constant_node) = env.get("LAMBDA_CONSTANT_NAME") {
                Some(constant_node.text.clone())
            } else if let Some(var_node) = env.get("LAMBDA_VARIABLE_NAME") {
                Some(var_node.text.clone())
            } else {
                Some("lambda".to_string())
            }
        } else {
            None
        }
    },
    meta_vars: vec!["LAMBDA_DEF", "LAMBDA_CONSTANT_NAME", "LAMBDA_VARIABLE_NAME", /* ... */],
},
```
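The extractor pattern can be tried outside the crate with a toy environment. The sketch below rests on one main assumption: `Env` is a hypothetical `HashMap` from capture name to captured text, whereas ast-grep's real environment holds nodes. `classify` is also hypothetical. It picks the first extractor that returns a name. The lambda extractor is rewritten with `?` and `or_else`, which keeps the same fallback order as the `if let` chain above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for ast-grep's match environment:
/// capture variable name -> text of the captured node.
type Env = HashMap<&'static str, String>;

#[derive(Debug, Clone, Copy, PartialEq)]
enum DefinitionType {
    Class,
    Lambda,
}

struct DefinitionExtractor {
    definition_type: DefinitionType,
    extractor: fn(&Env) -> Option<String>,
}

fn extractors() -> Vec<DefinitionExtractor> {
    vec![
        DefinitionExtractor {
            definition_type: DefinitionType::Class,
            extractor: |env| env.get("CLASS_DEF_NAME").cloned(),
        },
        DefinitionExtractor {
            definition_type: DefinitionType::Lambda,
            extractor: |env| {
                // Only matches that captured the lambda itself count.
                env.get("LAMBDA_DEF")?;
                // Prefer the assignment target's name; fall back to "lambda".
                env.get("LAMBDA_CONSTANT_NAME")
                    .or_else(|| env.get("LAMBDA_VARIABLE_NAME"))
                    .cloned()
                    .or_else(|| Some("lambda".to_string()))
            },
        },
    ]
}

/// Returns the type and name from the first extractor that accepts the match.
fn classify(env: &Env) -> Option<(DefinitionType, String)> {
    extractors()
        .into_iter()
        .find_map(|e| (e.extractor)(env).map(|name| (e.definition_type, name)))
}

fn main() {
    let mut env = Env::new();
    env.insert("LAMBDA_DEF", "lambda { |x| x * 2 }".to_string());
    env.insert("LAMBDA_CONSTANT_NAME", "DOUBLE".to_string());
    println!("{:?}", classify(&env));
}
```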
How to Add a New Definition Type
To add support for a new Ruby definition type, make changes in four places (steps 1 to 4), extend FQN lookup if needed (step 5), and add tests (step 6):
1. Add to DefinitionType Enum (definitions.rs)
```rust
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum DefinitionType {
    // ... existing types
    YourNewType, // Add your new type here
}
```
2. Define Capture Variables (definitions.rs)
```rust
pub mod meta_vars {
    // ... existing vars
    pub const YOUR_NEW_TYPE_NAME: &str = "YOUR_NEW_TYPE_NAME";
    pub const YOUR_NEW_TYPE_EXTRA_INFO: &str = "YOUR_NEW_TYPE_EXTRA_INFO";
}
```
3. Add YAML Rule Pattern (rules/definitions.yaml)
```yaml
rule:
  any:
    # ... existing rules

    # Your new definition type
    - all:
        - kind: your_ast_node_kind # e.g., "assignment", "call", etc.
        - has:
            field: relevant_field
            pattern: $YOUR_NEW_TYPE_NAME
        - has:
            field: other_field
            pattern: $YOUR_NEW_TYPE_EXTRA_INFO
```
4. Add DefinitionExtractor (definitions.rs)
```rust
vec![
    // ... existing extractors
    DefinitionExtractor {
        definition_type: DefinitionType::YourNewType,
        extractor: |env| {
            env.get("YOUR_NEW_TYPE_NAME").map(|node| {
                // Extract the name from the matched AST node
                node.text.clone()
            })
        },
        meta_vars: vec!["YOUR_NEW_TYPE_NAME", "YOUR_NEW_TYPE_EXTRA_INFO"],
    },
]
```
5. Add FQN Support (if needed)
Update find_fqn_for_definition() to handle your new type:
```rust
let meta_var_name = match def_type {
    // ... existing cases
    DefinitionType::YourNewType => meta_vars::YOUR_NEW_TYPE_NAME,
};
```
6. Write Tests
Add tests in definitions.rs or any other relevant file:
```rust
#[test]
fn test_your_new_type_definitions() {
    test_definition_extraction(
        r#"
        # Your Ruby code example
        your_syntax here
        "#,
        vec![("expected_name", DefinitionType::YourNewType, "expected::fqn")],
        "Your new definition type description",
    );
}
```
Variable Captures
ast-grep stores the node captured by each meta-variable in definitions.yaml (for example $CLASS_DEF_NAME) in the match's env field. Each definition type relies on the following capture variables:
| Definition Type | Primary Env Vars | Additional Env Vars |
|---|---|---|
| Class | CLASS_DEF_NAME | - |
| Module | MODULE_DEF_NAME | - |
| Method | METHOD_DEF_NAME | - |
| Singleton Method | SINGLETON_METHOD_DEF_NAME | - |
| Lambda | LAMBDA_DEF | LAMBDA_CONSTANT_NAME, LAMBDA_VARIABLE_NAME, LAMBDA_INSTANCE_VAR, LAMBDA_CLASS_VAR |
| Proc | PROC_DEF, PROC_ASSIGNMENT | PROC_CONSTANT_NAME, PROC_VARIABLE_NAME, PROC_INSTANCE_VAR, PROC_CLASS_VAR |
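The table can be read as a lookup from captured variables to candidate definition types. The following sketch transcribes the primary variables into a `match`. `primary_vars` and `types_for` are hypothetical helpers, not functions from definitions.rs. One assumption is labeled explicitly: a type counts as a candidate when *any* of its primary variables was captured, which matters for Proc because it lists two.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DefinitionType {
    Class,
    Module,
    Method,
    SingletonMethod,
    Lambda,
    Proc,
}

const ALL_TYPES: [DefinitionType; 6] = [
    DefinitionType::Class,
    DefinitionType::Module,
    DefinitionType::Method,
    DefinitionType::SingletonMethod,
    DefinitionType::Lambda,
    DefinitionType::Proc,
];

/// Primary capture variables per definition type, transcribed from the table.
fn primary_vars(ty: DefinitionType) -> &'static [&'static str] {
    match ty {
        DefinitionType::Class => &["CLASS_DEF_NAME"],
        DefinitionType::Module => &["MODULE_DEF_NAME"],
        DefinitionType::Method => &["METHOD_DEF_NAME"],
        DefinitionType::SingletonMethod => &["SINGLETON_METHOD_DEF_NAME"],
        DefinitionType::Lambda => &["LAMBDA_DEF"],
        DefinitionType::Proc => &["PROC_DEF", "PROC_ASSIGNMENT"],
    }
}

/// Definition types with at least one primary variable among the captures
/// (the "any" rule is an assumption of this sketch).
fn types_for(captured: &HashSet<&str>) -> Vec<DefinitionType> {
    ALL_TYPES
        .iter()
        .copied()
        .filter(|&ty| primary_vars(ty).iter().any(|var| captured.contains(*var)))
        .collect()
}

fn main() {
    let captured: HashSet<&str> = ["PROC_ASSIGNMENT", "PROC_VARIABLE_NAME"].iter().copied().collect();
    println!("{:?}", types_for(&captured));
}
```

Additional variables such as PROC_VARIABLE_NAME play no part in classification here. In the extractors they only decide which name is reported.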