feat(ruby): implement ruby definitions

What does this MR do?

This MR builds directly on feat(parser): setup initial library (!14 - merged) to implement definition extraction for Ruby. Please read that merge request in full before reviewing this one.

Most importantly, this merge request sets up a general code path for other parsers to follow.

Related issues


Ruby Parser Documentation (Copy)

This document provides a high-level overview of the Ruby parser implementation, including the parsing flow, data structures, and how definitions are extracted with Fully Qualified Names (FQNs).

High-Level Overview

The Ruby parser extracts structured information about Ruby code definitions (classes, modules, methods, etc.) and computes their Fully Qualified Names (FQNs) with rich metadata. It combines AST parsing, rule-based pattern matching, and semantic analysis.

Coverage

See COVERAGE.md for what is covered and what is not.

Files and Organization

src/ruby/
├── README.md                          # This documentation
├── mod.rs                             # Module exports
├── ruby_ast.rs                        # YAML rules loading and configuration
├── fqn.rs                             # FQN computation and metadata
├── definitions.rs                     # Definition extraction and classification
├── analyzer.rs                        # Main API and result aggregation
├── COVERAGE.md                        # Feature coverage documentation
├── rules/
│   └── definitions.yaml               # YAML rule definitions
└── fixtures/                          # Test fixtures
    ├── comprehensive_definitions.rb   # Complete definition type coverage
    ├── sample.rb                      # Basic authentication service example
    ├── monolith_sample_1.rb           # Real-world Rails controller
    ├── references_test_rails.rb       # Rails patterns
    └── references_test_tracing.rb     # Reference tracing examples

Architecture Components

graph TB
    A[Ruby Source Code] --> B[Tree-sitter Parser]
    B --> C[AST]
    C --> D[YAML Rules Engine]
    C --> E[FQN Map Builder]
    D --> F[Rule Matches]
    E --> G[Node Index Map]
    E --> H[Ruby FQN Map]
    F --> I[Definition Extractor]
    G --> I
    H --> I
    I --> J[Ruby Analysis Result]
    
    subgraph "Core Components"
        K[ruby_ast.rs - Rules & Config]
        L[fqn.rs - FQN Computation]
        M[definitions.rs - Definition Extraction]
        N[analyzer.rs - Main API]
    end
    
    K -.-> D
    L -.-> E
    M -.-> I
    N -.-> J

How It Works (High Level)

The Ruby parser operates in two tracks that eventually merge:

Track 1: Structural Analysis (FQN Building)

  1. AST Traversal - Walk through every node in the parsed Ruby code
  2. Scope Tracking - Maintain a stack of current scope (modules, classes, methods)
  3. FQN Generation - For each definition node, build its fully qualified name from the scope stack
  4. Metadata Collection - Capture AST node types, byte ranges, and scope creation info
  5. Index Building - Create maps linking byte positions to FQN data

Track 2: Pattern Matching (Rule Processing)

  1. YAML Rule Application - Run predefined patterns against the AST
  2. Variable Capture - Extract the text and position of each matched node
  3. Match Classification - Identify which definition type each match represents

Convergence: Definition Extraction

  1. Match Processing - For each rule match, determine the definition type
  2. FQN Lookup - Use the match's byte position to find the corresponding FQN from Track 1
  3. Definition Assembly - Combine rule match data with FQN metadata
  4. Result Aggregation - Organize all definitions into a structured analysis result

The FQN system provides semantic context (where am I in the codebase?) while the rule system provides syntactic identification (what kind of definition is this?). Together they create a complete picture of Ruby code structure.
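
The convergence step can be pictured as a join keyed on byte ranges. The sketch below is a simplified illustration under stated assumptions, not the code in this MR: `RuleMatch`, `Definition`, and `assemble` are hypothetical names, FQNs are flattened to plain strings, and a match with no indexed FQN simply carries `None`.

```rust
use std::collections::HashMap;

/// Byte range of a name node, as (start, end).
pub type ByteRange = (usize, usize);

/// Hypothetical stand-in for a Track 2 rule match.
#[derive(Debug, Clone, PartialEq)]
pub struct RuleMatch {
    pub kind: &'static str,    // what the rule says this is, e.g. "Class"
    pub name: String,          // captured name text
    pub name_range: ByteRange, // where the captured name sits in the source
}

/// Hypothetical assembled definition: rule data plus the FQN, if one was indexed.
#[derive(Debug, Clone, PartialEq)]
pub struct Definition {
    pub kind: &'static str,
    pub name: String,
    pub fqn: Option<String>,
}

/// Joins rule matches with the Track 1 index on the name node's byte range.
pub fn assemble(
    matches: &[RuleMatch],
    fqn_index: &HashMap<ByteRange, String>,
) -> Vec<Definition> {
    matches
        .iter()
        .map(|m| Definition {
            kind: m.kind,
            name: m.name.clone(),
            fqn: fqn_index.get(&m.name_range).cloned(),
        })
        .collect()
}
```

Keying both tracks on the same byte range is what lets them run independently: neither track needs to know about the other's node representation.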

Component Responsibilities

| Component | Responsibility | Key Types |
| --- | --- | --- |
| ruby_ast.rs | YAML rule loading, rule-to-kind mapping | RubyMatchKind, RULES_CONFIG |
| fqn.rs | FQN computation, AST traversal, node indexing | RubyFqn, RubyFqnMetadata, RubyNodeFqnMap |
| definitions.rs | Definition extraction, type classification | DefinitionInfo, DefinitionType, DefinitionExtractor |
| analyzer.rs | Main API, result aggregation | RubyAnalyzer, RubyAnalysisResult |

Data Flow

The Ruby parser follows a multi-stage pipeline:

sequenceDiagram
    participant Client
    participant Analyzer
    participant Parser
    participant Rules
    participant FQN
    participant Definitions
    
    Client->>Analyzer: analyze(matches, parse_result)
    Analyzer->>FQN: build_fqn_and_node_indices(ast)
    
    Note over FQN: AST Traversal Phase
    FQN->>FQN: compute_fqns_and_index_recursive()
    FQN-->>Analyzer: (ruby_node_fqn_map, node_index_map)
    
    Analyzer->>Definitions: find_definitions(matches, ruby_node_fqn_map)
    
    Note over Definitions: Definition Extraction Phase
    Definitions->>Definitions: extract_definition_info()
    Definitions->>FQN: find_ruby_fqn_for_node()
    FQN-->>Definitions: RubyFqn with metadata
    Definitions-->>Analyzer: Vec<DefinitionInfo>
    
    Analyzer-->>Client: RubyAnalysisResult

FQN Map and Node Index Building

See fqn.md for what an FQN is and for more details.

Data Structures

RubyFqnMetadata

Contains metadata about each FQN part:

pub struct RubyFqnMetadata {
    pub ast_node_kind: String,      // "class", "module", "method", etc.
    pub byte_range: ByteRange,       // Position in source code
}

RubyFqnPart

A single component of an FQN with metadata:

pub type RubyFqnPart = FQNPart<String, RubyFqnMetadata>;

// Example: Class part for "User" class
RubyFqnPart {
    node_type: "Class".to_string(),
    node_name: "User".to_string(),
    metadata: Some(RubyFqnMetadata {
        ast_node_kind: "class".to_string(),
        byte_range: (45, 49),
    })
}

RubyFqn

Complete FQN with all parts:

pub type RubyFqn = Fqn<RubyFqnPart>;

// Example: "Authentication::User::initialize"
RubyFqn {
    parts: Arc<Vec<RubyFqnPart>>[
        RubyFqnPart { node_type: "Module", node_name: "Authentication", metadata: ... },
        RubyFqnPart { node_type: "Class", node_name: "User", metadata: ... },
        RubyFqnPart { node_type: "Method", node_name: "initialize", metadata: ... },
    ]
}
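
To make the relationship between the parts list and the rendered name concrete, here is a minimal sketch using simplified stand-in types. `Part`, `SimpleFqn`, `part`, and `to_path` are hypothetical names chosen for this example (the real `Fqn` and `FQNPart` types come from the parser library and also carry metadata); the `::` separator follows the "Authentication::User::initialize" example above.

```rust
use std::sync::Arc;

/// Simplified stand-in for RubyFqnPart, without metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Part {
    pub node_type: String,
    pub node_name: String,
}

/// Simplified stand-in for RubyFqn. The Arc makes clones cheap, because
/// every clone shares the same underlying parts vector.
#[derive(Debug, Clone)]
pub struct SimpleFqn {
    pub parts: Arc<Vec<Part>>,
}

/// Convenience constructor for a part.
pub fn part(node_type: &str, node_name: &str) -> Part {
    Part {
        node_type: node_type.to_string(),
        node_name: node_name.to_string(),
    }
}

impl SimpleFqn {
    pub fn new(parts: Vec<Part>) -> Self {
        SimpleFqn { parts: Arc::new(parts) }
    }

    /// Joins the part names with "::", outermost scope first.
    pub fn to_path(&self) -> String {
        self.parts
            .iter()
            .map(|p| p.node_name.as_str())
            .collect::<Vec<_>>()
            .join("::")
    }
}
```

Calling `to_path()` on the three parts Module "Authentication", Class "User", and Method "initialize" yields the string shown in the example comment above.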

FQN Building Algorithm

The build_fqn_and_node_indices() function traverses the AST and builds two data structures, the RubyNodeFqnMap (ruby_node_fqn_map) and the node index map (node_index_map):

graph TD
    A[Start: Root AST Node] --> B[Initialize Empty Scope Stack]
    B --> C[Traverse Node Recursively]
    
    C --> D{Is Definition Node?}
    D -->|Yes| E[Extract Name & Create Metadata]
    D -->|No| F[Index Node by Byte Range]
    
    E --> G{Creates New Scope?}
    G -->|Yes| H[Push to Scope Stack]
    G -->|No| I[Build FQN from Current Scope]
    
    H --> I
    I --> J[Store in RubyNodeFqnMap]
    J --> K[Process Children]
    
    F --> K
    K --> L{More Children?}
    L -->|Yes| C
    L -->|No| M{Was New Scope?}
    
    M -->|Yes| N[Pop from Scope Stack]
    M -->|No| O[Done with Node]
    N --> O
    
    O --> P{More Nodes?}
    P -->|Yes| C
    P -->|No| Q[Return Maps]
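
The push, recurse, pop shape of the flowchart can be sketched on a toy tree. This is an illustration only, with assumptions labeled: `Node`, `FqnMap`, `build_fqn_map`, and `visit` are hypothetical names, the tree is hand-built rather than produced by Tree-sitter, every definition kind is assumed to open a scope, and the node index map is left out.

```rust
use std::collections::HashMap;

/// Toy AST node; a stand-in for a Tree-sitter node.
pub struct Node {
    pub kind: &'static str,
    pub name: Option<&'static str>,
    pub name_range: (usize, usize),
    pub children: Vec<Node>,
}

impl Node {
    /// Builds a named definition node.
    pub fn def(
        kind: &'static str,
        name: &'static str,
        name_range: (usize, usize),
        children: Vec<Node>,
    ) -> Node {
        Node { kind, name: Some(name), name_range, children }
    }
}

/// Maps the byte range of a name node to the scope names leading to it.
pub type FqnMap = HashMap<(usize, usize), Vec<String>>;

fn is_definition(kind: &str) -> bool {
    matches!(kind, "module" | "class" | "method" | "singleton_method")
}

pub fn build_fqn_map(root: &Node) -> FqnMap {
    let mut map = FqnMap::new();
    let mut scope: Vec<String> = Vec::new();
    visit(root, &mut scope, &mut map);
    map
}

fn visit(node: &Node, scope: &mut Vec<String>, map: &mut FqnMap) {
    // Entering a definition: push its name, then snapshot the stack as its FQN.
    let opened_scope = match node.name {
        Some(name) if is_definition(node.kind) => {
            scope.push(name.to_string());
            map.insert(node.name_range, scope.clone());
            true
        }
        _ => false,
    };

    for child in &node.children {
        visit(child, scope, map);
    }

    // Leaving the definition: restore the enclosing scope.
    if opened_scope {
        scope.pop();
    }
}
```

Because the pop happens after the children are visited, siblings such as two methods in the same class never see each other's names on the stack.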

Example: FQN Building Process

Given this Ruby code:

module Authentication
  class User
    def initialize(name)
      @name = name
    end
    
    def self.find_by_email(email)
      # implementation
    end
  end
end

The FQN building process works as follows:

Step-by-Step Traversal

graph LR
    subgraph "Scope Stack Evolution"
        A["[]<br/>Empty"] --> B["[Authentication]<br/>Enter Module"]
        B --> C["[Authentication, User]<br/>Enter Class"] 
        C --> D["[Authentication, User, initialize]<br/>Enter Method"]
        D --> E["[Authentication, User]<br/>Exit Method"]
        E --> F["[Authentication, User, find_by_email]<br/>Enter Singleton Method"]
        F --> G["[Authentication, User]<br/>Exit Singleton Method"]
        G --> H["[Authentication]<br/>Exit Class"]
        H --> I["[]<br/>Exit Module"]
    end

Resulting FQN Map Entries

| Byte Range | Definition | FQN Parts | Metadata |
| --- | --- | --- | --- |
| (7, 21) | Authentication | [Module: Authentication] | {ast_node_kind: "module"} |
| (28, 32) | User | [Module: Authentication, Class: User] | {ast_node_kind: "class"} |
| (43, 53) | initialize | [Module: Authentication, Class: User, Method: initialize] | {ast_node_kind: "method"} |
| (89, 102) | find_by_email | [Module: Authentication, Class: User, SingletonMethod: find_by_email] | {ast_node_kind: "singleton_method"} |

Data Structure Output

// RubyNodeFqnMap entries
ruby_node_fqn_map = HashMap {
    (7, 21) => (
        name_node,  // "Authentication" node
        Arc<Vec<RubyFqnPart>>[
            RubyFqnPart {
                node_type: "Module",
                node_name: "Authentication",
                metadata: RubyFqnMetadata {
                    ast_node_kind: "module",
                    byte_range: (7, 21),
                }
            }
        ]
    ),
    (28, 32) => (
        name_node,  // "User" node  
        Arc<Vec<RubyFqnPart>>[
            RubyFqnPart { /* Authentication module */ },
            RubyFqnPart {
                node_type: "Class", 
                node_name: "User",
                metadata: RubyFqnMetadata {
                    ast_node_kind: "class",
                    byte_range: (28, 32),
                }
            }
        ]
    ),
    // ... more entries
}

Definition Extraction

Definition extraction happens after FQN building and uses YAML rule matches to identify and classify definitions.

YAML Rules System

The definitions.yaml file defines patterns for each definition type:

rule:
  any:
    # Class definition
    - kind: class
      has:
        field: name
        kind: constant
        pattern: $CLASS_DEF_NAME
    
    # Lambda assignment
    - all:
        - kind: assignment
        - has:
            field: left
            any:
              - kind: constant
                pattern: $LAMBDA_CONSTANT_NAME
        - has:
            field: right
            kind: call
            has:
              field: method
              pattern: "lambda"
        - pattern: $LAMBDA_DEF

DefinitionExtractor Configuration

Each definition type has a dedicated extractor:

DefinitionExtractor {
    definition_type: DefinitionType::Class,
    extractor: |env| env.get("CLASS_DEF_NAME").map(|node| node.text.clone()),
    meta_vars: vec!["CLASS_DEF_NAME"],
}

DefinitionExtractor {
    definition_type: DefinitionType::Lambda,
    extractor: |env| {
        if env.get("LAMBDA_DEF").is_some() {
            // Extract assignment target name
            if let Some(constant_node) = env.get("LAMBDA_CONSTANT_NAME") {
                Some(constant_node.text.clone())
            } else if let Some(var_node) = env.get("LAMBDA_VARIABLE_NAME") {
                Some(var_node.text.clone())
            } else {
                Some("lambda".to_string())
            }
        } else {
            None
        }
    },
    meta_vars: vec!["LAMBDA_DEF", "LAMBDA_CONSTANT_NAME", "LAMBDA_VARIABLE_NAME", ...],
}
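
The extractor pattern above can be exercised in isolation. The following sketch makes several simplifying assumptions that are not taken from this MR: the capture environment is modeled as a plain `HashMap` from capture name to node text (the hypothetical `Env` alias), only two definition types are included, and the hypothetical `classify` helper takes the first extractor that yields a name.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DefinitionType {
    Class,
    Lambda,
}

/// Hypothetical stand-in for the capture environment: capture name -> node text.
pub type Env = HashMap<&'static str, String>;

pub struct DefinitionExtractor {
    pub definition_type: DefinitionType,
    /// Returns the definition name if this extractor applies to the match.
    pub extractor: fn(&Env) -> Option<String>,
}

pub fn extractors() -> Vec<DefinitionExtractor> {
    vec![
        DefinitionExtractor {
            definition_type: DefinitionType::Class,
            extractor: |env| env.get("CLASS_DEF_NAME").cloned(),
        },
        DefinitionExtractor {
            definition_type: DefinitionType::Lambda,
            extractor: |env| {
                // Not a lambda match at all: bail out early.
                env.get("LAMBDA_DEF")?;
                // Prefer the constant name, then the variable name,
                // then fall back to the literal "lambda".
                env.get("LAMBDA_CONSTANT_NAME")
                    .or_else(|| env.get("LAMBDA_VARIABLE_NAME"))
                    .cloned()
                    .or_else(|| Some("lambda".to_string()))
            },
        },
    ]
}

/// Runs the extractors in order and returns the first type that yields a name.
pub fn classify(env: &Env) -> Option<(DefinitionType, String)> {
    extractors()
        .iter()
        .find_map(|e| (e.extractor)(env).map(|name| (e.definition_type, name)))
}
```

Storing the extractor as a plain function pointer keeps the table declarative: each entry is data plus one small function, and non-capturing closures coerce to `fn` pointers without boxing.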

How to Add a New Definition Type

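To add support for a new Ruby definition type, work through the steps below. Steps 1 to 4 are always required; step 5 applies only if the new type needs FQN support, and step 6 covers tests.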

1. Add to DefinitionType Enum (definitions.rs)

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum DefinitionType {
    // ... existing types
    YourNewType,  // Add your new type here
}

2. Define Capture Variables (definitions.rs)

pub mod meta_vars {
    // ... existing vars
    pub const YOUR_NEW_TYPE_NAME: &str = "YOUR_NEW_TYPE_NAME";
    pub const YOUR_NEW_TYPE_EXTRA_INFO: &str = "YOUR_NEW_TYPE_EXTRA_INFO";
}

3. Add YAML Rule Pattern (rules/definitions.yaml)

rule:
  any:
    # ... existing rules
    
    # Your new definition type
    - all:
        - kind: your_ast_node_kind  # e.g., "assignment", "call", etc.
        - has:
            field: relevant_field
            pattern: $YOUR_NEW_TYPE_NAME
        - has:
            field: other_field
            pattern: $YOUR_NEW_TYPE_EXTRA_INFO

4. Add DefinitionExtractor (definitions.rs)

vec![
    // ... existing extractors
    
    DefinitionExtractor {
        definition_type: DefinitionType::YourNewType,
        extractor: |env| {
            env.get("YOUR_NEW_TYPE_NAME").map(|node| {
                // Extract the name from the matched AST node
                node.text.clone()
            })
        },
        meta_vars: vec!["YOUR_NEW_TYPE_NAME", "YOUR_NEW_TYPE_EXTRA_INFO"],
    },
]

5. Add FQN Support (if needed)

Update find_fqn_for_definition() to handle your new type:

let meta_var_name = match def_type {
    // ... existing cases
    DefinitionType::YourNewType => meta_vars::YOUR_NEW_TYPE_NAME,
};
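
One reason a `match` suits this step: written without a wildcard arm, it stops compiling as soon as a variant is added to the enum, which points directly at the place that needs the new case. A minimal sketch, assuming a cut-down enum with only the four simple definition types and using the capture names from the Variable Captures table; `primary_meta_var` is a hypothetical helper, not a function from this MR.

```rust
/// Cut-down version of the enum, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum DefinitionType {
    Class,
    Module,
    Method,
    SingletonMethod,
}

/// Returns the capture variable that holds the definition's name node.
/// There is deliberately no `_ =>` arm: a new variant must be handled here
/// before the crate compiles again.
pub fn primary_meta_var(def_type: DefinitionType) -> &'static str {
    match def_type {
        DefinitionType::Class => "CLASS_DEF_NAME",
        DefinitionType::Module => "MODULE_DEF_NAME",
        DefinitionType::Method => "METHOD_DEF_NAME",
        DefinitionType::SingletonMethod => "SINGLETON_METHOD_DEF_NAME",
    }
}
```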

6. Write Tests

Add tests in definitions.rs or any other relevant file:

#[test]
fn test_your_new_type_definitions() {
    test_definition_extraction(
        r#"
        # Your Ruby code example
        your_syntax here
        "#,
        vec![("expected_name", DefinitionType::YourNewType, "expected::fqn")],
        "Your new definition type description",
    );
}

Variable Captures

ast-grep stores the node captured by each pattern variable configured in definitions.yaml in the env field of the match. Each definition type reads the following capture variables:

| Definition Type | Primary Env Vars | Additional Env Vars |
| --- | --- | --- |
| Class | CLASS_DEF_NAME | - |
| Module | MODULE_DEF_NAME | - |
| Method | METHOD_DEF_NAME | - |
| Singleton Method | SINGLETON_METHOD_DEF_NAME | - |
| Lambda | LAMBDA_DEF | LAMBDA_CONSTANT_NAME, LAMBDA_VARIABLE_NAME, LAMBDA_INSTANCE_VAR, LAMBDA_CLASS_VAR |
| Proc | PROC_DEF, PROC_ASSIGNMENT | PROC_CONSTANT_NAME, PROC_VARIABLE_NAME, PROC_INSTANCE_VAR, PROC_CLASS_VAR |
Edited by Michael Angelo Rivera