
chore(indexing): remove remaining parquet writer hardcoding + fqn cleanup

What does this MR do and why?

The primary goal of this MR was to replace the primitive string representation of Fully Qualified Names (FQNs) with a structured, type-safe enum, and to make the writer portion of the relationship schema more declarative.

The central architectural change is the introduction of the FqnType enum and its integration into the core DefinitionNode struct. FQNs are now carried through the system as structured values and rendered to strings only where a consumer needs one.

Previous Model: Premature Stringification

  • Data Structure: In crates/indexer/src/analysis/types.rs, the DefinitionNode struct held the FQN as a String: pub fqn: String.
  • Process: Each language analyzer (e.g., for Ruby, Python, Java) was responsible for converting its language-specific FQN data structure into a generic String early in the analysis process. For example, ruby_fqn_to_string(&definition.fqn) was called in crates/indexer/src/analysis/languages/ruby/analyzer.rs.
  • Drawback: This approach discarded valuable structural information from the language-specific FQN, making it difficult to perform more advanced analysis later in the pipeline. All downstream consumers only had access to a flat string.

New Model: Preserving Structured FQN Data

  • Data Structure: The DefinitionNode.fqn field was changed from String to the new FqnType enum: pub fqn: FqnType. This new enum, defined in crates/indexer/src/analysis/types.rs, is a wrapper for language-specific FQN types:
    pub enum FqnType {
        Ruby(RubyFqn),
        Python(PythonFqn),
        // ... other languages
    }
  • Process: Language analyzers now construct an FqnType variant instead of a String. The conversion to a string representation is deferred and centralized in a std::fmt::Display implementation for FqnType, which dispatches to the appropriate language-specific string conversion function on demand. The structured FQN is therefore preserved throughout the analysis and mutation pipeline, and string conversion becomes an explicit, final step rather than an immediate, lossy transformation.
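
The deferred conversion described above can be sketched roughly as follows. This is a minimal illustration, not the code from this MR: the RubyFqn and PythonFqn structs, their parts field, the python_fqn_to_string helper, and the "::" and "." separators are all assumptions made so the example is self-contained.

```rust
use std::fmt;

// Hypothetical stand-ins for the language-specific FQN types; the real
// RubyFqn and PythonFqn may be shaped differently.
#[derive(Debug, Clone, PartialEq)]
pub struct RubyFqn {
    pub parts: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct PythonFqn {
    pub parts: Vec<String>,
}

// Language-specific string conversions. The separators are assumptions
// chosen for illustration.
fn ruby_fqn_to_string(fqn: &RubyFqn) -> String {
    fqn.parts.join("::")
}

fn python_fqn_to_string(fqn: &PythonFqn) -> String {
    fqn.parts.join(".")
}

// Wrapper enum: one variant per language, each holding the structured FQN.
#[derive(Debug, Clone, PartialEq)]
pub enum FqnType {
    Ruby(RubyFqn),
    Python(PythonFqn),
}

// Centralized, on-demand string conversion: each variant dispatches to the
// conversion function of its language.
impl fmt::Display for FqnType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FqnType::Ruby(fqn) => f.write_str(&ruby_fqn_to_string(fqn)),
            FqnType::Python(fqn) => f.write_str(&python_fqn_to_string(fqn)),
        }
    }
}
```

Because Display is implemented, .to_string() is available on every FqnType value, while code that needs the structure can still match on the variant and inspect the inner type.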

The shift to FqnType and a related refactoring of the relationship schema had several consequences across the codebase, simplifying logic and enforcing greater type consistency.

Changes - Language Analyzers

Every language analyzer was modified to align with the new data model. Instead of calling a language-specific fqn_to_string function, they now wrap the native FQN structure in the FqnType enum.

  • Affected Files:
    • crates/indexer/src/analysis/languages/csharp.rs
    • crates/indexer/src/analysis/languages/java/analyzer.rs
    • crates/indexer/src/analysis/languages/kotlin/analyzer.rs
    • crates/indexer/src/analysis/languages/python/analyzer.rs
    • crates/indexer/src/analysis/languages/ruby/analyzer.rs
    • crates/indexer/src/analysis/languages/rust.rs
    • crates/indexer/src/analysis/languages/typescript.rs
  • Example Change (from ruby/analyzer.rs):
    • Before: let fqn_string = ruby_fqn_to_string(&definition.fqn);
    • After: let fqn = FqnType::Ruby(definition.fqn.clone());

Changes - Explicit String Conversion in Downstream Consumers

Components that relied on a string representation of an FQN (e.g., for HashMap keys, logging, or comparisons) were updated to explicitly call .to_string() on the FqnType field. This makes the conversion visible and intentional.

  • Affected Components: ExpressionResolver implementations, mutation/changes.rs, and test suites.
  • Example Change (from tests.rs):
    • Before: .find(|def| def.fqn == "BaseModel")
    • After: .find(|def| def.fqn.to_string() == "BaseModel")
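
As a rough sketch of what explicit conversion looks like in a consumer, the example below builds a HashMap keyed by the rendered FQN and performs a name lookup similar to the updated test. The single-variant FqnType, the trimmed DefinitionNode, and the index_by_fqn and find_by_name helpers are hypothetical simplifications, not code from this MR.

```rust
use std::collections::HashMap;
use std::fmt;

// Simplified, hypothetical stand-in for the real FqnType: a single variant
// holding name segments, so the example stays self-contained.
#[derive(Debug, Clone)]
pub enum FqnType {
    Python(Vec<String>),
}

impl fmt::Display for FqnType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // The "." separator is an assumption for illustration.
            FqnType::Python(parts) => write!(f, "{}", parts.join(".")),
        }
    }
}

// Trimmed-down DefinitionNode carrying only the field relevant here.
#[derive(Debug, Clone)]
pub struct DefinitionNode {
    pub fqn: FqnType,
}

// Build a lookup table keyed by the rendered FQN string. The conversion
// happens once per definition, at the point where a string key is needed.
pub fn index_by_fqn(defs: &[DefinitionNode]) -> HashMap<String, &DefinitionNode> {
    defs.iter().map(|def| (def.fqn.to_string(), def)).collect()
}

// Find a definition by its rendered name, mirroring the updated test code.
pub fn find_by_name<'a>(defs: &'a [DefinitionNode], name: &str) -> Option<&'a DefinitionNode> {
    defs.iter().find(|def| def.fqn.to_string() == name)
}
```

Note that each .to_string() call allocates a new String, so a consumer that performs many lookups may prefer to build an index once, as index_by_fqn does, rather than rendering inside a loop.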

Changes - Refactoring of the Relationship Schema

In parallel with the FQN changes, the definition of relationships between nodes was made more declarative and centralized.

  • RelationshipKind Enum Movement: The RelationshipKind enum, which defines the type of a graph edge (e.g., FileToDefinition), was moved from crates/indexer/src/analysis/types.rs to the schema definition crate at crates/database/src/schema/types.rs. This co-locates the relationship type with other core schema definitions.

  • Schema Declaration: The RelationshipTable struct in crates/database/src/schema/types.rs was modified. Its from_to_pairs field was changed from (&'static NodeTable, &'static NodeTable) to (&'static NodeTable, &'static NodeTable, Option<&'static RelationshipKind>). This embeds the relationship kind directly into the schema definition in crates/database/src/schema/init.rs.

  • Logic Simplification in WriterService: The get_relationships_for_pair function in crates/indexer/src/analysis/types.rs was deleted. This function contained a large, hardcoded match statement to determine which relationships to process for a given pair of node tables. The WriterService (crates/indexer/src/writer.rs) now iterates directly over the from_to_pairs in the schema, using the provided RelationshipKind to filter and write the correct set of relationships. This makes the logic data-driven and removes the need for procedural mapping.
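
The data-driven approach can be sketched as below. This is an illustration under stated assumptions rather than the actual WriterService code: the NodeTable and Relationship fields, the DefinitionToDefinition variant, the static table names, the relationships_per_pair helper, and the treatment of a None kind as "accept every kind" are all hypothetical.

```rust
// Hypothetical, trimmed-down versions of the schema types described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RelationshipKind {
    FileToDefinition,
    DefinitionToDefinition, // assumed variant, for illustration only
}

#[derive(Debug)]
pub struct NodeTable {
    pub name: &'static str,
}

pub struct RelationshipTable {
    pub name: &'static str,
    // Each entry declares the source table, the target table, and
    // optionally the relationship kind that connects them.
    pub from_to_pairs: &'static [(
        &'static NodeTable,
        &'static NodeTable,
        Option<&'static RelationshipKind>,
    )],
}

pub static FILE_TABLE: NodeTable = NodeTable { name: "File" };
pub static DEFINITION_TABLE: NodeTable = NodeTable { name: "Definition" };

// Declarative schema entry: the relationship kind sits next to the pair.
pub static FILE_RELATIONSHIPS: RelationshipTable = RelationshipTable {
    name: "FILE_RELATIONSHIPS",
    from_to_pairs: &[(
        &FILE_TABLE,
        &DEFINITION_TABLE,
        Some(&RelationshipKind::FileToDefinition),
    )],
};

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Relationship {
    pub kind: RelationshipKind,
    pub source_id: u32,
    pub target_id: u32,
}

// For each (from, to, kind) entry in the schema, select the relationships
// whose kind matches. A `None` kind is assumed here to accept every
// relationship; the MR does not state what `None` means.
pub fn relationships_per_pair<'a>(
    table: &RelationshipTable,
    all: &'a [Relationship],
) -> Vec<(&'static str, &'static str, Vec<&'a Relationship>)> {
    table
        .from_to_pairs
        .iter()
        .map(|(from, to, kind)| {
            let selected: Vec<&Relationship> = all
                .iter()
                .filter(|rel| kind.map_or(true, |k| rel.kind == *k))
                .collect();
            (from.name, to.name, selected)
        })
        .collect()
}
```

With this shape, supporting a new node pair means adding one entry to from_to_pairs; the selection loop itself does not change, which is the property the hardcoded match statement lacked.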

Related Issues

#175

Testing

All existing unit and integration tests pass.

Performance Analysis

  • This merge request does not introduce any performance regression.
Edited by Michael Usachenko
