Draft: docs(indexing): update code indexing design document with as-built architecture

Summary

Updates docs/design-documents/indexing/code_indexing.md with accurate implementation details for the code indexing pipeline, replacing stale references to WriterService, Parquet files, and the external gitlab-code-parser dependency.

Changes

Core components — updated to reference actual crate names (code-parser, code-graph, indexer, gitaly-client) with cross-reference to sdlc_indexing.md for shared infrastructure.

Transform section — replaced stale overview with:

  • Parser architecture: 3 backends (ruby_prism, SWC, tree-sitter GenericParser)
  • Language support matrix (7 languages, 12 extensions)
  • Extraction types (definitions, imported symbols, references)
  • Streaming pipeline (DirectoryFileSourceGraphData)
  • Concurrency model (Rayon semaphore, IO buffer_unordered, mpsc channel)
  • Graph data model (4 node types)
  • Relationship catalog (49 fine-grained types → 4 ontology labels)
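For reviewers unfamiliar with the streaming and concurrency bullets above, here is a minimal sketch of the general pattern: workers parse files in parallel and stream results to a single consumer through a bounded channel, which applies backpressure. It is not the code in the indexer crate. It uses only the standard library (plain threads and `std::sync::mpsc::sync_channel`) as a stand-in for the Rayon and async IO pieces, and `ParsedFile`, `parse`, and `run_pipeline` are hypothetical names invented for illustration.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical stand-in for the data extracted from one file.
#[derive(Debug, PartialEq)]
pub struct ParsedFile {
    pub path: String,
    pub definition_count: usize,
}

/// Placeholder "parser": counts lines that start with `fn `.
/// A real backend would build a syntax tree instead.
fn parse(path: &str, source: &str) -> ParsedFile {
    let definition_count = source
        .lines()
        .filter(|line| line.trim_start().starts_with("fn "))
        .count();
    ParsedFile { path: path.to_string(), definition_count }
}

/// Parses `(path, source)` pairs on worker threads and streams the
/// results to the caller through a bounded channel of size `capacity`.
pub fn run_pipeline(
    files: Vec<(String, String)>,
    workers: usize,
    capacity: usize,
) -> Vec<ParsedFile> {
    let (tx, rx) = sync_channel::<ParsedFile>(capacity);
    let workers = workers.max(1);
    let chunk_size = ((files.len() + workers - 1) / workers).max(1);

    let mut handles = Vec::new();
    for chunk in files.chunks(chunk_size) {
        let chunk = chunk.to_vec();
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for (path, source) in chunk {
                // `send` blocks while the channel is full, so slow
                // consumers throttle the parsers (backpressure).
                if tx.send(parse(&path, &source)).is_err() {
                    break; // receiver dropped; stop early
                }
            }
        }));
    }
    // Drop the original sender so the receive loop ends once every
    // worker has finished and dropped its clone.
    drop(tx);

    let mut results: Vec<ParsedFile> = rx.iter().collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    // Arrival order is nondeterministic; sort for stable output.
    results.sort_by(|a, b| a.path.cmp(&b.path));
    results
}
```

Calling `run_pipeline(files, 2, 1)` with a channel capacity of 1 keeps at most one parsed result buffered at a time, which is the property a streaming pipeline relies on to bound memory.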

Load section — replaced WriterService/Parquet description with:

  • ETL engine and module system (Handler, Module, Engine, WorkerPool)
  • Pluggable storage (Destination/BatchWriter traits)
  • Full 15-step push event flow
  • Arrow conversion with base columns and 5 ClickHouse table schemas
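To illustrate what "pluggable storage" can look like, the sketch below defines two traits named after the `Destination` and `BatchWriter` traits mentioned above and an in-memory implementation. Only the trait names come from this merge request. The method names, signatures, the `Row` type, `MemoryDestination`, and `load` are assumptions made for illustration, and the sketch uses plain string rows rather than Arrow batches.

```rust
use std::collections::HashMap;

/// Hypothetical row type; the documented pipeline converts to Arrow instead.
pub type Row = Vec<String>;

/// Assumed shape: a writer accepts batches and reports a total on finish.
pub trait BatchWriter {
    fn write_batch(&mut self, rows: &[Row]) -> Result<(), String>;
    fn finish(self: Box<Self>) -> Result<usize, String>;
}

/// Assumed shape: a destination hands out one writer per target table.
pub trait Destination {
    fn new_batch_writer<'a>(&'a mut self, table: &str) -> Box<dyn BatchWriter + 'a>;
}

/// In-memory destination, useful for tests; not part of the real codebase.
#[derive(Default)]
pub struct MemoryDestination {
    pub tables: HashMap<String, Vec<Row>>,
}

struct MemoryWriter<'a> {
    rows: &'a mut Vec<Row>,
    written: usize,
}

impl BatchWriter for MemoryWriter<'_> {
    fn write_batch(&mut self, rows: &[Row]) -> Result<(), String> {
        self.rows.extend_from_slice(rows);
        self.written += rows.len();
        Ok(())
    }

    fn finish(self: Box<Self>) -> Result<usize, String> {
        Ok(self.written)
    }
}

impl Destination for MemoryDestination {
    fn new_batch_writer<'a>(&'a mut self, table: &str) -> Box<dyn BatchWriter + 'a> {
        let rows = self.tables.entry(table.to_string()).or_default();
        Box::new(MemoryWriter { rows, written: 0 })
    }
}

/// Loader code depends only on the traits, so the backend can be swapped
/// without touching this function.
pub fn load(
    dest: &mut dyn Destination,
    table: &str,
    batches: &[Vec<Row>],
) -> Result<usize, String> {
    let mut writer = dest.new_batch_writer(table);
    for batch in batches {
        writer.write_batch(batch)?;
    }
    writer.finish()
}
```

Because `load` takes `&mut dyn Destination`, a test can pass `MemoryDestination::default()` while production code passes a database-backed implementation of the same trait.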

Flow diagram — replaced outdated mermaid diagram with accurate text-based pipeline showing the 9 actual steps.

Added — "Differences from the original local tool" comparison table.

Closes #95

Edited by Adam Mulvany
