Draft: docs(indexing): update code indexing design document with as-built architecture
Summary
Updates docs/design-documents/indexing/code_indexing.md with accurate implementation details for the code indexing pipeline, replacing stale references to WriterService, Parquet files, and the external gitlab-code-parser dependency.
Changes
Core components — updated to reference actual crate names (code-parser, code-graph, indexer, gitaly-client) with cross-reference to sdlc_indexing.md for shared infrastructure.
Transform section — replaced stale overview with:
- Parser architecture: 3 backends (ruby_prism, SWC, tree-sitter GenericParser)
- Language support matrix (7 languages, 12 extensions)
- Extraction types (definitions, imported symbols, references)
- Streaming pipeline (DirectoryFileSource → GraphData)
- Concurrency model (Rayon semaphore, IO buffer_unordered, mpsc channel)
- Graph data model (4 node types)
- Relationship catalog (49 fine-grained types → 4 ontology labels)
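
For reviewers unfamiliar with the channel-based shape of the streaming pipeline, here is a minimal sketch of the idea: parse workers send per-file results over an mpsc channel and a single consumer folds them into one accumulator. It uses only the standard library, so plain threads stand in for the Rayon and buffer_unordered stages named above. FileResult, GraphSummary, parse, and index are hypothetical names for illustration, not the crate's actual types.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the output of parsing one file.
struct FileResult {
    path: String,
    definitions: usize,
}

/// Hypothetical accumulator; the real GraphData holds nodes and relationships.
#[derive(Debug, Default)]
struct GraphSummary {
    paths: Vec<String>,
    definitions: usize,
}

/// Toy "parser": counts lines that start with `fn ` as definitions.
fn parse(path: &str, source: &str) -> FileResult {
    let definitions = source
        .lines()
        .filter(|line| line.trim_start().starts_with("fn "))
        .count();
    FileResult { path: path.to_string(), definitions }
}

/// Parses files on worker threads and folds the results on the caller's thread.
fn index(files: Vec<(String, String)>) -> GraphSummary {
    let (tx, rx) = mpsc::channel::<FileResult>();

    // One thread per file keeps the sketch short; a real pipeline would bound this.
    let workers: Vec<_> = files
        .into_iter()
        .map(|(path, source)| {
            let tx = tx.clone();
            thread::spawn(move || {
                tx.send(parse(&path, &source)).expect("receiver dropped");
            })
        })
        .collect();

    // Drop the original sender so the receive loop ends once all workers finish.
    drop(tx);

    let mut summary = GraphSummary::default();
    for result in rx {
        summary.paths.push(result.path);
        summary.definitions += result.definitions;
    }
    for worker in workers {
        worker.join().expect("worker panicked");
    }

    // Arrival order is nondeterministic, so sort for a stable result.
    summary.paths.sort();
    summary
}
```

The single consumer means the accumulator needs no locking, which is the usual reason to put a channel between parallel producers and a graph builder.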
Load section — replaced WriterService/Parquet description with:
- ETL engine and module system (Handler, Module, Engine, WorkerPool)
- Pluggable storage (Destination/BatchWriter traits)
- Full 15-step push event flow
- Arrow conversion with base columns and 5 ClickHouse table schemas
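
To illustrate what "pluggable storage" can look like, the sketch below defines a Destination that hands out BatchWriter instances, plus an in-memory implementation that could serve in tests. Only the two trait names come from this change; the method names (new_batch_writer, write_batch), the Row type, the String error type, and the "definitions" table name in the comment are assumptions and may not match the indexer crate.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical row type; the real pipeline writes Arrow record batches.
type Row = Vec<String>;

/// Assumed shape: a writer bound to one table.
trait BatchWriter {
    fn write_batch(&mut self, rows: &[Row]) -> Result<(), String>;
}

/// Assumed shape: a storage backend that creates writers per table.
trait Destination {
    fn new_batch_writer(&self, table: &str) -> Result<Box<dyn BatchWriter>, String>;
}

/// In-memory backend, useful as a stand-in for ClickHouse in tests.
#[derive(Default, Clone)]
struct MemoryDestination {
    tables: Rc<RefCell<HashMap<String, Vec<Row>>>>,
}

impl MemoryDestination {
    /// Number of rows stored for `table` (0 if the table was never written).
    fn row_count(&self, table: &str) -> usize {
        self.tables.borrow().get(table).map_or(0, Vec::len)
    }
}

struct MemoryWriter {
    table: String,
    tables: Rc<RefCell<HashMap<String, Vec<Row>>>>,
}

impl BatchWriter for MemoryWriter {
    fn write_batch(&mut self, rows: &[Row]) -> Result<(), String> {
        self.tables
            .borrow_mut()
            .entry(self.table.clone())
            .or_default()
            .extend_from_slice(rows);
        Ok(())
    }
}

impl Destination for MemoryDestination {
    fn new_batch_writer(&self, table: &str) -> Result<Box<dyn BatchWriter>, String> {
        Ok(Box::new(MemoryWriter {
            table: table.to_string(),
            tables: Rc::clone(&self.tables),
        }))
    }
}

/// Load step written against the traits only, e.g. load(&dest, "definitions", &rows).
fn load(dest: &dyn Destination, table: &str, rows: &[Row]) -> Result<(), String> {
    let mut writer = dest.new_batch_writer(table)?;
    writer.write_batch(rows)
}
```

Because load depends only on the traits, swapping the backend does not touch the calling code.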
Flow diagram — replaced outdated mermaid diagram with accurate text-based pipeline showing the 9 actual steps.
Added — "Differences from the original local tool" comparison table.
Closes #95
Edited by Adam Mulvany