# Code Indexer v2: Flexible pipeline for code graph construction
## Problem to Solve
The current code graph pipeline has a per-language architecture that does not scale. Each of the 7 supported languages requires:
- Its own **parser** (~2,000–5,000 lines of imperative tree-sitter walking)
- Its own **type enums** (~20 variants per language, wrapped 3 times through the pipeline)
- Its own **resolver** (~300–2,200 lines of bespoke reference resolution)
- Its own **linker integration** (~15 files touched to add a new language)
The result is ~30,000 lines of parser code, ~10,000 lines of resolver code, and 145 definition/import type variants across 14 enums — with significant structural duplication between languages. The three-layer type wrapping (parser enums → linker wrapper enums → processor dispatch enums) exists only to carry information that collapses to a `&str` at the Arrow serialization boundary.
```
PythonDefinitionType (11 variants) ──┐
JavaDefinitionType (14 variants)     │  Parser layer: 95 variants across 7 enums
RubyDefinitionType (6 variants)      │
KotlinDefinitionType (16 variants)   │
...                                ──┘
        ↓ wrapped into
DefinitionType::Python(PythonDefinitionType::Class)   ← Linker layer
        ↓ dispatched via
Definitions::Python(Vec<...>)                         ← Processor layer
        ↓ serialized as
"class"                                               ← Arrow output
```
## Proposed Solution
Replace the per-language architecture with a generic, language-agnostic pipeline. Four components:
### 1. Pipeline framework with canonical types
One set of canonical types flows through the entire pipeline. `DefKind` (10 variants) + `&'static str` replaces 95 enum variants. A `LanguagePipeline` trait allows both generic (DSL parser + SSA resolver) and custom (full control) strategies behind one interface. One line in a macro registers a language.
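To make the shape of this component concrete, here is a minimal sketch of how canonical types, a pipeline trait, and macro-based registration could fit together. Only `DefKind`, `LanguagePipeline`, and `register_languages!` are names from this proposal. The variant names, the trait methods, the `Definition` struct, `pipeline_for`, and the keyword-scanning toy parser are assumptions made for illustration, not the actual crate API.

```rust
/// Canonical definition kinds shared by all languages. The proposal states
/// there are 10 variants; these particular names are illustrative guesses.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DefKind {
    Module,
    Class,
    Interface,
    Enum,
    Function,
    Method,
    Constructor,
    Property,
    Variable,
    Other,
}

/// One definition as it flows through the pipeline: a canonical kind plus the
/// language's own label, which is the string that would be serialized.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Definition {
    pub name: String,
    pub kind: DefKind,
    pub label: &'static str,
}

/// Hypothetical trait surface; generic and custom strategies both implement it.
pub trait LanguagePipeline {
    fn name(&self) -> &'static str;
    fn extensions(&self) -> &'static [&'static str];
    fn definitions(&self, source: &str) -> Vec<Definition>;
}

/// Generates a lookup function; each language is one entry in the list.
macro_rules! register_languages {
    ($($lang:ident),* $(,)?) => {
        pub fn pipeline_for(ext: &str) -> Option<Box<dyn LanguagePipeline>> {
            $(
                if $lang.extensions().iter().any(|e| *e == ext) {
                    return Some(Box::new($lang));
                }
            )*
            None
        }
    };
}

/// Toy stand-in for a real parser: matches leading keywords line by line.
fn scan_keywords(source: &str, rules: &[(&str, DefKind, &'static str)]) -> Vec<Definition> {
    let mut defs = Vec::new();
    for line in source.lines() {
        let line = line.trim_start();
        for &(prefix, kind, label) in rules {
            if let Some(rest) = line.strip_prefix(prefix) {
                let name = rest
                    .chars()
                    .take_while(|c| c.is_alphanumeric() || *c == '_')
                    .collect();
                defs.push(Definition { name, kind, label });
            }
        }
    }
    defs
}

pub struct Python;
impl LanguagePipeline for Python {
    fn name(&self) -> &'static str {
        "python"
    }
    fn extensions(&self) -> &'static [&'static str] {
        &["py"]
    }
    fn definitions(&self, source: &str) -> Vec<Definition> {
        scan_keywords(source, &[("class ", DefKind::Class, "class"), ("def ", DefKind::Function, "function")])
    }
}

pub struct Ruby;
impl LanguagePipeline for Ruby {
    fn name(&self) -> &'static str {
        "ruby"
    }
    fn extensions(&self) -> &'static [&'static str] {
        &["rb"]
    }
    fn definitions(&self, source: &str) -> Vec<Definition> {
        scan_keywords(source, &[("module ", DefKind::Module, "module"), ("def ", DefKind::Method, "method")])
    }
}

// Adding a language means adding one identifier here.
register_languages!(Python, Ruby);
```

In this sketch, `pipeline_for("py")` returns the Python pipeline and its `definitions` method yields `Definition` values whose `kind` is canonical and whose `label` is already the output string, so no per-language enum needs to be wrapped or unwrapped along the way.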
### 2. Declarative DSL engine
A declarative DSL where each language is ~80–130 lines of rule tables instead of ~2,000–5,000 lines of imperative walkers. The DSL engine handles the tree-sitter walk; language specs just declare which node kinds are scopes, references, imports, and bindings.
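The split between a generic engine and per-language rule tables can be sketched as follows. This is an illustration under stated assumptions, not the actual DSL: `LanguageSpec`, `Node`, `Fact`, `extract`, and `walk` are hypothetical names, `Node` is a toy stand-in for a tree-sitter node so the example stays self-contained, and the node kind strings are modeled on the tree-sitter Python grammar without being checked against the real spec.

```rust
/// Hypothetical rule table: a language is described as data, not code.
pub struct LanguageSpec {
    pub name: &'static str,
    pub scopes: &'static [&'static str],
    pub references: &'static [&'static str],
    pub imports: &'static [&'static str],
    pub bindings: &'static [&'static str],
}

/// Node kinds here are assumptions modeled on the tree-sitter Python grammar.
pub const PYTHON: LanguageSpec = LanguageSpec {
    name: "python",
    scopes: &["class_definition", "function_definition"],
    references: &["call"],
    imports: &["import_statement"],
    bindings: &["assignment"],
};

/// Toy syntax node standing in for a tree-sitter node.
pub struct Node {
    pub kind: &'static str,
    pub text: &'static str,
    pub children: Vec<Node>,
}

impl Node {
    pub fn new(kind: &'static str, text: &'static str, children: Vec<Node>) -> Self {
        Node { kind, text, children }
    }
}

/// What the engine emits while walking the tree.
#[derive(Debug, PartialEq, Eq)]
pub enum Fact {
    Scope { path: String },
    Reference { scope: String, name: String },
    Import { name: String },
    Binding { scope: String, name: String },
}

/// Runs the generic walk for any language spec.
pub fn extract(spec: &LanguageSpec, root: &Node) -> Vec<Fact> {
    let mut out = Vec::new();
    walk(spec, root, &mut Vec::new(), &mut out);
    out
}

/// The only tree-walking code: it consults the rule table at every node and
/// keeps a stack of enclosing scope names.
fn walk(spec: &LanguageSpec, node: &Node, scope: &mut Vec<&'static str>, out: &mut Vec<Fact>) {
    let enclosing = scope.join(".");
    let name = node.text.to_string();
    let is_scope = spec.scopes.contains(&node.kind);
    if is_scope {
        scope.push(node.text);
        out.push(Fact::Scope { path: scope.join(".") });
    } else if spec.references.contains(&node.kind) {
        out.push(Fact::Reference { scope: enclosing, name });
    } else if spec.imports.contains(&node.kind) {
        out.push(Fact::Import { name });
    } else if spec.bindings.contains(&node.kind) {
        out.push(Fact::Binding { scope: enclosing, name });
    }
    for child in &node.children {
        walk(spec, child, scope, out);
    }
    if is_scope {
        scope.pop();
    }
}
```

Supporting another language in this sketch would mean adding another `LanguageSpec` constant. The `walk` function stays untouched, which is the property the rule-table approach is meant to provide.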
### 3. SSA-based generic resolver
One generic resolver for all languages, based on the Braun et al. SSA construction algorithm. Per-language differences (import strategies, receiver conventions, chain modes) are declarative rule tables.
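For readers unfamiliar with the algorithm, the following is a compact sketch of its three core operations (`write_variable`, `read_variable`, `seal_block`) with lazy phi insertion and trivial phi elimination. It is a simplified reading of the paper, not the resolver's actual code: the `Ssa`, `Value`, and `Phi` types, the `add_block`, `add_pred`, and `resolve` helpers, and the use of integer ids for definitions are all assumptions. Phi users are found by a linear scan rather than the use lists the paper maintains, which keeps the example short at the cost of efficiency.

```rust
use std::collections::HashMap;

/// What a name resolves to. `Def` carries a hypothetical binding-site id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Value { Def(u32), Phi(usize), Undef }

struct Phi { block: usize, operands: Vec<Value>, replaced_by: Option<Value> }

#[derive(Default)]
pub struct Ssa {
    preds: Vec<Vec<usize>>,
    sealed: Vec<bool>,
    current_def: HashMap<(String, usize), Value>,
    incomplete: HashMap<usize, Vec<(String, usize)>>,
    phis: Vec<Phi>,
}

impl Ssa {
    pub fn add_block(&mut self, preds: &[usize]) -> usize {
        self.preds.push(preds.to_vec());
        self.sealed.push(false);
        self.preds.len() - 1
    }
    pub fn add_pred(&mut self, block: usize, pred: usize) {
        self.preds[block].push(pred);
    }
    pub fn write_variable(&mut self, var: &str, block: usize, value: Value) {
        self.current_def.insert((var.to_string(), block), value);
    }
    pub fn read_variable(&mut self, var: &str, block: usize) -> Value {
        if let Some(&v) = self.current_def.get(&(var.to_string(), block)) {
            return self.resolve(v);
        }
        let value = if !self.sealed[block] {
            // Predecessors may still be added: park an operandless phi.
            let phi = self.new_phi(block);
            self.incomplete.entry(block).or_default().push((var.to_string(), phi));
            Value::Phi(phi)
        } else if self.preds[block].len() == 1 {
            let pred = self.preds[block][0];
            self.read_variable(var, pred)
        } else {
            let phi = self.new_phi(block);
            self.write_variable(var, block, Value::Phi(phi)); // breaks cycles
            self.add_phi_operands(var, phi)
        };
        self.write_variable(var, block, value);
        value
    }
    /// Declares that no further predecessors will be added to `block`.
    pub fn seal_block(&mut self, block: usize) {
        for (var, phi) in self.incomplete.remove(&block).unwrap_or_default() {
            self.add_phi_operands(&var, phi);
        }
        self.sealed[block] = true;
    }
    /// Follows replacements left behind by trivial phi elimination.
    pub fn resolve(&self, mut v: Value) -> Value {
        while let Value::Phi(i) = v {
            let Some(next) = self.phis[i].replaced_by else { break };
            v = next;
        }
        v
    }
    fn new_phi(&mut self, block: usize) -> usize {
        self.phis.push(Phi { block, operands: Vec::new(), replaced_by: None });
        self.phis.len() - 1
    }
    fn add_phi_operands(&mut self, var: &str, phi: usize) -> Value {
        for pred in self.preds[self.phis[phi].block].clone() {
            let operand = self.read_variable(var, pred);
            self.phis[phi].operands.push(operand);
        }
        self.try_remove_trivial_phi(phi)
    }
    fn try_remove_trivial_phi(&mut self, phi: usize) -> Value {
        let this = Value::Phi(phi);
        let mut distinct: Vec<Value> = self.phis[phi].operands.iter()
            .map(|&o| self.resolve(o))
            .filter(|&o| o != this) // ignore self-references
            .collect();
        distinct.dedup(); // all-equal operands collapse to one entry
        if distinct.len() > 1 {
            return this; // merges distinct values: keep the phi
        }
        let same = distinct.first().copied().unwrap_or(Value::Undef);
        // Phis that use this one may become trivial once it is replaced.
        let users: Vec<usize> = (0..self.phis.len())
            .filter(|&u| u != phi && self.phis[u].replaced_by.is_none())
            .filter(|&u| self.phis[u].operands.iter().any(|&o| self.resolve(o) == this))
            .collect();
        self.phis[phi].replaced_by = Some(same);
        for user in users {
            self.try_remove_trivial_phi(user);
        }
        self.resolve(same)
    }
}
```

Two behaviours are worth noting. A name read at a join point whose branches bind it differently stays a `Value::Phi`, meaning the reference has more than one candidate definition. A name read inside a loop before the loop header is sealed first yields a placeholder phi; once the back edge is added and the header is sealed, `resolve` collapses that placeholder to the single definition that actually reaches it.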
### 4. YAML/Cypher test framework
End-to-end correctness validation. Self-contained YAML test suites with inline source fixtures, queried with Cypher against the resulting graph. Catches regressions that unit tests miss.
---
## Background and prior art
This work builds on several earlier explorations:
- knowledge-graph!764 — *spike: declarative DSL engine for near-instant language support*. Proved that C and C++ could be added with ~30 lines of config. Python definition extraction replicated to show parity.
- knowledge-graph!766 — *feat(parser): declarative DSL engine with C/C++ language support*. Clean iteration of the DSL, with C++ composing from C by inheriting rules.
- knowledge-graph!767 — *feat(linker): global backtracking for language-agnostic reference resolution*. First pass at generic resolution — global name matching with local-first preference and ambiguity tracking.
- knowledge-graph!885 — *spike: canonical IR types and unified Language config*. Introduced `code-graph-types` crate, `DefKind` enum, `ToCanonical` trait, and the `register_languages!` macro.
These were themselves inspired by:
- [gitlab-code-parser#38](https://gitlab.com/gitlab-org/rust/gitlab-code-parser/-/work_items/38) — original proposal for declarative language support
- [knowledge-graph#3](https://gitlab.com/gitlab-org/rust/knowledge-graph/-/work_items/3) — codescope prototype with global backtracking resolver
The SSA-based resolver is an application of:
- Braun, M., Buchwald, S., Hack, S., Leißa, R., Mallon, C., Zwinkau, A. (2013). [Simple and Efficient Construction of Static Single Assignment Form](https://dl.acm.org/doi/10.1007/978-3-642-37051-9_6). *Compiler Construction (CC 2013)*, LNCS vol 7791. — On-the-fly SSA construction without pre-computed CFG or dominance frontiers. Three operations (`write_variable`, `read_variable`, `seal_block`) with lazy phi insertion and trivial phi elimination.