# Declarative DSL engine for tree-sitter parsing
## Problem to Solve
Each language's parser is an imperative tree-sitter walker written from scratch. The 7 current parsers total ~30,800 lines. Despite structural similarity (walk the AST, extract definitions at scope boundaries, extract references at call sites, extract imports), each one reimplements the same walk with different variable names:
```
Python parser: 6,715 lines (references.rs alone: 2,076 lines)
Ruby parser: 5,560 lines (definitions.rs: 1,111 lines)
TypeScript parser: 5,594 lines (4 SWC sub-modules)
Kotlin parser: 3,922 lines (ast.rs: 3,100 lines)
Rust parser: 4,161 lines (fqn.rs: 1,592 lines)
Java parser: 3,143 lines (ast.rs: 2,484 lines)
C# parser: 1,705 lines
```
The actual language-specific logic in each parser is small — which tree-sitter node kinds are scopes, which are references, how to extract import paths. The rest is tree-walking boilerplate.
## Proposed Solution
A declarative DSL where each language is a table of rules. The DSL engine walks the tree-sitter AST once and applies matching rules to produce a `CanonicalResult` directly.
```rust
impl DslLanguage for PythonDsl {
    fn scopes() -> Vec<ScopeRule> {
        vec![
            scope("class_definition", "Class")
                .def_kind(DefKind::Class)
                .metadata(metadata().super_types(ExtractList::Fn(python_super_types))),
            scope_fn("function_definition", classify_fn)
                .def_kind(DefKind::Function)
                .metadata(metadata().decorators(ExtractList::Fn(python_decorators))),
        ]
    }

    fn refs() -> Vec<ReferenceRule> {
        vec![
            reference("call")
                .when(field_kind("function", &["attribute"]))
                .name_from(Extract::FieldChain(&["function", "attribute"])),
            reference("call").name_from(field("function")),
        ]
    }

    fn imports() -> Vec<ImportRule> { /* ... */ }
    fn bindings() -> Vec<ParseBindingRule> { /* ... */ }
}
```
The Python spec is ~100 lines. The entire DSL engine is ~1,200 lines. Together they replace a 6,715-line hand-written parser.
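To make the "single walk, table of rules" idea concrete, here is a minimal, hypothetical sketch of the engine's core loop. All types (`Node`, `ScopeRule`, `DefKind`, `walk`) are simplified stand-ins, not the real DSL crate or tree-sitter API: the real engine would match on tree-sitter node kinds and apply predicates, extractors, and metadata builders at each match.

```rust
// Toy stand-ins for the DSL engine's types (assumed, not the real API).
enum DefKind { Class, Function }

struct Node {
    kind: &'static str,
    name: String,
    children: Vec<Node>,
}

struct ScopeRule {
    node_kind: &'static str,
    def_kind: DefKind,
}

// The single recursive pass: each node is checked against the rule table;
// a matching rule emits a definition, then the walk descends into children.
fn walk(node: &Node, rules: &[ScopeRule], out: &mut Vec<(String, &'static str)>) {
    if let Some(rule) = rules.iter().find(|r| r.node_kind == node.kind) {
        let label = match rule.def_kind {
            DefKind::Class => "Class",
            DefKind::Function => "Function",
        };
        out.push((node.name.clone(), label));
    }
    for child in &node.children {
        walk(child, rules, out);
    }
}
```

The point of the sketch: the walk itself is written once in the engine, and a language "parser" shrinks to the rule table passed in.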
**DSL primitives:**
| Concept | Builder | Purpose |
|---|---|---|
| `ScopeRule` | `.when()`, `.name_from()`, `.def_kind()`, `.metadata()`, `.no_scope()` | Match tree-sitter node kinds to definitions |
| `ReferenceRule` | `.when()`, `.name_from()` | Match call sites / usages |
| `ImportRule` | `.classify()`, `.multi()`, `.alias_child()`, `.split_last()`, `.path_from()` | Handle import syntax variations |
| `ParseBindingRule` | `.name_from()`, `.value_from()`, `.no_value()` | Extract variable assignments for SSA |
| `Extract` | `Default`, `None`, `Field`, `ChildOfKind`, `FieldChain`, `Declarator` | Pull text from a single node |
| `ExtractList` | `ChildrenOfField`, `ChildrenOfKind`, `FieldSplit`, `Decorators`, `Fn` | Pull text from multiple nodes |
| `Pred` | `parent_is()`, `field_kind()`, `ancestor_is()`, `has_name()` | Conditional rule application |
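As a rough illustration of how the `Extract` variants in the table might resolve text, here is a hedged sketch with a simplified `Node` whose fields are just a name-to-child map (the real implementation would read tree-sitter named fields; `eval` and its shape are assumptions for illustration):

```rust
use std::collections::HashMap;

// Simplified node: real code would wrap tree_sitter::Node.
struct Node {
    text: String,
    fields: HashMap<&'static str, Node>,
}

// A subset of the Extract variants from the table above.
enum Extract {
    Field(&'static str),                 // text of one named field
    FieldChain(&'static [&'static str]), // follow a chain of named fields
    None,
}

// Resolve an Extract against a node, returning the extracted text if any.
fn eval(extract: &Extract, node: &Node) -> Option<String> {
    match extract {
        Extract::Field(f) => node.fields.get(f).map(|n| n.text.clone()),
        Extract::FieldChain(chain) => {
            let mut cur = node;
            for f in chain.iter() {
                cur = cur.fields.get(f)?;
            }
            Some(cur.text.clone())
        }
        Extract::None => None,
    }
}
```

This mirrors the `Extract::FieldChain(&["function", "attribute"])` rule in the Python spec: for a call like `obj.method()`, the chain descends from the `call` node through its `function` field to the attribute name.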
The DSL handles the hard parts of import parsing that differ across languages: Python's `from X import a, b, c` (`.multi()`), Java's `import com.example.Foo` (`.split_last()`), Python's `import X as Y` (`.alias_child()`), and wildcard vs. explicit classification (`.classify()`).
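The three strategies above can be sketched as plain string transforms. These helper names and signatures are hypothetical (the real rules operate on tree-sitter nodes, not strings), but they show the intended semantics of each combinator:

```rust
// `.split_last()`: "import com.example.Foo" -> path "com.example", name "Foo".
fn split_last(path: &str) -> (String, String) {
    match path.rfind('.') {
        Some(i) => (path[..i].to_string(), path[i + 1..].to_string()),
        None => (String::new(), path.to_string()),
    }
}

// `.multi()`: the name list of "from x import a, b, c" yields one import per name.
fn multi(names: &str) -> Vec<String> {
    names.split(',').map(|n| n.trim().to_string()).collect()
}

// `.alias_child()`: "import X as Y" binds local name Y to imported name X.
fn alias_child(clause: &str) -> (String, Option<String>) {
    match clause.split_once(" as ") {
        Some((name, alias)) => (name.trim().to_string(), Some(alias.trim().to_string())),
        None => (clause.trim().to_string(), None),
    }
}
```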
## Acceptance Criteria
- [ ] `LanguageSpec` struct with `ScopeRule`, `ReferenceRule`, `ImportRule`, `ParseBindingRule`
- [ ] `Extract` and `ExtractList` enums for flexible node text extraction
- [ ] `Pred` predicate system for conditional rules
- [ ] `DslParser<L: DslLanguage>` implementing `CanonicalParser`
- [ ] `LanguageSpec.package()` for namespace/package scope pushing
- [ ] `LanguageSpec.custom_import()` for complex import handling (e.g., Python `__future__`)
- [ ] DSL specs for Python (~100 lines), Java (~110 lines), Kotlin (~130 lines), C# (~80 lines)
- [ ] Parser output matches V1 for supported constructs (verified by graph validator suites)