feat(cli): persist graph index to DuckDB

What does this MR do and why?

Part of #324 (closed). Adds support for indexing code graphs locally and querying them from the terminal, using the same JSON DSL as the server.

  orbit index ./repo
  ─────────────────

  Tree-sitter          Ontology-driven         DuckDB
  (7 languages)        converter               (~/.orbit/graph.duckdb)

  repo/                AsRecordBatch           ┌────────────────┐
   ├── src/       ──>  per entity type    ──>  │ gl_file        │
   ├── lib/            + Appender API          │ gl_definition  │
   └── ...             (bulk insert)           │ gl_directory   │
                                               │ gl_imported_.. │
                                               │ gl_edge        │
                                               └───────┬────────┘
  orbit query '<json>'                                 │
  ────────────────────                                 │
                                                       v
  Compile ──> Execute ──> Hydrate ──> Resolve content
  (DuckDB     (parameterized          (read file bytes
   dialect)    $1,$2,...)              from disk)

Multiple repos share the same DuckDB file. Each repo's rows are scoped by a deterministic project_id derived from its canonical path, so indexing the same repo again always yields the same ID. Re-indexing a repo first deletes its existing data.
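A minimal, std-only Rust sketch of path-derived scoping follows. This MR does not say which hash it uses, so FNV-1a (64-bit) stands in here because its output is stable across runs and Rust versions, unlike std's DefaultHasher. The functions project_id and delete_statements are hypothetical, as are the assumptions that every table has a project_id column and that the ID is stored as a non-negative signed integer. The truncated gl_imported_.. table is left out because its full name is not given above.

```rust
use std::io;
use std::path::Path;

/// FNV-1a 64-bit: a stand-in hash with a stable, well-known output.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical: canonicalize first so `./repo`, `repo/`, and an absolute
/// path to the same directory all produce the same project_id.
fn project_id(repo: &Path) -> io::Result<i64> {
    let canonical = repo.canonicalize()?;
    let h = fnv1a_64(canonical.to_string_lossy().as_bytes());
    // Clear the sign bit so the ID fits a non-negative signed column (assumption).
    Ok((h & 0x7fff_ffff_ffff_ffff) as i64)
}

/// Hypothetical re-index cleanup: one DELETE per table, with the project_id bound as $1.
fn delete_statements(tables: &[&str]) -> Vec<String> {
    tables
        .iter()
        .map(|t| format!("DELETE FROM {t} WHERE project_id = $1"))
        .collect()
}

fn main() -> io::Result<()> {
    let id = project_id(Path::new("."))?;
    println!("project_id = {id}");
    for sql in delete_statements(&["gl_file", "gl_definition", "gl_directory", "gl_edge"]) {
        println!("{sql}");
    }
    Ok(())
}
```

Running the deletes and the subsequent bulk insert inside a single transaction would keep a failed re-index from leaving the project half-deleted. That is a design suggestion, not something this MR states.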


Usage

# Index a repo
orbit index /path/to/repo

# Search with file content resolved from disk
orbit query '{"query_type":"search","node":{"id":"f","entity":"File","columns":["id","name","path","content"]},"limit":5}'

# Traversal with byte-range sliced definition content
orbit query '{"query_type":"traversal","nodes":[{"id":"f","entity":"File","columns":["id","path"]},{"id":"d","entity":"Definition","columns":["id","name","content"]}],"relationships":[{"type":"DEFINES","from":"f","to":"d"}],"limit":3}'

# Compile to SQL without executing
orbit compile --local '{"query_type":"search","node":{"id":"f","entity":"File","columns":"*"},"limit":10}'

# Raw JSON output
orbit query --raw '{"query_type":"search","node":{"id":"f","entity":"File","columns":["id","name"]},"limit":3}'
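For a concrete picture of what the compile step produces, here is a std-only Rust sketch that lowers an already-parsed search request to parameterized SQL. Everything in it is an assumption for illustration, not this MR's compiler: the SearchQuery struct, the entity-to-table mapping (inferred from the gl_* table names in the diagram), scoping by project_id, binding LIMIT as a parameter, and dropping content from the projection because it is resolved from disk after hydration.

```rust
/// Hypothetical, already-parsed form of a `"query_type":"search"` request.
struct SearchQuery<'a> {
    entity: &'a str,
    columns: Vec<&'a str>, // ["*"] selects all stored columns
    limit: u32,
}

/// Values bound to $1, $2, ... at execution time.
#[derive(Debug, PartialEq)]
enum Param {
    Int(i64),
}

/// Assumed entity -> table mapping, inferred from the gl_* table names.
fn table_for(entity: &str) -> Option<&'static str> {
    match entity {
        "File" => Some("gl_file"),
        "Definition" => Some("gl_definition"),
        "Directory" => Some("gl_directory"),
        _ => None,
    }
}

fn compile_search(q: &SearchQuery, project_id: i64) -> Result<(String, Vec<Param>), String> {
    let table = table_for(q.entity).ok_or_else(|| format!("unknown entity {}", q.entity))?;
    // `content` is not a stored column here; it is read from disk after execution.
    let cols: Vec<&str> = q.columns.iter().copied().filter(|c| *c != "content").collect();
    if cols.is_empty() {
        return Err("no stored columns requested".to_string());
    }
    // Identifiers cannot be bound as parameters, so only plain names are accepted.
    for c in &cols {
        if *c != "*" && !c.chars().all(|ch| ch.is_ascii_alphanumeric() || ch == '_') {
            return Err(format!("invalid column {c}"));
        }
    }
    let sql = format!(
        "SELECT {} FROM {table} WHERE project_id = $1 LIMIT $2",
        cols.join(", ")
    );
    Ok((sql, vec![Param::Int(project_id), Param::Int(i64::from(q.limit))]))
}

fn main() {
    let q = SearchQuery { entity: "File", columns: vec!["id", "name", "path", "content"], limit: 5 };
    match compile_search(&q, 42) {
        Ok((sql, params)) => println!("{sql}\n{params:?}"),
        Err(e) => eprintln!("compile error: {e}"),
    }
}
```

Keeping user-supplied values in the parameter list and validating identifiers separately is what makes the `$1,$2,...` form safe to build from arbitrary JSON input.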
Edited by Michael Usachenko
