[gkg] Graph Extractor Language - POC
Problem to Solve
We need a production-quality proof of concept that validates the end-to-end GEL pipeline on a real framework (Next.js) and exposes the resulting custom graph entities across our surfaces. The POC should demonstrate that:
- Authoring extractor rules in .gitlab/gel/*.toml generates stable CustomNode and CustomRelationship outputs without core code changes.
- Indexing the sample project produces Parquet artifacts that can be ingested automatically by the schema manager and queried with the existing API surfaces.
- Frontend and MCP experiences can render, search, and filter the new node type (API_ENDPOINT) with the expected metadata (HTTP method, route path, file location) and relationship (ENDPOINT_DEFINED_BY).
- Performance and regression risks (e.g., line number accuracy, UI fallbacks) are understood early.
Proposed Solution
Deliver a POC comprising the following workstreams (already in-flight in this MR series):
- Extractor engine
  - Implement crates/indexer/src/analysis/extractors with rule parsing (model.rs) and execution (runner.rs), covering glob/regex matching, template rendering, and relationship creation.
  - Normalize TypeScript definitions so exported route handlers resolve to Function definitions (ensures GEL rules match GET/POST symbols).
  - Extend GraphData/NodeIdGenerator/WriterService to capture and persist custom nodes & relationships to Parquet.
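To make the extractor engine workstream concrete, here is a minimal sketch of the match, render, and link flow. It is written in Python for brevity and is not the Rust implementation in the crate. The Rule fields, the route_from_path helper, the node ID format, and the app/*/route.ts glob are all assumptions made for illustration; the real rule schema in model.rs may differ.

```python
import fnmatch
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule shape; field names are illustrative only."""
    node_type: str      # custom node type to emit, e.g. "API_ENDPOINT"
    path_glob: str      # files the rule applies to
    symbol_regex: str   # definition names the rule accepts
    name_template: str  # rendered with {method} and {route}
    relationship: str   # edge type from the custom node to its definition


@dataclass
class Definition:
    file_path: str
    name: str
    start_line: int
    end_line: int


def route_from_path(file_path: str) -> str:
    """Assumed mapping: 'app/api/users/route.ts' -> '/api/users'."""
    segments = file_path.split("/")[1:-1]  # drop the root dir and the file name
    return "/" + "/".join(segments)


def run_rule(rule: Rule, definitions: list) -> tuple:
    """Return (custom_nodes, custom_relationships) for one rule."""
    symbol = re.compile(rule.symbol_regex)
    nodes, relationships = [], []
    for d in definitions:
        # Glob match on the path, then regex match on the symbol name.
        if not fnmatch.fnmatchcase(d.file_path, rule.path_glob):
            continue
        if symbol.fullmatch(d.name) is None:
            continue
        route = route_from_path(d.file_path)
        name = rule.name_template.format(method=d.name, route=route)
        node_id = f"{rule.node_type}:{name}"  # assumed ID scheme
        nodes.append({
            "id": node_id, "type": rule.node_type, "name": name,
            "method": d.name, "route": route, "file": d.file_path,
            "start_line": d.start_line, "end_line": d.end_line,
        })
        relationships.append({
            "type": rule.relationship,
            "source": node_id,
            "target": f"{d.file_path}#{d.name}",  # assumed definition key
        })
    return nodes, relationships


NEXTJS_ROUTES = Rule(
    node_type="API_ENDPOINT",
    path_glob="app/*/route.ts",
    symbol_regex=r"GET|POST|PUT|PATCH|DELETE",
    name_template="{method} {route}",
    relationship="ENDPOINT_DEFINED_BY",
)
```

Carrying start_line and end_line from the matched definition onto the custom node is what would let the line number accuracy risk be checked early.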
- Database ingestion
  - Update the schema manager to detect custom_nodes_*.parquet and custom_relationships_*.parquet, create sanitized Kùzu tables, and bulk import rows during project load.
  - Expose helper queries (get_custom_nodes_query, get_custom_neighbors_query, get_search_custom_nodes_query) that deliver enriched graph rows with consistent field ordering (including end_line).
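The detection and sanitization step in the database ingestion workstream could look roughly like the sketch below (Python, for illustration only). It assumes the wildcard part of the file name carries the custom type name. The CustomNode_ and CustomRel_ table prefixes and both function names are hypothetical. The actual table creation and bulk import are deliberately left out.

```python
import re

# Matches custom_nodes_<suffix>.parquet and custom_relationships_<suffix>.parquet.
FILE_PATTERN = re.compile(r"^custom_(nodes|relationships)_(.+)\.parquet$")


def sanitize_identifier(raw: str) -> str:
    """Keep only characters that are safe in a table name."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", raw)
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned  # identifiers should not start with a digit
    return cleaned


def plan_custom_tables(filenames: list) -> list:
    """Return (table_name, file_name) pairs for every custom Parquet file."""
    plan = []
    for name in sorted(filenames):  # sorted for a deterministic load order
        match = FILE_PATTERN.match(name)
        if match is None:
            continue  # not a custom artifact, leave it to the core loader
        kind, suffix = match.groups()
        prefix = "CustomNode_" if kind == "nodes" else "CustomRel_"
        plan.append((prefix + sanitize_identifier(suffix), name))
    return plan
```

Sanitizing before any table is created keeps user-authored type names from reaching the database as raw identifiers.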
- Backend APIs & tests
  - Stitch custom rows into the graph initial/neighbors/search endpoints with duplicate guards and CUSTOM relationship type mapping.
  - Add e2e tests using the bundled Next.js fixture to assert that indexing returns custom nodes, neighbors expose ENDPOINT_DEFINED_BY, and search surfaces the nodes.
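As an illustration of the stitching described in this workstream, the sketch below (Python, not the backend's actual code) merges custom rows into an existing response, skips duplicates, and maps every custom relationship to the CUSTOM type. The row shapes and the label field that preserves the original relationship name are assumptions.

```python
def stitch_custom_rows(core_nodes, core_rels, custom_nodes, custom_rels):
    """Merge custom rows into core graph rows without introducing duplicates."""
    nodes = list(core_nodes)
    seen_ids = {n["id"] for n in nodes}
    for node in custom_nodes:
        if node["id"] in seen_ids:
            continue  # duplicate guard: the node is already in the response
        seen_ids.add(node["id"])
        nodes.append(node)

    rels = list(core_rels)
    # An edge is identified by its endpoints plus its original relationship name.
    seen_edges = {(r["source"], r["target"], r.get("label", r["type"])) for r in rels}
    for rel in custom_rels:
        key = (rel["source"], rel["target"], rel["type"])
        if key in seen_edges:
            continue
        seen_edges.add(key)
        rels.append({
            "source": rel["source"],
            "target": rel["target"],
            "type": "CUSTOM",      # single relationship type for all custom edges
            "label": rel["type"],  # assumed field, e.g. "ENDPOINT_DEFINED_BY"
        })
    return nodes, rels
```

Keeping the original name next to the CUSTOM type is one way for the neighbors endpoint to still expose ENDPOINT_DEFINED_BY to clients.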
- Product surfaces
  - Refresh Explorer legend, tooltips, node cards, and search results to display custom node metadata, colored badges, and byte/line ranges.
  - Introduce a sidebar index of API endpoints with filtering to mirror the transcript’s demo flow.
  - Add a list_api_endpoints MCP tool to query custom nodes, hydrate code snippets, and filter by method/route/import usage.
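The filtering and snippet hydration behind a tool like list_api_endpoints might be sketched as follows (Python, illustrative only). The parameter names, the endpoint row shape, the in-memory sources map, and the assumption that line ranges are 1-based and inclusive are not taken from the MR series. Filtering by import usage is omitted.

```python
def snippet(source_text: str, start_line: int, end_line: int) -> str:
    """Return lines start_line..end_line, assuming 1-based inclusive ranges."""
    lines = source_text.splitlines()
    return "\n".join(lines[start_line - 1:end_line])


def list_api_endpoints(endpoints, sources, method=None, route_prefix=None):
    """Filter endpoint rows and attach the code each one points at."""
    results = []
    for ep in endpoints:
        if method is not None and ep["method"] != method.upper():
            continue
        if route_prefix is not None and not ep["route"].startswith(route_prefix):
            continue
        hydrated = dict(ep)  # copy so the stored rows are never mutated
        hydrated["snippet"] = snippet(
            sources[ep["file"]], ep["start_line"], ep["end_line"]
        )
        results.append(hydrated)
    return results
```

An off-by-one error in start_line or end_line shows up immediately as a truncated or shifted snippet, so this path doubles as a quick check on line number accuracy.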
- Sample configuration & docs placeholder
  - Provide .gitlab/gel/nextjs-routes.toml and fixture projects illustrating authoring patterns.
  - Capture learnings for eventual public documentation (gap analysis, open questions, telemetry needs).
Edited by Michael Angelo Rivera