[gkg] Graph Extractor Language (GEL) - POC

Problem to Solve

We need a production-quality proof of concept that validates the end-to-end GEL pipeline on a real framework (Next.js) and exposes the resulting custom graph entities across our product surfaces (Explorer UI, backend APIs, and MCP tools). The POC should demonstrate that:

  • Authoring extractor rules in .gitlab/gel/*.toml generates stable CustomNode and CustomRelationship outputs without core code changes.
  • Indexing the sample project produces Parquet artifacts that can be ingested automatically by the schema manager and queried with the existing API surfaces.
  • Frontend and MCP experiences can render, search, and filter the new API_ENDPOINT node type, including its expected metadata (HTTP method, route path, file location) and its ENDPOINT_DEFINED_BY relationship.
  • Performance and regression risks (e.g., line number accuracy, UI fallbacks) are understood early.
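To make the expected output concrete, here is a minimal Rust sketch of an API_ENDPOINT node and its ENDPOINT_DEFINED_BY relationship, plus a check for the metadata the frontend and MCP surfaces rely on. The struct layouts, the property keys (http_method, route_path, file_path), the edge direction (endpoint to definition), and the helper names are illustrative assumptions, not the indexer's actual CustomNode/CustomRelationship definitions.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory shape of a GEL custom node (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub struct CustomNode {
    pub id: String,
    pub label: String,                        // e.g. "API_ENDPOINT"
    pub properties: BTreeMap<String, String>, // metadata rendered by the rule
}

/// Hypothetical custom relationship between two graph nodes.
#[derive(Debug, Clone, PartialEq)]
pub struct CustomRelationship {
    pub source_id: String,
    pub target_id: String,
    pub rel_type: String, // e.g. "ENDPOINT_DEFINED_BY"
}

/// Assumed metadata keys every API_ENDPOINT node must carry.
const REQUIRED_ENDPOINT_KEYS: [&str; 3] = ["http_method", "route_path", "file_path"];

/// Returns the required keys missing from an API_ENDPOINT node; the result is
/// empty for complete nodes and for nodes with any other label.
pub fn missing_endpoint_metadata(node: &CustomNode) -> Vec<&'static str> {
    if node.label != "API_ENDPOINT" {
        return Vec::new();
    }
    REQUIRED_ENDPOINT_KEYS
        .iter()
        .copied()
        .filter(|key| !node.properties.contains_key(*key))
        .collect()
}

/// Builds the edge from an endpoint to the definition that implements it
/// (assumed direction: endpoint -> definition).
pub fn endpoint_defined_by(endpoint: &CustomNode, definition_id: &str) -> CustomRelationship {
    CustomRelationship {
        source_id: endpoint.id.clone(),
        target_id: definition_id.to_string(),
        rel_type: "ENDPOINT_DEFINED_BY".to_string(),
    }
}
```

A check like missing_endpoint_metadata is one cheap way to turn "expected metadata" into an assertion the e2e tests can share with the UI fallbacks.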

Proposed Solution

Deliver a POC comprising the following workstreams (already in-flight in this MR series):

  1. Extractor engine

    • Implement crates/indexer/src/analysis/extractors with rule parsing (model.rs) and execution (runner.rs), covering glob/regex matching, template rendering, and relationship creation.
    • Normalize TypeScript definitions so that exported route handlers resolve to Function definitions, which lets GEL rules match the GET/POST symbols.
    • Extend GraphData / NodeIdGenerator / WriterService to capture custom nodes and relationships and persist them to Parquet.
  2. Database ingestion

    • Update the schema manager to detect custom_nodes_*.parquet and custom_relationships_*.parquet, create sanitized Kùzu tables, and bulk import rows during project load.
    • Expose helper queries (get_custom_nodes_query, get_custom_neighbors_query, get_search_custom_nodes_query) that return enriched graph rows with a consistent field ordering (including end_line).
  3. Backend APIs & tests

    • Stitch custom rows into the responses of the graph initial, neighbors, and search endpoints, guarding against duplicate rows and mapping GEL relationships to the CUSTOM relationship type.
    • Add e2e tests using the bundled Next.js fixture to assert that indexing produces custom nodes, neighbor queries expose ENDPOINT_DEFINED_BY, and search surfaces the new nodes.
  4. Product surfaces

    • Refresh the Explorer legend, tooltips, node cards, and search results to display custom node metadata, colored badges, and byte/line ranges.
    • Introduce a sidebar index of API endpoints with filtering to mirror the transcript’s demo flow.
    • Add a list_api_endpoints MCP tool that queries custom nodes, hydrates code snippets, and filters by method, route, or import usage.
  5. Sample configuration & docs placeholder

    • Provide .gitlab/gel/nextjs-routes.toml and fixture projects illustrating authoring patterns.
    • Capture learnings for eventual public documentation (gap analysis, open questions, telemetry needs).
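For the extractor engine (workstream 1), the core loop is: match a file path against the rule's glob, match an exported symbol, then render templates into node fields. The Rust sketch below shows that loop for a Next.js App Router rule. The Rule struct, its field names, and the {{key}} template syntax are hypothetical stand-ins for whatever model.rs and runner.rs define; an exact-name symbol list replaces the regex matcher to stay within std, and the glob matcher supports only `*` and `**`.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified GEL rule (not the real model.rs schema).
pub struct Rule {
    pub file_glob: String,         // e.g. "app/**/route.ts"
    pub symbol_names: Vec<String>, // stand-in for a regex such as "^(GET|POST)$"
    pub name_template: String,     // e.g. "{{method}} {{route}}"
}

/// Minimal glob: `*` matches within one path segment, `**` matches zero or
/// more whole segments.
pub fn glob_match(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.split('/').collect();
    let segs: Vec<&str> = path.split('/').collect();
    match_segments(&pat, &segs)
}

fn match_segments(pat: &[&str], segs: &[&str]) -> bool {
    match pat.split_first() {
        None => segs.is_empty(),
        // `**` may swallow any number of leading segments.
        Some((&"**", rest)) => (0..=segs.len()).any(|i| match_segments(rest, &segs[i..])),
        Some((p, rest)) => match segs.split_first() {
            Some((s, seg_rest)) => match_segment(p, s) && match_segments(rest, seg_rest),
            None => false,
        },
    }
}

/// Matches one segment where `*` stands for any run of characters.
fn match_segment(pattern: &str, seg: &str) -> bool {
    match pattern.find('*') {
        None => pattern == seg,
        Some(i) => {
            let (prefix, rest) = (&pattern[..i], &pattern[i + 1..]);
            if !seg.starts_with(prefix) {
                return false;
            }
            let tail = &seg[prefix.len()..];
            // Try every split point for the `*`.
            (0..=tail.len())
                .filter(|&j| tail.is_char_boundary(j))
                .any(|j| match_segment(rest, &tail[j..]))
        }
    }
}

/// Derives the URL path from an App Router route file, e.g.
/// "app/api/users/[id]/route.ts" -> "/api/users/[id]". Route groups such as
/// "(admin)" are dropped because Next.js omits them from the URL.
pub fn nextjs_route_path(file_path: &str) -> Option<String> {
    let segs: Vec<&str> = file_path.split('/').collect();
    let app_idx = segs.iter().position(|s| *s == "app")?;
    let (last, dirs) = segs[app_idx + 1..].split_last()?;
    if !last.starts_with("route.") {
        return None;
    }
    let parts: Vec<&str> = dirs
        .iter()
        .copied()
        .filter(|s| !(s.starts_with('(') && s.ends_with(')')))
        .collect();
    Some(format!("/{}", parts.join("/")))
}

/// Replaces each `{{key}}` with its value; unknown placeholders stay as-is.
pub fn render_template(template: &str, vars: &BTreeMap<&str, String>) -> String {
    let mut out = template.to_string();
    for (key, value) in vars {
        out = out.replace(&format!("{{{{{key}}}}}"), value);
    }
    out
}

/// Applies `rule` to one exported symbol and returns the rendered node name
/// when both the file glob and the symbol filter match.
pub fn apply_rule(rule: &Rule, file_path: &str, symbol: &str) -> Option<String> {
    if !glob_match(&rule.file_glob, file_path) || !rule.symbol_names.iter().any(|n| n == symbol) {
        return None;
    }
    let mut vars = BTreeMap::new();
    vars.insert("method", symbol.to_string());
    vars.insert("route", nextjs_route_path(file_path)?);
    Some(render_template(&rule.name_template, &vars))
}
```

With file_glob "app/**/route.ts", an exported GET in app/api/users/[id]/route.ts renders as "GET /api/users/[id]", while a non-handler export in the same file yields no node. This dependence on the handler resolving to a plain exported symbol is why the TypeScript normalization step matters.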
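The "stable outputs" goal and the NodeIdGenerator extension in workstream 1 imply that re-indexing the same project should yield the same custom node IDs. One way to get that is a fixed hash over the node's identifying fields, sketched below in Rust. The choice of FNV-1a, the fields hashed, and the stable_node_id name are assumptions for illustration; std's DefaultHasher is avoided because its algorithm is not guaranteed to stay the same across Rust releases.

```rust
/// 64-bit FNV-1a: output depends only on the input bytes, so it is
/// reproducible across runs, machines, and compiler versions.
fn fnv1a64(bytes: &[u8]) -> u64 {
    const OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    bytes
        .iter()
        .fold(OFFSET, |hash, &b| (hash ^ u64::from(b)).wrapping_mul(PRIME))
}

/// Hypothetical stable ID for a custom node. Each field is length-prefixed
/// so that ("ab", "c") and ("a", "bc") cannot produce the same byte stream.
pub fn stable_node_id(label: &str, file_path: &str, name: &str) -> String {
    let mut buf = Vec::new();
    for part in [label, file_path, name] {
        buf.extend_from_slice(&(part.len() as u64).to_le_bytes());
        buf.extend_from_slice(part.as_bytes());
    }
    format!("{:016x}", fnv1a64(&buf))
}
```

Hashing the rendered name rather than line numbers keeps IDs stable when unrelated edits shift a handler up or down the file; whether that trade-off suits the real NodeIdGenerator is an open question.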
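For database ingestion (workstream 2), the schema manager has to recognize the custom Parquet files and turn their names into safe table identifiers before bulk import. The Rust sketch below assumes the table name comes from the part of the file name matched by `*` and applies a conservative identifier rule (ASCII letters, digits, underscore, no leading digit). Both the naming scheme and that rule are assumptions rather than Kùzu's documented identifier grammar or the schema manager's actual logic.

```rust
/// Which kind of custom table a Parquet file feeds.
#[derive(Debug, PartialEq)]
pub enum CustomTable {
    Nodes(String),
    Relationships(String),
}

/// Assumed sanitization: keep ASCII alphanumerics and '_', replace anything
/// else with '_', and prefix '_' if the result is empty or starts with a digit.
pub fn sanitize_table_name(raw: &str) -> String {
    let mut name: String = raw
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect();
    if name.is_empty() || name.starts_with(|c: char| c.is_ascii_digit()) {
        name.insert(0, '_');
    }
    name
}

/// Classifies custom_nodes_*.parquet / custom_relationships_*.parquet files
/// and returns None for every other file in the output directory.
pub fn classify_parquet(file_name: &str) -> Option<CustomTable> {
    let stem = file_name.strip_suffix(".parquet")?;
    if let Some(label) = stem.strip_prefix("custom_nodes_") {
        Some(CustomTable::Nodes(sanitize_table_name(label)))
    } else if let Some(rel) = stem.strip_prefix("custom_relationships_") {
        Some(CustomTable::Relationships(sanitize_table_name(rel)))
    } else {
        None
    }
}
```

Because labels originate in user-authored TOML, sanitizing before any DDL is generated is what keeps a rule author from breaking (or injecting into) table creation.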
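For the backend APIs (workstream 3), stitching custom rows into an existing response needs two guards: the same node can come back from both the core and custom queries, and GEL relationship types must be mapped onto the single CUSTOM type the API exposes. The Rust sketch below shows one way to do both; GraphNode, GraphEdge, GraphResponse, the custom_type field, and stitch_custom_rows are hypothetical names, not the existing endpoint types.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
pub struct GraphNode {
    pub id: String,
    pub kind: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct GraphEdge {
    pub source: String,
    pub target: String,
    pub rel_type: String,            // "CUSTOM" for every GEL relationship
    pub custom_type: Option<String>, // original type, e.g. "ENDPOINT_DEFINED_BY"
}

#[derive(Debug, Default)]
pub struct GraphResponse {
    pub nodes: Vec<GraphNode>,
    pub edges: Vec<GraphEdge>,
}

/// Appends custom rows to `resp`, skipping nodes whose ID is already present
/// and edges already emitted with the same endpoints and (custom) type.
pub fn stitch_custom_rows(
    resp: &mut GraphResponse,
    custom_nodes: Vec<GraphNode>,
    custom_edges: Vec<(String, String, String)>, // (source, target, custom type)
) {
    let mut seen_nodes: HashSet<String> = resp.nodes.iter().map(|n| n.id.clone()).collect();
    for node in custom_nodes {
        // `insert` returns false when the ID was already seen.
        if seen_nodes.insert(node.id.clone()) {
            resp.nodes.push(node);
        }
    }

    let mut seen_edges: HashSet<(String, String, String)> = resp
        .edges
        .iter()
        .map(|e| {
            let ty = e.custom_type.clone().unwrap_or_else(|| e.rel_type.clone());
            (e.source.clone(), e.target.clone(), ty)
        })
        .collect();
    for (source, target, custom_type) in custom_edges {
        if seen_edges.insert((source.clone(), target.clone(), custom_type.clone())) {
            resp.edges.push(GraphEdge {
                source,
                target,
                rel_type: "CUSTOM".to_string(),
                custom_type: Some(custom_type),
            });
        }
    }
}
```

Keeping the original type alongside CUSTOM lets the Explorer legend and tooltips still show ENDPOINT_DEFINED_BY without the API having to enumerate every user-defined relationship type.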
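For the list_api_endpoints MCP tool (workstream 4), the two pieces most exposed to the "line number accuracy" risk are filtering the endpoint rows and hydrating each row's code snippet from its stored start/end lines. The Rust sketch below assumes 1-based, inclusive line numbers and returns None when a range falls outside the file, so an off-by-one surfaces as a missing snippet instead of the wrong code. Endpoint, EndpointFilter, filter_endpoints, and hydrate_snippet are hypothetical names, and import-usage filtering is left out.

```rust
#[derive(Debug, Clone)]
pub struct Endpoint {
    pub method: String,
    pub route: String,
    pub file_path: String,
    pub start_line: usize,
    pub end_line: usize,
}

#[derive(Debug, Default)]
pub struct EndpointFilter {
    pub method: Option<String>,         // case-insensitive, e.g. "get"
    pub route_contains: Option<String>, // substring match on the route path
}

/// Applies the optional filters and returns matches sorted by (route, method)
/// so tool output is deterministic.
pub fn filter_endpoints<'a>(endpoints: &'a [Endpoint], filter: &EndpointFilter) -> Vec<&'a Endpoint> {
    let mut out: Vec<&Endpoint> = endpoints
        .iter()
        .filter(|e| {
            filter.method.as_ref().map_or(true, |m| e.method.eq_ignore_ascii_case(m))
                && filter.route_contains.as_ref().map_or(true, |r| e.route.contains(r.as_str()))
        })
        .collect();
    out.sort_by(|a, b| (&a.route, &a.method).cmp(&(&b.route, &b.method)));
    out
}

/// Returns lines start_line..=end_line (1-based, inclusive) of `source`, or
/// None when the range is empty or runs past the end of the file.
pub fn hydrate_snippet(source: &str, start_line: usize, end_line: usize) -> Option<String> {
    if start_line == 0 || end_line < start_line {
        return None;
    }
    let lines: Vec<&str> = source.lines().collect();
    if end_line > lines.len() {
        return None;
    }
    Some(lines[start_line - 1..end_line].join("\n"))
}
```

A caller would filter first and then read each matching file_path once to hydrate snippets for all of its endpoints.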
Edited by Michael Angelo Rivera