GitLab Knowledge Graph - Graph Engine

> [!important]
>
> This epic follows the already discussed and approved design document https://gitlab.com/gitlab-com/content-sites/handbook/-/merge_requests/16424. Read that document before continuing.

## Summary

Implement the ClickHouse Graph Query Engine to provide property-graph semantics (paths, traversals, pattern matching) over GitLab's indexed namespace data. This epic covers the core query tier components: ClickHouse storage model design, SQL query generation and translation, multi-layered authorization enforcement, and the unified security and performance testing framework.

The system compiles an intermediate representation of graph queries (JSON DSL) into parameterized ClickHouse SQL and executes them directly on adjacency-ordered edge tables and typed node tables, enabling cross-entity queries without introducing another datastore.

This epic focuses on delivering and testing the Graph Engine as a black box. It does not cover the following GKGaaS deliverables:

* SDLC Graph indexing pipeline (`gkg-indexer`) and ETL transformation
* ClickHouse infrastructure and cluster deployment
* Siphon CDC configuration and PostgreSQL replication setup
* Observability infrastructure and MCP/REST layers

---

## Architecture overview

```mermaid
flowchart LR
    Client[MCP/REST] -->|JSON query + JWT| Rails
    Rails -->|JSON + SecurityContext| Engine[Query Engine]
    Engine -->|Parameterized SQL\n+ RedactionContext| Server[gkg-server]
    Server -->|SQL + params| CH[(ClickHouse)]
    CH -->|rows| Server
    Server -->|entity IDs| Rails
    Rails -->|authorized/denied| Server
    Server -->|redacted results| Rails
    Rails -->|JSON response| Client
```

### Compiler pipeline

```mermaid
flowchart TD
    Input["JSON input (untrusted)"] --> Schema[schema validation]

    subgraph Gate ["validation gate -- reject bad queries early"]
        Schema --> Ontology[ontology validation]
        Ontology --> Parse[parse into typed structs]
        Parse --> Validate[semantic validation:\ncross-node references]
    end

    Validate -->|invalid| Reject[reject]
    Validate -->|valid| Normalize

    subgraph Trusted ["trusted zone -- only validated input reaches here"]
        Normalize[normalize:\nentity names to tables,\nenum coercion, wildcard expansion]
        Normalize --> Lower["lower:\nbuild SQL AST per query type\n(traversal/search/agg/path/neighbors)"]
        Lower --> Return["enforce return:\ninject _gkg_*_id, _gkg_*_type\ncolumns for redaction"]
        Return --> Security["inject security:\nstartsWith(traversal_path, ?)\nper table scan"]
        Security --> Codegen[codegen:\nAST to parameterized SQL]
    end

    Codegen --> Output["ParameterizedQuery\n(SQL + params + RedactionContext)"]
```

---

## Task descriptions and status

### Ontology foundation

The query engine and ETL pipeline both need to agree on what entities exist, what properties they have, and how they relate to each other. Without a shared schema, the indexer writes columns the engine doesn't know about, or the engine tries to query tables that don't exist yet.

The ontology crate (`crates/ontology`) solves this. It loads YAML definitions for every entity type (User, Project, MergeRequest, Pipeline, etc.) and every relationship (AUTHORED, IN_PROJECT, CONTAINS, etc.) and makes them available to both sides. The query engine uses the ontology when it compiles a query to validate it, populate JSON Schema `enum` fields at runtime, map entity names to ClickHouse table names, and determine which columns exist on which tables. The JSON Schema that agents use to build valid queries is derived directly from these YAML files, so the schema stays in sync with the database automatically.

We also moved away from a single generic `nodes` table early on. Each entity type now has its own ClickHouse table (`gl_user`, `gl_merge_request`, `gl_project`, etc.) with type-specific columns and ordering.
This was necessary for the security model because different entity types need different `traversal_path` filtering behavior: Users don't have a traversal path (they exist across groups), while Projects and MRs do.

| Status | Work item | MRs |
|--------|-----------|-----|
| [x] | Shared ontology crate (`crates/ontology`) for ETL and graph engine | [!51](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/51) |
| [x] | SDLC domain ontologies: MR, Plan, CI, Security, Code | [!53](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/53), [!54](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/54), [!55](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/55), [!56](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/56), [!57](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/57) |
| [x] | Flatten ontology nodes, simplify edge format | [!88](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/88), [!90](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/90) |
| [x] | Stricter node validation, JSON Schema cleanup | [!91](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/91) |
| [x] | Consolidate schema codepaths into ontology crate | [!60](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/60) |
| [x] | Deprecate generic `nodes` table, queries target entity tables directly | [!47](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/47) |
| [x] | Make `traversal_path` an explicit ontology property for all nodes except User | [!154](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/154) |
| [x] | Ontology-derived graph generation for simulator | [!62](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/62) |
| [x] | Support ontology `EnumType` in normalize phase | [!194](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/194) |
| [x] | Decouple ontology from lowering phase | [!214](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/214) |

### Query engine compiler

This is the core of the graph engine: a compiler that turns JSON DSL queries into parameterized ClickHouse SQL. The pipeline has nine phases (schema validation, ontology validation, parse, semantic validation, normalize, lower, enforce return context, security injection, codegen) and supports five query types:

- **Traversal**: multi-hop graph traversal across entity types with JOINs through the edge table. "Give me all MRs authored by User X in Project Y."
- **Search**: single-entity queries with filters. "Find all open issues with label 'security'."
- **Aggregation**: COUNT/SUM/AVG/MIN/MAX with GROUP BY. "Count MRs per user in this project."
- **Path-finding**: recursive CTE-based shortest path between two nodes. "Find the shortest path from User A to Project B."
- **Neighbors**: bidirectional edge scan from a starting node. "What is connected to this MR?"

The lowering phase is where most of the complexity lives. Each query type has its own lowering strategy that produces a different SQL shape: traversals become JOIN chains, aggregations add GROUP BY, path-finding generates recursive CTEs with cycle detection, and neighbors do a bidirectional UNION ALL scan of the edge table.

The codegen phase walks the AST and emits parameterized ClickHouse SQL. Every user-provided value becomes a `{p0:Type}` placeholder, never interpolated into the SQL string.
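The placeholder rule can be illustrated with a small sketch. It is written in Python for brevity and is not the engine's Rust codegen: `ParamBuilder`, `render_search`, the `ALLOWED_OPS` map, and the `iid` column are hypothetical names, and the table and column identifiers are assumed to have already passed ontology validation. The point is only that values travel in a separate parameter map while the SQL text carries `{pN:Type}` placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical operator allow-list; anything else is rejected outright.
ALLOWED_OPS = {"eq": "=", "gt": ">", "lt": "<"}


@dataclass
class ParamBuilder:
    """Collects bound values and hands out {pN:Type} placeholders."""
    params: dict = field(default_factory=dict)

    def bind(self, value, ch_type: str) -> str:
        name = f"p{len(self.params)}"
        self.params[name] = value
        # Renders as e.g. {p0:String}; the value itself never enters the SQL.
        return f"{{{name}:{ch_type}}}"


def render_search(table: str, filters: list, builder: ParamBuilder) -> str:
    """Render a single-table search; identifiers are assumed pre-validated."""
    clauses = []
    for column, op, value, ch_type in filters:
        if op not in ALLOWED_OPS:
            raise ValueError(f"operator not allowed: {op}")
        clauses.append(f"{column} {ALLOWED_OPS[op]} {builder.bind(value, ch_type)}")
    where = " AND ".join(clauses) if clauses else "1"
    return f"SELECT id FROM {table} WHERE {where}"


builder = ParamBuilder()
sql = render_search(
    "gl_merge_request",
    [("state", "eq", "opened'; DROP TABLE x --", "String"), ("iid", "gt", 10, "Int64")],
    builder,
)
print(sql)
print(builder.params)
```

The hostile-looking string ends up only in `builder.params`, which is what a parameterization audit would need to confirm.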

| Status | Work item | MRs |
|--------|-----------|-----|
| [x] | Input parsing + validation phases | [!92](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/92) |
| [x] | AST lowering phase + security policy injection | [!96](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/96) |
| [x] | SQL codegen with parameterized output | [!103](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/103) |
| [x] | Column alignment: `source_id`/`target_id` edge columns | [!125](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/125) |
| [x] | Migrate from `kg_` prefix to `gl_` | [!121](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/121) |
| [x] | Fix recursive CTEs | [!123](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/123) |
| [x] | Multi-hop traversals with clean lowering | [!128](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/128) |
| [x] | Search query functionality | [!153](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/153) |
| [x] | Column selection in queries (explicit + wildcard) | [!155](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/155) |
| [x] | Neighbors query type | [!158](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/158) |
| [x] | Return edges alongside nodes in query results | [!170](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/170) |
| [x] | Wildcard edges + multiple edge types in multi-hop | [!177](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/177) |
| [x] | Fix aggregated entity columns not returned | [!213](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/213) |
| [x] | Fix neighbors empty results for User/Group sources | [!202](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/202) |

### Query normalization and validation

Validation happens in two places.
JSON Schema validation catches structural problems (wrong types, missing fields, unknown entity names) and the ontology populates the schema's `enum` fields at runtime so it also catches invalid entity types, columns, and relationship names. Semantic validation then handles the things JSON Schema can't express: does the `from` node in a relationship actually reference a declared node ID? Does the `order_by` target a node that exists in the query?

Normalization runs after validation and before lowering. It maps human-readable entity names to ClickHouse table names (`User` -> `gl_user`), coerces enum integer values to their string labels (e.g. MR state `1` -> `"opened"`), and expands wildcard column selections (`*`) into the full column list for that entity. This keeps the lowering phase clean: it only has to deal with canonical, table-ready inputs.

| Status | Work item | MRs |
|--------|-----------|-----|
| [x] | Normalize phase: entity-to-table name mapping, enum coercion, wildcard expansion | [!191](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/191) |
| [x] | Input + validate phases with JSON Schema | [!92](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/92) |

### Security and redaction

The query engine handles Layers 1 and 2 of the authorization model. Layer 2, `traversal_path` filtering, is the main thing the engine owns: it walks the SQL AST after lowering and injects `startsWith(traversal_path, ?)` predicates into the WHERE clause of every table scan (except edge tables and entities without a traversal path, like Users). This scopes results to the GitLab namespace hierarchy the requesting user has access to.

The tricky part is redaction context. After a query runs, the gkg-server needs to know which entities appeared in each row so it can check Rails permissions (Layer 3).
For traversal and search queries this is straightforward: we inject hidden `_gkg_{alias}_id` and `_gkg_{alias}_type` columns into the SELECT at compile time. But path-finding and neighbors queries discover entity types at runtime (you don't know what's on the other end of a path or in the middle of a path until the query runs), so those use dynamic node discovery via `_gkg_path` and `_gkg_neighbor_type` columns.

We also unrolled path-finding from a single recursive CTE into a chain of bounded CTEs. The recursive approach was harder to reason about for security (where exactly do you inject the `traversal_path` filter in a recursive CTE?), and the unrolled version is both clearer and faster in ClickHouse.

| Status | Work item | MRs |
|--------|-----------|-----|
| [x] | `traversal_path`-based tenant isolation via AST injection | [!96](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/96) |
| [x] | Redaction context for path-finding queries (dynamic node discovery) | [!144](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/144) |
| [x] | Unroll path-finding recursive CTEs into CTE chains (hardened against bypass) | [!168](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/168) |
| [x] | Query engine hardening pass | [!42](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/42) |

### Performance and path-finding

Path-finding queries were the most expensive operations in simulator benchmarks. Two changes brought them into acceptable range: removing hash join hints that ClickHouse was using suboptimally, and restructuring the CTE chain to reduce intermediate result sizes. The median query time across all 29 test queries is 26ms; the slowest (a high-weight work items aggregation) is 1,389ms.
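To make the CTE chain concrete, here is a rough Python sketch of what an unrolled path-finding query could look like. It is an illustration under stated assumptions, not the engine's lowering code: the edge table name `gl_edge`, the CTE names `hop_N`, and the `path` array column are placeholders, and only `source_id`/`target_id` come from this epic. It follows edges in one direction only and leaves out the security predicates and redaction columns entirely.

```python
def unrolled_path_sql(max_hops: int, edge_table: str = "gl_edge") -> str:
    """Build a shortest-path query as a chain of bounded, non-recursive CTEs."""
    if max_hops < 1:
        raise ValueError("max_hops must be at least 1")

    # hop_1 seeds the chain from the start node ({p0}). The path array also
    # serves as the visited set for cycle detection in later hops.
    ctes = [
        f"hop_1 AS (SELECT target_id, [source_id, target_id] AS path "
        f"FROM {edge_table} WHERE source_id = {{p0:Int64}})"
    ]
    # Each further hop joins the previous CTE back onto the edge table and
    # skips nodes already on the path.
    for n in range(2, max_hops + 1):
        ctes.append(
            f"hop_{n} AS (SELECT e.target_id AS target_id, "
            f"arrayPushBack(h.path, e.target_id) AS path "
            f"FROM hop_{n - 1} AS h INNER JOIN {edge_table} AS e "
            f"ON e.source_id = h.target_id "
            f"WHERE NOT has(h.path, e.target_id))"
        )
    # Collect every hop that reached the end node ({p1}); keep the shortest.
    arrivals = " UNION ALL ".join(
        f"SELECT path FROM hop_{n} WHERE target_id = {{p1:Int64}}"
        for n in range(1, max_hops + 1)
    )
    return (
        f"WITH {', '.join(ctes)} "
        f"SELECT path FROM ({arrivals}) ORDER BY length(path) LIMIT 1"
    )


print(unrolled_path_sql(3))
```

Because every hop is an ordinary named subquery, the number of joins is fixed by `max_hops`, and each scan is a place where a filter can be attached or an intermediate result trimmed.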

| Status | Work item | MRs |
|--------|-----------|-----|
| [x] | Remove hash joins from path-finding queries | [!179](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/179) |
| [x] | Path-finding performance optimization | [!181](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/181) |

### Simulator and testing framework

We can't wait for production data to find out if queries are correct, fast enough, or break under load. The simulator generates synthetic graphs that mirror the shape of real GitLab data (group hierarchies, project memberships, MR authorship, pipeline runs) and runs the full query suite against them in a local ClickHouse instance.

The generation is ontology-driven: it reads the same YAML definitions the query engine uses and builds a graph that respects the entity types, relationship cardinalities, and `traversal_path` hierarchies. The YAML config controls how many of each entity to create and how many edges to generate per relationship type. At full scale it produces 11M nodes and 100M+ edges, but can be configured to generate a larger or smaller graph, with variable density.

The evaluation phase compiles every query in `fixtures/queries/`, runs it against ClickHouse, and records execution time, row count, query plan, and any errors. This is how we caught the three memory-limit failures and the path-finding performance regressions before they hit production.
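The shape of such an evaluation loop might look roughly like the sketch below. It is a Python illustration, not the simulator's code: `EvalRecord`, `evaluate`, `median_ms`, and the injected `compile_fn`/`run_fn` callables are hypothetical stand-ins for the real compiler and ClickHouse client, and query plan capture is omitted.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EvalRecord:
    name: str
    elapsed_ms: float
    row_count: int
    error: Optional[str] = None


def evaluate(queries: dict, compile_fn: Callable, run_fn: Callable) -> list:
    """Compile and run each query, recording timing, row count, and errors."""
    records = []
    for name, query in queries.items():
        started = time.perf_counter()
        try:
            rows = run_fn(compile_fn(query))
            error, row_count = None, len(rows)
        except Exception as exc:  # a failing query is recorded, not fatal
            error, row_count = str(exc), 0
        # Elapsed time covers compile + execution in this sketch.
        elapsed_ms = (time.perf_counter() - started) * 1000
        records.append(EvalRecord(name, elapsed_ms, row_count, error))
    return records


def median_ms(records: list) -> Optional[float]:
    """Median latency over the queries that completed without error."""
    ok = [r.elapsed_ms for r in records if r.error is None]
    return statistics.median(ok) if ok else None
```

Recording failures as data rather than aborting the run is what lets a single evaluation pass surface both slow queries and memory-limit errors.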

| Status | Work item | MRs |
|--------|-----------|-----|
| [x] | YAML-based config, split phases into generate/load/evaluate | [!85](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/85) |
| [x] | Auth hierarchy-aware data generation | [!77](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/77) |
| [x] | `traversal_id` support + writer speedup | [!65](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/65) |
| [x] | Evaluation phase | [!106](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/106) |
| [x] | PRNG seeding + User table traversal bug fixes | [!115](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/115) |
| [x] | Edge generation bug fixes | [!119](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/119) |
| [x] | Simulator cleanup | [!141](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/141) |
| [x] | Scale synthetic data generation to 100M+ edges | [!185](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/185) |
| [x] | Capture results and query plans for all test queries | [!204](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/204) |

### DevEx and tooling

Standard project setup: mise task runner, MR/issue templates, and a CLI that lets you pipe JSON queries through the compiler locally and see the generated SQL without needing a running ClickHouse instance. Useful for debugging and for the Claude skill that generates queries.

| Status | Work item | MRs |
|--------|-----------|-----|
| [x] | CLI access to graph engine compiler + skill | [!112](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/112) |

---

## Status: in progress

### Unify validation phase ([!242](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/242))

Validation was split across two layers doing the same work.
The JSON Schema checked query structure, then `validate.rs` re-checked entity types, relationship types, columns, filters, and hop ranges in Rust. This MR pushes everything the schema can handle into the schema and deletes the duplicate Rust code.

`PropertyFilter` went from 11 `oneOf` variants to a single `op` enum with `if/then` conditionals. The only Rust-side validation left (`check_references`) handles cross-node ID references that JSON Schema structurally cannot express: relationship from/to, aggregation targets, order_by node, path endpoints.

After this, the JSON Schema (`schema.json`) is the single source of truth for query shape validation.

### Aggregation redaction context ([!174](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/174))

Aggregation queries had a gap in the redaction model. When counting MRs per user, the engine checked whether the user (the `group_by` node) was authorized, but not the individual MRs being counted. If any aggregated entity was unauthorized, the count would include it anyway.

This MR adds `groupUniqArray(target.id)` collection for all aggregation targets. The gkg-server can now verify authorization for every entity included in an aggregate. If any ID in the array is unauthorized, the entire row is redacted. Fail-closed.

---

## Status: remaining work

### Query validation and schema

The JSON Schema currently handles structure and ontology validation. What's missing is a "super schema" that also encodes the security constraints (max hops, row limits), relationship cardinalities, and any per-agent restrictions. Right now those limits live in scattered constants across the Rust code.

- [ ] Extend `schema.json` to encode security constraints and validation rules alongside ontology data. One schema to rule them all.
- [ ] Derive per-agent subschemas: compact, focused schemas that fit in an LLM context window and only expose the entities and relationships the agent can access.
An agent that only works with CI data shouldn't see the full schema.
- [ ] Schema versioning: track schema changes over time and validate incoming queries against the schema version that was deployed when the query was generated. Prevents breakage during rollouts.

### Authorization enforcement

Layer 2 (`traversal_path` filtering) is implemented. Layers 1 and 3 are not.

- [ ] Derive the security skip-table list from ontology metadata. Right now `gl_users` is hardcoded in `security.rs` as a table that skips `traversal_path` filtering. Any entity without a `traversal_path` property in the ontology should be skipped automatically.
- [ ] Layer 1: `organization_id` injection. Every query needs an automatic `WHERE organization_id = ?` predicate from the JWT. Cross-organization queries should be blocked outright.
- [ ] Layer 3: Rails redaction callbacks. After the query runs and the engine returns results with redaction context, the gkg-server needs to batch-check permissions via the Rails authorization endpoint and strip unauthorized rows before returning them. This is partially implemented in the server but not fully wired up.
- [ ] Query safeguards at the execution level. Schema-level limits (max 3 hops, limit 1000) exist but need enforcement at query execution time too: `max_execution_time` settings, row count caps, and relationship allow-lists.

### Grammar-based query validation

This is the "belt and suspenders" layer. Even though the compiler only produces SQL from validated, typed AST nodes, we want a separate pass that walks the final SQL AST and confirms required predicates are present before execution.

- [ ] SQL AST walker that verifies `organization_id` and `traversal_path` predicates appear in every WHERE clause. If they're missing, the query is rejected. Fail-closed.
- [ ] Parameterization audit: verify all user-provided values are bound as parameters, not concatenated.
This is already true by construction, but an explicit check gives us a safety net and something to point to in security reviews.

### Service-to-service authentication

The gkg-server needs to verify that requests actually came from GitLab Rails and extract the user's authorization context from the token.

- [ ] JWT verification: validate HS256-signed tokens with a shared secret. Extract `user_id`, `organization_id`, and `traversal_ids` from the payload. These feed into the security context the compiler uses.
- [ ] Token expiry: reject tokens older than 5 minutes.
- [ ] mTLS between Rails and the gkg-server for mutual authentication and transport-level encryption.

### Testing and security validation

The simulator handles functional and performance testing. What's missing is adversarial testing: what happens when someone actively tries to break the authorization model?

- [ ] Fuzzing framework: generate malformed, edge-case, and adversarial JSON inputs automatically. Run as part of CI. This catches both crashes and authorization bypasses.
- [ ] Property-based testing: generate random valid queries and verify three invariants: (1) all generated SQL includes required auth predicates, (2) execution time stays within bounds, (3) result sets respect authorization constraints.
- [ ] Penetration test scenarios informed by the [query engine threat model](https://gitlab.com/gitlab-org/knowledge-graph/gkg_query_engine_threat_model.md): SQL injection payloads, predicate stripping attempts, cross-tenant access, aggregation bypass.

### Query execution and response

The compiler produces SQL. Someone needs to run it.

- [ ] ClickHouse query execution with connection pooling, rate limiting per user, and `max_execution_time` enforcement.
- [ ] Response formatting: return the result rows plus the generated SQL (for debugging) and a request ID that correlates with ClickHouse's `query_log`.
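Returning to the JWT items listed under service-to-service authentication, the checks can be sketched with the Python standard library alone. This is a minimal illustration, not the gkg-server implementation: the function names are hypothetical, and using the standard `iat` claim for the 5-minute age check is an assumption, since the epic does not say which claim carries the issue time.

```python
import base64
import hashlib
import hmac
import json
import time

MAX_TOKEN_AGE_SECONDS = 300  # "reject tokens older than 5 minutes"
REQUIRED_CLAIMS = ("user_id", "organization_id", "traversal_ids")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token; stands in for the Rails side in this sketch."""
    parts = ({"alg": "HS256", "typ": "JWT"}, claims)
    signing_input = ".".join(
        _b64url_encode(json.dumps(p, separators=(",", ":")).encode()) for p in parts
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes, now=None) -> dict:
    """Return the claims if signature, algorithm, age, and fields check out."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    # Pin the algorithm so a token cannot downgrade it (e.g. to "none").
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, signature):  # constant-time compare
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    now = time.time() if now is None else now
    issued_at = claims.get("iat")  # assumed claim name
    if not isinstance(issued_at, (int, float)) or now - issued_at > MAX_TOKEN_AGE_SECONDS:
        raise ValueError("token expired")
    missing = [key for key in REQUIRED_CLAIMS if key not in claims]
    if missing:
        raise ValueError(f"missing claims: {missing}")
    return claims
```

The signature is checked before the payload is parsed or trusted, and a real service would more likely rely on a maintained JWT library than on hand-rolled parsing like this.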

### Query interface adapters (future)

- [ ] Cypher-to-SQL translator: parse Cypher syntax into the same AST the JSON DSL uses, then run it through the same lowering/codegen pipeline. This would let teams use standard graph query syntax if they prefer it over the JSON DSL.

### Readiness

- [ ] Platform readiness review per the PREP framework: service architecture, security/compliance, performance/scalability, observability, operational lifecycle, and quality assurance assessments. No GKG feature has been through PREP yet.
- [ ] Define acceptable performance targets: latency and memory utilization per query for .com and self-managed deployments.
- [ ] Per-entity integration testing across all 6 domains (core, code_review, ci, security, plan, source_code) with 25+ entity types. The simulator tests query shapes, but we also need to verify that real data for each entity type produces correct results.

---

## Security considerations

The [query engine threat model](https://gitlab.com/gitlab-org/knowledge-graph/gkg_query_engine_threat_model.md) covers seven threat categories.
Here's where each one stands:

| ID | Threat | Severity | Likelihood | Status |
|----|--------|----------|------------|--------|
| TQ1 | SQL Injection via JSON Input | Critical | Low | **Mitigated**: all values parameterized, operators allow-listed, IDs typed |
| TQ2 | Authorization Filter Bypass | Critical | Medium | **Partially mitigated**: `traversal_path` injection done; grammar-based verification pending |
| TQ3 | Allow-List Bypass | High | Low | **Mitigated**: ontology-derived allow-lists, fail-closed |
| TQ4 | LLM Prompt Injection | High | Medium | **Partially mitigated**: same validation pipeline regardless of source |
| TQ5 | Resource Exhaustion | Medium | Medium | **Mitigated**: depth caps, row limits, timeouts at schema + query level |
| TQ6 | Information Leakage via Errors | Medium | Low | **Review required** |
| TQ7 | Aggregation Authorization Bypass | High | High | **In progress**: [!174](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/174) adds `groupUniqArray` collection for fail-closed redaction |

---

## Closed / superseded MRs

| MR | Title | Reason |
|----|-------|--------|
| [!31](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/31) | feat(querying): migrate graph engine from Go to Rust | Superseded: split into smaller MRs (!92, !96, !103, etc.) |
| [!76](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/76) | Draft: graph data simulator | Superseded: rebased and split into !85, !106, etc. |
| [!59](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/59) | chore(ontology): consolidate schema codepaths | Superseded: merged as !60 against main |
| [!215](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/215) | Draft: feat(engine): batch search support | Closed: deferred for future iteration |

---

## Appendix: full merged MR index

<details>
<summary>54 merged MRs (click to expand)</summary>

| MR | Date | Title |
|----|------|-------|
| [!214](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/214) | 2026-02-11 | chore(graphsec): decoupling ontology from lowering phase |
| [!213](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/213) | 2026-02-02 | fix(querying): columns of an aggregated entity are now returned |
| [!204](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/204) | 2026-02-10 | feat(simulator): capture results and query plans |
| [!202](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/202) | 2026-02-01 | fix(querying): neighbors queries fix for user/group |
| [!194](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/194) | 2026-02-01 | fix(querying): support ontology EnumType in normalize |
| [!191](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/191) | 2026-02-01 | feat(querying): normalize phase for filter coercion |
| [!185](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/185) | 2026-02-01 | perf(simulator): scale to 100M+ edges |
| [!181](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/181) | 2026-01-31 | perf(graph): path-finding performance |
| [!179](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/179) | 2026-01-31 | chore(graph): remove hash joins from path-finding |
| [!177](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/177) | 2026-01-31 | feat(querying): wildcard edges + multiple edge types |
| [!170](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/170) | 2026-01-31 | feat(querying): return edges alongside nodes |
| [!168](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/168) | 2026-01-31 | feat(graphsec): unroll path-finding recursive CTEs |
| [!161](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/161) | 2026-01-31 | chore(testing): fix simulator after ontology changes |
| [!158](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/158) | 2026-01-30 | feat(querying): neighbors query type |
| [!155](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/155) | 2026-01-30 | feat(querying): column selection in queries |
| [!154](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/154) | 2026-01-30 | chore(etl): traversal_path as explicit ontology property |
| [!153](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/153) | 2026-01-30 | feat(graph): search query + JSON Schema cleanup |
| [!144](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/144) | 2026-01-30 | feat(sec): path-finding redaction |
| [!141](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/141) | 2026-01-30 | chore(simulator): cleanup |
| [!128](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/128) | 2026-01-30 | feat(querying): multi-hop traversals |
| [!125](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/125) | 2026-01-29 | chore(graph): source_id/target_id edge columns |
| [!123](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/123) | 2026-01-29 | chore(graph): fix recursive CTEs |
| [!121](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/121) | 2026-01-29 | chore(graph): kg_ → gl_ prefix migration |
| [!119](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/119) | 2026-01-29 | feat(querying): edge generation bug fixes |
| [!115](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/115) | 2026-01-29 | feat(querying): simulator PRNG seeding |
| [!112](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/112) | 2026-01-29 | feat(cli): CLI access to graph engine compiler |
| [!106](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/106) | 2026-01-29 | feat(querying): simulator evaluation phase |
| [!103](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/103) | 2026-01-28 | feat(querying): codegen phase |
| [!96](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/96) | 2026-01-28 | feat(querying): AST lowering + security policy |
| [!92](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/92) | 2026-01-28 | feat(querying): input + validate phases |
| [!91](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/91) | 2026-01-28 | feat(ontology): stricter node validation |
| [!90](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/90) | 2026-01-28 | feat(ontology): simplify edge format |
| [!88](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/88) | 2026-01-28 | chore(ontology): flatten nodes for ETL |
| [!85](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/85) | 2026-01-27 | feat(querying): simulator YAML config |
| [!77](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/77) | 2026-01-27 | feat(simulator): auth-hierarchy-aware generation |
| [!65](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/65) | 2026-01-27 | feat(querying): traversal_id support + writer speedup |
| [!62](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/62) | 2026-01-26 | feat(querying): ontology-derived graph generation |
| [!60](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/60) | 2026-01-26 | chore(ontology): consolidate schema codepaths |
| [!57](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/57) | 2026-01-26 | feat(ontology): code ontology |
| [!56](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/56) | 2026-01-26 | feat(ontology): CI ontology |
| [!55](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/55) | 2026-01-26 | feat(ontology): plan ontology |
| [!54](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/54) | 2026-01-26 | feat(ontology): security ontology |
| [!53](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/53) | 2026-01-26 | feat(ontology): MR ontology |
| [!51](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/51) | 2026-01-26 | feat(lib): shared ontology crate |
| [!47](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/47) | 2026-01-26 | feat(querying): deprecate nodes table |
| [!42](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/42) | 2026-01-26 | fix(querying): hardening pass |
| [!33](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/33) | 2026-01-25 | chore(test): move code fixtures |
| [!22](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/22) | 2026-01-25 | chore(dx): CLI + perf scripts |
| [!21](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/21) | 2026-01-25 | feat(indexer): streaming file walking |
| [!17](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/17) | 2026-01-25 | feat(code): migrate code indexer |
| [!6](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/6) | 2026-01-24 | chore(dx): remove redundant treesitter code |
| [!3](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/3) | 2026-01-24 | feat(dx): MR and issue templates |
| [!2](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/2) | 2026-01-24 | feat(dx): setup mise |
| [!1](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/1) | 2026-01-24 | feat(indexing): code parser + treesitter in-tree |

</details>