Orbit - GitLab Knowledge Graph as a Service - GA
![Demo Video](/uploads/cd63ef26d74c2d12c43e34b716f418c5/Screenshot_2025-10-26_at_1.01.12_PM.png){width="800"}

- Demo Links:
  - Google Drive: https://drive.google.com/file/d/1Gt-1PEdt7NASgofXEE1Fe2Tr9e4hw86Y/view?usp=sharing
  - YouTube: https://www.youtube.com/watch?v=bvz9VgC7DZ0
  - Try it yourself: https://gitlab.com/gitlab-org/rust/knowledge-graph/-/issues/263+

> Note: The video is internal only until approved by PMM. This demo is not the final product; it is meant to show the _vision_.

## Problem Statement

Modern software development operates across a complex web of repositories, issues, merge requests, CI/CD pipelines, deployment environments, infrastructure, and assets. Both code data and SDLC platform metadata are inherently interconnected network graphs. While GitLab is a single vehicle that delivers these collective features, our ability to consume and analyze this data is fragmented, forcing developers and AI agents to piece together context through dozens of REST API, GraphQL, and agent-tool calls.

GitLab has hundreds of REST APIs and GraphQL schema elements. Users and AI agents need to reason about this data in ways that are not practical with traditional database queries. GitLab needs a unified data layer service to power future agents and tomorrow's analytics features.

## Proposed Solution

As the second iteration, we will expand upon the foundational work outlined in https://gitlab.com/groups/gitlab-org/-/epics/17514+ to build the _Knowledge Graph as a Service_.

The GitLab Knowledge Graph will be a backend service that exposes APIs and an MCP server for accessing structured property graph representations of GitLab instance data via a graph query engine. It will unify both **SDLC metadata** (e.g., issues, merge requests, CI/CD, vulnerabilities) and **code-level metadata** (e.g., symbols, functions) in a **graph format**, optimized for consumption by both AI systems (e.g., LLMs) and human users through analytics or product features.
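To make the "unified property graph" idea concrete, here is a minimal, illustrative sketch of SDLC and code entities living in one graph. All node labels, edge types, and property names below are hypothetical examples, not the shipped schema:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                    # e.g. "Issue", "MergeRequest", "Function"
    props: dict = field(default_factory=dict)

@dataclass
class Graph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src, edge_type, dst) triples

    def add(self, node):
        self.nodes.append(node)
        return node

    def relate(self, src, edge_type, dst):
        self.edges.append((src, edge_type, dst))

    def neighbors(self, node, edge_type):
        """Nodes connected to `node` via `edge_type` (one hop)."""
        return [d for (s, t, d) in self.edges if s is node and t == edge_type]

g = Graph()
issue = g.add(Node("Issue", {"iid": 42, "title": "Fix login timeout"}))
mr = g.add(Node("MergeRequest", {"iid": 7}))
func = g.add(Node("Function", {"name": "authenticate"}))

g.relate(mr, "CLOSES", issue)    # SDLC edge
g.relate(mr, "MODIFIES", func)   # SDLC-to-code edge

# One traversal answers "what code does the MR that closes issue 42 touch?"
touched = g.neighbors(mr, "MODIFIES")
print([n.props["name"] for n in touched])
```

The point is that a question spanning SDLC and code metadata becomes a single traversal instead of separate REST/GraphQL calls stitched together by the caller.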
- **Design Document**: https://gitlab.com/gitlab-com/content-sites/handbook/-/merge_requests/16424+
- **Engineering Executive Summary**: https://docs.google.com/document/d/193lERl7XvQX2aipqTW8v8QBV4UmHYUPehTK7mgOnQUI/edit?tab=t.sxui33e0fwa
- **Product Documentation**: https://docs.google.com/document/d/1EN5Y7IxMgEZIUZESCxcKxICHOGL_xx1wPsBXPHdw7gE/edit?tab=t.0
- **Offsite Notes**: https://docs.google.com/document/d/1BLfJGqyHtaNSdf_OO_YFoNaQFcaewMc1IgKKytPGGEg/edit
- **GA Planning Tracker**: https://docs.google.com/spreadsheets/d/1-ININY1U3e6hfs10B2io1YBZeNmCObIfuL0932dlzFU/edit?gid=2072133881#gid=2072133881
- **Roadmap**: https://gitlab.com/groups/gitlab-org/-/work_items/20331

## What we are Building

### Knowledge Graph API & MCP Server

One secure connection and data layer to authoritative SDLC context and holistic repository context. The GitLab Duo Agent Platform integrates with this server to query first-party GitLab data — issues, MRs, pipelines, security, and code — without stitching together 10+ integrations.

* **MCP interface for Duo (and other AI tools)**: Deterministic JSON tool contracts (e.g., `find_nodes`, `traverse_relationships`, `find_paths`, `aggregate_nodes`) that compile to parameterized Cypher-like ClickHouse SQL.
  * Ref: [_Intermediate Query Language_](https://gitlab.com/gitlab-com/content-sites/handbook/-/merge_requests/16424) in the design document.
* **HTTP/gRPC API**: Read-only REST endpoints under `/api/graph/*` for product features (e.g., Software Architecture Map, analytics).
* **Security by design**: Three layers of defense:
  1. **Tenant isolation** in storage (per-namespace data partitioning).
  2. **Traversal ID filtering** injected into every query.
  3. **Final redaction** via Rails permission checks (`Ability.allowed?`) before results are returned.
  * Ref: [_Security Architecture_](https://gitlab.com/gitlab-com/content-sites/handbook/-/merge_requests/16424) in the design document.
* **Observability**: Prometheus metrics, request tracing, and structured JSON logs with correlation IDs.
  * Ref: [_Observability_](https://gitlab.com/gitlab-com/content-sites/handbook/-/merge_requests/16424) in the design document.

> **Packaging note**: The Knowledge Graph is an **independent data product** (not tied to DAP entitlements). Usage-based billing via GitLab Credits for .com and Dedicated; per-seat add-on for Self-Managed. GKG usage originating from DAP agents will be **zero-rated** (that is, we won't charge for DAP using the Knowledge Graph — only for queries through third-party tools, such as Claude through MCP). See [Packaging Decisions](#packaging-decisions) below and the [Product Executive Brief](https://docs.google.com/document/d/1EN5Y7IxMgEZIUZESCxcKxICHOGL_xx1wPsBXPHdw7gE/edit?tab=t.0) for details.

### Knowledge Graph Indexing Engine

The indexing service operates as a distributed ETL (Extract, Transform, Load) pipeline that leverages the [Data Insights Platform](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/data_insights_platform/) to process both SDLC events and code changes. It uses [Siphon](https://gitlab.com/gitlab-org/analytics-section/siphon) for PostgreSQL Change Data Capture, NATS for event streaming and coordination, and ClickHouse as both a raw data lake and the final graph storage layer. This architecture enables scalable, incremental indexing of both SDLC metadata and code repositories into a unified property graph.

**SDLC data priority for indexing:**

1. Basic Structure (Groups, Projects)
2. Code Review (MRs, MR notes)
3. CI/CD (pipelines, jobs, stages)
4. Security (vulnerability reports)
5. Plan (Issues, Epics, notes)
6. Deployments

### Software Architecture Map

A first-party **Vue 3** UI embedded in GitLab that visualizes services, dependencies, and lineage by querying the Knowledge Graph.

**Use cases**

* **Onboarding & ownership**: Discover components, owners, and related MRs/issues.
* **Blast radius & incident analysis**: Navigate "N hops" from a service to downstream pipelines, deployments, and vulnerabilities.
* **Planning & impact**: Visualize epics → issues → MRs touching specific services/files.

**Integration**

* Uses the same query surface (MCP/HTTP) with permission inheritance and auditability.
* Designed to interoperate with DAP (evidence links, expandable paths, drill-through into GitLab artifacts).

## Packaging Decisions

The following packaging and pricing decisions were made at the [Knowledge Graph Working Session (Jan 28, 2026)](https://docs.google.com/document/d/1QLa5ZgyD_-qKilSJOESNjSqbGWZ7zmhfz6mVKBsuuAg/edit?tab=t.0) and [GKG GA Working Session (Jan 26, 2026)](https://docs.google.com/document/d/1WYfjNp3ynk34sDElnrLmrfZV0gg9cOHwwl4NdFte7Do/edit?tab=t.0):

1. **GKG is an independent product** — not gated by or bundled with DAP.
2. **Usage-based pricing** for .com & Dedicated deployments using **GitLab Credits** as the SKU.
3. **Per-seat add-on** for Self-Managed.
4. **Customer-driven queries are charged** (e.g., pulling context into external agents/products via MCP).
5. **GitLab-driven queries are zero-rated** (e.g., DAP Agents, Agentic Flows, GitLab-native UX like the Software Architecture Map).
6. GKG usage from custom Agentic Flows is **not charged**, since customers are already paying for custom flows.

**Open items under active investigation:**

* Cost simulation and unit economics (cost drivers for install, run, query, and data retention)
* Competitive analysis on KG pricing models
* SM deployment model: customer-run vs.
GitLab-managed single/multi-tenant service
* Meter definition and credit conversion ratio (e.g., 1 GitLab Credit = X KG queries)
* Data retention policy and customer expectations

## GA Scope (.com First — End of April 2026 Target)

Based on the [GKG GA Working Session](https://docs.google.com/document/d/1WYfjNp3ynk34sDElnrLmrfZV0gg9cOHwwl4NdFte7Do/edit?tab=t.0), the following are the requirements for GA on .com:

* **Setup Experience & Observability**
  * "Is the GKG running?" / "How do I enable it?" / "How do I know it's working?"
  * Permissions architecture (Planner+ role)
  * On-ramps in the product (e.g., under the DAP/Duo settings page)
* **API & MCP Server**
  * Indexed data according to the [Priority List and Data Model](https://gitlab.com/groups/gitlab-org/-/epics/20297)
  * User outcomes via [6 foundational tools](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/gitlab_knowledge_graph/querying/intermediary_llm_query_language/#available-knowledge-graph-llm-tools): items related to X with filtering, items aggregated by X, find entities connected to X
* **End-to-end data pipeline** (Siphon → NATS → ClickHouse → GKG)
* **Production Readiness (PREP)**: Observability, dashboards, alerts, runbooks, rate limits, severity mapping
* **Field Enablement**: [Guide MR](https://gitlab.com/gitlab-com/content-sites/handbook/-/merge_requests/17557)
* **Support Enablement**: [Diagnosis guidelines](https://docs.google.com/document/d/1WehXNiv52hAJSa3ZYWQ21RkPO3sHm0cvcV_TmkdsU58/edit) and runbooks

## Q&A

#### What is SDLC vs Code Indexing?

* **SDLC indexing** ingests **GitLab metadata** (issues, MRs, pipelines, deployments, vulnerabilities, projects, namespaces, users). Data arrives via **Siphon → NATS**, is staged in ClickHouse, then transformed into **graph node/edge tables** with `traversal_ids` for permission-scoped queries.
* **Code indexing** ingests **repository structure and semantics** (directories, files, definitions/symbols, imports, call edges).
Branch push events trigger indexers to fetch via **Gitaly**, analyze language syntax/semantics, then write **code graph node/edge tables**.

```mermaid
flowchart TB
    %% === Styles ===
    classDef agent fill:#b91c1c,stroke:#ef4444,color:#fff,stroke-width:1px,rx:18,ry:18,font-size:14px;
    classDef sdlc fill:#1e40af,stroke:#60a5fa,color:#fff,stroke-width:1px,rx:10,ry:10,font-size:13px;
    classDef code fill:#166534,stroke:#22c55e,color:#fff,stroke-width:1px,rx:10,ry:10,font-size:13px;
    classDef spacer fill:transparent,stroke:transparent,color:transparent;

    %% === Top Layer: Duo Agent Platform ===
    subgraph Agents["Duo Agent Platform"]
        direction LR
        A_SP_TOP["<br/><br/>"]:::spacer
        A1["Deep Research Agent"]:::agent
        A2["SWE Agent"]:::agent
        A3["Code Review Agent"]:::agent
        A4["+ More AI Agents"]:::agent
    end
    style Agents fill:#6b7280,stroke:#6b7280,opacity:0.25,rx:10,ry:10

    %% === Middle Layer: Knowledge Graph Service ===
    subgraph KGS["Knowledge Graph Service"]
        direction LR
        SDLC["SDLC Graph Index<br/>· platform-wide context<br/>· e.g. MRs, CI/CD, Issues"]:::sdlc
        CODE["Code Graph Index<br/>· repository-wide context<br/>· e.g. definitions, references"]:::code
    end
    style KGS fill:#374151,stroke:#93c5fd,stroke-width:1.5px,rx:14,ry:14,color:#fff

    %% === Connections ===
    A1 --> KGS
    A2 --> KGS
    A3 --> KGS
    A4 --> KGS
```

#### Why build on the Data Insights Platform?

Building on the Data Insights Platform allows the Knowledge Graph to function as a scalable, distributed system without impacting the performance of the production OLTP database.

* **Siphon**: Provides Change Data Capture (CDC) from PostgreSQL via logical replication, decoupling the indexing service from the primary database.
* **NATS**: Acts as a durable message broker for event-driven logic, high availability, and queuing for both code and SDLC indexing.
* **ClickHouse**: Serves as the primary data store (data lake and graph storage), enabling analytical queries without touching the production database.

#### Why a Graph?
While GitLab is powered by two primary data stores (Postgres and Git), GitLab data, including source code, can be represented as a network graph.

<details>
<summary>Expand for reasons why</summary>

#### 1. AI Works Best with Property Graph Data Models

Our [research and live prototypes](https://gitlab.com/gitlab-org/rust/knowledge-graph/-/issues/263) showed that LLMs reliably generate property graph tool calls because their syntax directly mirrors natural "find-things-connected-to-X" reasoning. For example, "find all issues closed by merge requests authored by @user within two hops of project Y" translates deterministically into a pattern like `MATCH (p:Project)<-[:CLOSES]-(m:MergeRequest)<-[:AUTHORED]-(u:User)`.

By contrast:

- GraphQL and REST require schema introspection and nested field expansion. LLMs struggle to reason about variable-depth recursion or dynamic joins inside those structures.
- Cypher exposes explicit graph patterns and bounded hop limits (`*1..3`) that match the mental model of "neighbor exploration."

#### 2. We Need Arbitrary Neighbor Exploration and Path Finding (N-Hop Queries)

Many Knowledge Graph workloads involve **exploring neighbors** and **path finding** up to N levels deep—for example, finding "all pipelines triggered by MRs that close issues linked to epics under a group." Neither REST nor GraphQL provides a clean or efficient way to express variable-length traversal:

- REST would require chained requests or recursive pagination.
- GraphQL can express limited nesting but not dynamic-depth traversal (`*..N`); resolvers explode in complexity and performance cost.

Cypher's `MATCH (a)-[*1..N]->(b)` semantics make such traversals first-class, optimized at the storage layer, and declarative.

#### 3. Aggregations and Analytics Are Essential

The Knowledge Graph is not just a document API—it is an analytical OLAP system.
We routinely need aggregations such as:

```cypher
MATCH (p:Project)-[:HAS_ISSUE]->(i:Issue)
RETURN p.name, count(i) AS issue_count
ORDER BY issue_count DESC;
```

Implementing equivalent groupings via GraphQL or REST would either require bespoke endpoints or push heavy joins into the application layer. GitLab's Postgres times out on these queries today. Cypher allows server-side execution with optimized graph planners, leveraging adjacency lists and columnar execution with ClickHouse.

#### 4. Schema Flexibility and Evolution are Essential

Customers will eventually need to be able to add their own data to the graph. Additionally, the Knowledge Graph's schema must evolve rapidly as new GitLab SDLC entities (e.g., vulnerabilities, packages, runners) appear. GraphQL schemas require explicit type registration and backfilling, creating friction for iteration. Cypher, being label-based, allows us to introduce new node or relationship labels without altering existing queries—`MATCH (n:Vulnerability)` returns zero rows until those labels exist. We can also use this to add custom data types in the future—something customers have shown strong interest in. This makes property graphs a far more flexible and schema-tolerant choice for both AI-driven analytics and human consumers.

#### 5. Open Cypher (GQL) and Property Graphs are now Standard

The Knowledge Graph service is intentionally aligned with openCypher (GQL) and, most importantly, with property graphs, which are now standardized by SQL:2023's [ISO/IEC 9075-16:2023](https://www.iso.org/standard/79473.html). Cypher-like patterns are the de facto standard for property-graph databases and Knowledge Graphs (Neo4j, Memgraph, Kùzu). By adopting them, we inherit a well-understood, declarative language for expressing complex traversals, aggregations, and pattern matching over graph data—operations required to fully leverage GitLab's SDLC and code metadata.
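To illustrate how a deterministic JSON tool contract can compile to parameterized Cypher-like SQL, here is a minimal sketch of an `aggregate_nodes`-style compiler. Everything schema-related (`graph_nodes`, `graph_edges`, `traversal_ids`, the whitelists) is a hypothetical example, not the real intermediate query language:

```python
# Identifiers come from a whitelist; user-supplied values only ever become
# bind parameters, never part of the SQL string (injection-safe by design).
ALLOWED_LABELS = {"Project", "Issue", "MergeRequest"}
ALLOWED_EDGES = {"HAS_ISSUE", "CLOSES"}

def compile_aggregate_nodes(call: dict) -> tuple[str, dict]:
    """Compile an aggregate_nodes-style JSON tool call into (sql, params)."""
    src, edge, dst = call["source_label"], call["edge_type"], call["target_label"]
    if src not in ALLOWED_LABELS or dst not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {src!r}/{dst!r}")
    if edge not in ALLOWED_EDGES:
        raise ValueError(f"unknown edge type: {edge!r}")

    sql = (
        "SELECT n.name AS name, count(m.id) AS agg_count "
        "FROM graph_nodes AS n "
        "JOIN graph_edges AS e ON e.src_id = n.id AND e.type = %(edge)s "
        "JOIN graph_nodes AS m ON m.id = e.dst_id AND m.label = %(dst)s "
        "WHERE n.label = %(src)s "
        "AND hasAny(n.traversal_ids, %(traversal_ids)s) "  # permission scoping
        "GROUP BY name ORDER BY agg_count DESC"
    )
    params = {
        "src": src, "dst": dst, "edge": edge,
        "traversal_ids": call.get("traversal_ids", []),
    }
    return sql, params

sql, params = compile_aggregate_nodes({
    "source_label": "Project",
    "edge_type": "HAS_ISSUE",
    "target_label": "Issue",
    "traversal_ids": [42],
})
```

The tool call corresponds to the Cypher aggregation above (issues counted per project), but compiles to columnar SQL with the traversal-ID filter injected into every query, matching the security layering described earlier.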
> You can read more about how we will enable all LLMs to query under the [llm_querying](/handbook/engineering/architecture/design-documents/gitlab_knowledge_graph/llm_querying/) doc.

### Knowledge Graph is OLAP, not OLTP

The Knowledge Graph is an **OLAP application** layered over an OLTP one. The service is not required to provide transaction guarantees or real-time data for the first iteration. The initial iteration will serve as a **read-only** analytical data store and a data retrieval API for users and AI agents, providing access to code and SDLC metadata.

</details>

#### What database are we using?

We are building a **Graph Query Engine on ClickHouse**. After the original choice, KuzuDB, was archived, the team evaluated several options and decided to leverage GitLab's existing, approved database stack. This approach provides several advantages:

* The property graph data model is the most critical component, not the specific underlying database.
* We leverage significant operational experience with ClickHouse, reducing maintenance overhead and SRE/DBRE costs.
* It allows for a faster time-to-market and avoids the lengthy legal and procurement cycles associated with new database vendors.
* It's a "two-way door": if ClickHouse doesn't meet our long-term needs, the data pipeline components (Siphon, NATS) can be reused to feed a different graph database in the future.

**Infrastructure decisions from offsite (Jan 2026):**

* Continue with the same ClickHouse Cloud instance for staging; separate logical DB for GKG; test to determine whether a separate physical instance is needed.
* Build a staging environment for .com (Siphon for each PG DB → dedicated physical ClickHouse instance).
* Build a new cluster to hold NATS for GKG.

#### What is our delivery story?

Three **parallel** workstreams:

1. **Siphon (producer-only) on GitLab.com** — staging ongoing, ironing out network connectivity between eventsdot-staging and PG-staging.
   End-to-end pipeline target (Siphon → NATS → ClickHouse): **March 1 (stretch)**.
2. **NATS at deployment-unit level** — new cluster build in progress, verifying alignment with the DIP team's Helm chart.
3. **ClickHouse consumers + schema expansion** for SDLC/CI/CD — aligned with NATS availability, wiring to staging.

* **GA target for .com**: **End of April 2026**.
* [**GA target for Dedicated and Self-Managed**](https://gitlab.com/groups/gitlab-com/gl-infra/gitlab-dedicated/-/work_items/915): **Q2 FY27** (immediately after .com validation via [**OAK / Self-Managed Runway**](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/selfmanaged_segmentation/#omnibus-adjacent-kubernetes-oak)).

#### What about local code indexing?

We currently offer a local code-indexing CLI tool called `gkg`. This tool provides local indexing for workspaces and allows agents to connect to a local MCP server. It bundles KuzuDB as a statically linked library, packaged as a single binary that runs on all major operating systems. Users can start a local server (`gkg server start`) to browse their indexed projects through a web UI.

We are _prioritizing Knowledge Graph as a Service first_. We are evaluating whether [chDB](https://clickhouse.com/chdb), an in-process database, can replace Kuzu for the local graph functionality.

## Key Risks

Identified at the [Knowledge Graph Offsite (Jan 2026)](https://docs.google.com/document/d/1BLfJGqyHtaNSdf_OO_YFoNaQFcaewMc1IgKKytPGGEg/edit):

* **ClickHouse knowledge gap** — No deep ClickHouse operational expertise on the KG team currently; dependent on DBRE support.
* **AuthZ hitting PostgreSQL** — Each query result row may require PG permission checks, potentially causing load on the production database.
* **ClickHouse Cloud regional availability** — Not available in all Dedicated customer regions.
* **Single-node SM customers** — The product requires Kubernetes; a large percentage of paying SM customers run single-node Omnibus. GTM clarity needed.
* **CDot scalability** — Balance checks for billing may be a bottleneck; infrastructure needs assessment.

## Timeline

* **Current (Feb 2026)**: Siphon staging deployment ongoing; NATS cluster build; ClickHouse consumer wiring.
* **March 2026 (stretch)**: End-to-end Siphon → NATS → ClickHouse pipeline operational.
* **End of April 2026**: GA for .com.
* **Q2 FY27**: GA for Dedicated and Self-Managed (OAK/Runway gated rollout).

---

### References & Epics

* **Epic: Knowledge Graph (First Iteration)**: https://gitlab.com/groups/gitlab-org/-/epics/17514
* **Epic: Knowledge Graph Server**: https://gitlab.com/groups/gitlab-org/-/epics/17518
* **Epic: Knowledge Graph GA**: https://gitlab.com/groups/gitlab-org/-/work_items/19744
* **Epic: Support for GitLab Knowledge Graph GA on Dedicated**: https://gitlab.com/groups/gitlab-com/gl-infra/gitlab-dedicated/-/work_items/915
* **Roadmap**: https://gitlab.com/groups/gitlab-org/-/work_items/20331
* **Database Selection (Research)**: https://gitlab.com/groups/gitlab-org/rust/-/epics/31
* **Siphon (CDC)**: https://gitlab.com/gitlab-org/analytics-section/siphon
* **NATS (JetStream)**: https://docs.nats.io/
* **ClickHouse**: https://clickhouse.com/docs/
* **Field Enablement Guide**: https://gitlab.com/gitlab-com/content-sites/handbook/-/merge_requests/17557
* **Diagnosis Guidelines (Support)**: https://docs.google.com/document/d/1WehXNiv52hAJSa3ZYWQ21RkPO3sHm0cvcV_TmkdsU58/edit

> For packaging/monetization details, see the [**Product Executive Brief**](https://docs.google.com/document/d/1EN5Y7IxMgEZIUZESCxcKxICHOGL_xx1wPsBXPHdw7gE/edit?tab=t.0). For SLOs, on-call, and runbooks, see the **Observability** and **Maintenance & Ongoing Operations** sections in the [Engineering Executive Brief](https://docs.google.com/document/d/193lERl7XvQX2aipqTW8v8QBV4UmHYUPehTK7mgOnQUI/edit?tab=t.sxui33e0fwa). For offsite decisions and action items, see the [Offsite Notes](https://docs.google.com/document/d/1BLfJGqyHtaNSdf_OO_YFoNaQFcaewMc1IgKKytPGGEg/edit).