GitLab Knowledge Graph First Iteration
**Vision Statement**
_Imagine opening any repository—on your laptop, in CI, or inside the Web IDE—typing one command, and instantly seeing how every file, class, and import fits together._
We want any developer to be able to pick up the GitLab Knowledge Graph project, run the tool on any machine, point it at a repository, and generate a working Repository Knowledge Graph of that project. The knowledge graph can then be extended and deeply integrated with GitLab features for customers. Think of it as [Gource](https://github.com/acaudwell/Gource) for repository code and GitLab features, built for AI use cases.
- **Open‑source & stand‑alone:** works on macOS, Linux, and Windows; no GitLab account required. By making this a standalone project and lowering the barrier to entry, we can encourage community contributions.
- **Deeply integrated:** the very same artifact enriches the GitLab Monolith, Language Server, CLI, and future AI features—one graph, everywhere.
**Problem**
Our current AI data sources cannot provide comprehensive context for a whole repository. Approaches like basic code chunking and description enrichment struggle with scenarios requiring deep structural understanding, such as accurately finding test files, finding references across functions, or navigating large, unfamiliar codebases (<a href="https://gitlab.com/groups/gitlab-org/-/epics/16251">X-Ray Graph Epic</a>, <a href="https://gitlab.com/gitlab-org/gitlab/-/issues/508978">Use Cases</a>).
Answering sophisticated questions about code structure, dependencies, and history remains challenging without a unified, structured representation of code repositories.
**Solution**
Build a system to create a structured, queryable representation of code repositories. **The GitLab Knowledge Graph** will capture entities like files, directories, classes, functions, and their relationships (imports, calls, inheritance, etc.).
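
To make the shape of this representation concrete, here is a minimal sketch of nodes and typed edges held in memory. The type names, the `Contains` edge kind, and the `neighbors` helper are assumptions made for illustration; the actual schema is defined in the Knowledge Graph Core epic and may look quite different.

```rust
/// Entity kinds named in the text above. The exact set is an assumption.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeKind {
    Directory,
    File,
    Class,
    Function,
}

/// Relationship kinds named in the text above.
/// `Contains` is a hypothetical addition for directory/file/definition nesting.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EdgeKind {
    Contains,
    Imports,
    Calls,
    Inherits,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Node {
    kind: NodeKind,
    name: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Edge {
    from: usize,
    to: usize,
    kind: EdgeKind,
}

/// A toy in-memory graph; a node's id is its index in `nodes`.
#[derive(Debug, Default)]
struct CodeGraph {
    nodes: Vec<Node>,
    edges: Vec<Edge>,
}

impl CodeGraph {
    /// Adds a node and returns its id.
    fn add_node(&mut self, kind: NodeKind, name: &str) -> usize {
        self.nodes.push(Node { kind, name: name.to_string() });
        self.nodes.len() - 1
    }

    /// Adds a directed, typed edge between two existing nodes.
    fn add_edge(&mut self, from: usize, to: usize, kind: EdgeKind) {
        self.edges.push(Edge { from, to, kind });
    }

    /// Names of the nodes reachable from `from` over one edge of `kind`,
    /// in insertion order.
    fn neighbors(&self, from: usize, kind: EdgeKind) -> Vec<&str> {
        self.edges
            .iter()
            .filter(|e| e.from == from && e.kind == kind)
            .map(|e| self.nodes[e.to].name.as_str())
            .collect()
    }
}
```

With a graph like this, a question such as "which functions does `run` call?" becomes a filter over `Calls` edges, which is the kind of structural lookup that chunk-based context struggles with.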
The high-level solution involves five key architectural components:
1. <a href="https://gitlab.com/groups/gitlab-org/-/epics/17516">**Unified Parser Project (`gitlab-code-parser`):**</a> A single, high-performance static analysis library built in Rust, leveraging `tree-sitter` and `ast-grep`. This library will provide consistent parsing across supported languages for various GitLab features, and it will be able to run both server-side and client-side (via Wasm/FFI). (<a href="https://gitlab.com/gitlab-org/gitlab/-/issues/534153">One Parser Proposal</a>)
2. **Graph Database Technology:** An underlying embeddable graph database called [Kuzu](https://docs.kuzudb.com/get-started/) to persist the graph structure, accessed via a database client tailored to the server or client.
3. [**Knowledge Graph Core Project (Rust):**](https://gitlab.com/groups/gitlab-org/-/epics/17517) Contains the central logic for extracting ASTs (via `gitlab-code-parser`), defining graph nodes/edges, matching entities (e.g., definitions to references), and managing data structures for graph construction and querying. This project will expose several crates capable of indexing repositories both server-side (for features integrated into the GitLab platform) and client-side (e.g., via a standalone CLI or Language Server integration).
4. [**Knowledge Graph Server Architecture:**](https://gitlab.com/groups/gitlab-org/-/epics/17518) The server side of the Knowledge Graph project will add an indexer worker that wraps the core Rust project, and it will expose a thin API service that allows Rails to query graph nodes. (<a href="https://gitlab.com/gitlab-com/content-sites/handbook/-/merge_requests/13104">Server Design Document</a>)
5. <a href="https://gitlab.com/groups/gitlab-org/-/epics/17515">**Client-side Repository Interaction**</a> (`gitalisk`): A Rust-based library providing efficient and safe cross-platform `git` operations, used for accessing repository data and structure during indexing. (<a href="https://gitlab.com/groups/gitlab-org/-/epics/17514">Gitalisk Epic</a>)
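
As an illustration of the "matching entities" step in the Knowledge Graph Core item above, the following sketch links references to definitions by name. It is a deliberately naive heuristic (prefer a definition in the same file, otherwise take the first definition with that name), and every type and function name here is hypothetical. The real crate will likely need scope and import information to resolve names properly.

```rust
use std::collections::HashMap;

/// A symbol definition found by the parser (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Definition {
    name: String,
    file: String,
}

/// A symbol usage found by the parser (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Reference {
    name: String,
    file: String,
}

/// (referencing file, defining file, symbol name)
type ResolvedEdge = (String, String, String);

/// Links each reference to a definition with the same name.
/// References with no matching definition are skipped.
fn match_references(defs: &[Definition], refs: &[Reference]) -> Vec<ResolvedEdge> {
    // Index definitions by name; one name may be defined in several files.
    let mut by_name: HashMap<&str, Vec<&Definition>> = HashMap::new();
    for d in defs {
        by_name.entry(d.name.as_str()).or_default().push(d);
    }

    let mut edges = Vec::new();
    for r in refs {
        if let Some(candidates) = by_name.get(r.name.as_str()) {
            // Heuristic: a same-file definition wins, else the first one seen.
            let target = candidates
                .iter()
                .find(|d| d.file == r.file)
                .unwrap_or(&candidates[0]);
            edges.push((r.file.clone(), target.file.clone(), r.name.clone()));
        }
    }
    edges
}
```

A heuristic of this kind trades precision for speed: it will occasionally link a reference to the wrong definition, but it needs only one pass over the parsed symbols.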
> **Important Note:** This project is initially scoped to AI features.
This system will expose the knowledge graph through multiple interfaces:
* **Standalone CLI:** Allows local indexing and exploration, and serves as a local query interface/UI.
* **Language Server Integration:** Provides real-time querying (navigation, context building) directly within IDEs for AI Features.
* **GitLab Server Integration:** Enables features such as advanced code search, codebase understanding in Duo Chat, context for Code Suggestions, and impact analysis directly within the GitLab platform.
Here is a chart outlining the above:
```mermaid
graph TD
subgraph GitLab Knowledge Graph System
direction LR
subgraph Interfaces
CLI[Standalone CLI]
LSP["Language Server (IDE Integration)"]
Server[GitLab Server Features Chat, Search, etc.]
end
subgraph Core Components - Rust Crates
CoreKG["Knowledge Graph Core Crate <br/><i>(Node Connection, Graph Logic, Query Prep) </i>"]
Parser["gitlab-code-parser <br/><i> (Tree-sitter based AST Extraction)</i>"]
Gitalisk["gitalisk <br/><i> (Client Side Repository Access & Git Ops)</i>"]
DBClient["Database Client <br/><i> (DB Interaction)</i>"]
end
subgraph Data Storage
DB[(Graph Database <br/>e.g., Kuzu)]
end
CoreKG -- Uses --> Parser
CoreKG -- Uses --> Gitalisk
CoreKG -- Uses --> DBClient
DBClient -- Interacts --> DB
CLI -- Accesses --> CoreKG
LSP -- Accesses --> CoreKG
Server -- Indexer Accesses --> CoreKG
CLI -- May use --> Gitalisk
LSP -- May use --> Gitalisk
end
style CoreKG fill:#000,stroke:#333,stroke-width:2px
style Parser fill:#000,stroke:#333,stroke-width:2px
style Gitalisk fill:#000,stroke:#333,stroke-width:2px
```
### Database Querying
#### Server Side Querying
For querying, a service in the Rails monolith (a Chat tool or any other tool) will use a simple abstraction layer to talk directly to a graph node; the "Knowledge Graph Core Crate" is not involved in the query path. Detailed flows are part of the [server-side knowledge graph design document](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/knowledge_graph/).
```mermaid
flowchart TD
subgraph GitLab Rails
T1[Duo Chat Tool] -->|Cypher query for project X| K(Knowledge graph layer)
T2[Other service] -->|Cypher query for project Y| K(Knowledge graph layer)
end
K --> |Cypher query for project X| GA1(zoekt-webservice)
K --> |Cypher query for project Y| GA2(zoekt-webservice)
subgraph zoekt node 1
GA1 -->K1[Kuzu DB X]
GA1 -->K2[Kuzu DB A]
GA1 -->K3[Kuzu DB B]
end
subgraph zoekt node 2
GA2 -->K21[Kuzu DB Y]
GA2 -->K22[Kuzu DB C]
GA2 -->K23[Kuzu DB D]
end
```
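
The routing performed by the knowledge graph layer in the diagram above can be sketched as a lookup from project to the node that holds its Kuzu database. The real layer will live in Rails; Rust is used here only for consistency with the rest of the project. The struct names, node URLs, and the `Function`/`CALLS` schema in the sample Cypher query are all assumptions.

```rust
use std::collections::HashMap;

/// Sample Cypher query. The `Function` label and `CALLS` relationship
/// are hypothetical schema names, not the final graph schema.
const CALLERS_OF: &str =
    "MATCH (caller:Function)-[:CALLS]->(callee:Function) \
     WHERE callee.name = $name RETURN caller.name";

/// A query that has been assigned to the node holding the project's graph.
#[derive(Debug, PartialEq, Eq)]
struct RoutedQuery {
    node_url: String,
    project_id: u64,
    cypher: String,
}

/// Hypothetical stand-in for the abstraction layer in GitLab Rails.
struct KnowledgeGraphLayer {
    /// Which node serves the Kuzu database of each indexed project.
    project_to_node: HashMap<u64, String>,
}

impl KnowledgeGraphLayer {
    /// Picks the node for `project_id`, or fails if the project is not indexed.
    fn route(&self, project_id: u64, cypher: &str) -> Result<RoutedQuery, String> {
        let node_url = self
            .project_to_node
            .get(&project_id)
            .ok_or_else(|| format!("project {} is not indexed", project_id))?;
        Ok(RoutedQuery {
            node_url: node_url.clone(),
            project_id,
            cypher: cypher.to_string(),
        })
    }
}
```

Because each project's graph lives in its own database on a single node, the layer only needs this mapping to forward a query; it does not need to understand the Cypher it carries.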
For indexing, the indexer (Knowledge Graph Core Indexer crate) will accept repository files as input and return graph data in a universal format (CSV, JSON, or another format still to be decided). The flow will look something like this:
```mermaid
flowchart TD
subgraph GitLab Rails
W[Repository indexing worker]
end
W --> |Index repo X| ZI(zoekt-indexer go app)
subgraph zoekt node 1
ZI --> |FFI index call: repo files, path to kuzu DB| KGIL[Knowledge graph indexer lib]
subgraph rust libraries
KGIL --> |calls parser|P[One Parser]
end
KGIL --> |Create or update DB|DB[Kuzu DB for repo X]
end
```
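
If CSV is chosen as the universal format, the indexer output could look like the sketch below: one file for nodes and one for edges, with standard CSV quoting. The column names and the tuple-based input are assumptions for illustration, not the indexer's actual interface.

```rust
/// Quotes a field if it contains a comma, quote, or line break,
/// doubling any embedded quotes (standard CSV escaping).
fn csv_field(value: &str) -> String {
    let needs_quotes = value.contains(',')
        || value.contains('"')
        || value.contains('\n')
        || value.contains('\r');
    if needs_quotes {
        format!("\"{}\"", value.replace('"', "\"\""))
    } else {
        value.to_string()
    }
}

/// Serializes (id, kind, name) rows; the column layout is hypothetical.
fn nodes_to_csv(nodes: &[(u32, &str, &str)]) -> String {
    let mut out = String::from("id,kind,name\n");
    for (id, kind, name) in nodes {
        out.push_str(&format!("{},{},{}\n", id, csv_field(kind), csv_field(name)));
    }
    out
}

/// Serializes (from, to, kind) rows; the column layout is hypothetical.
fn edges_to_csv(edges: &[(u32, u32, &str)]) -> String {
    let mut out = String::from("from,to,kind\n");
    for (from, to, kind) in edges {
        out.push_str(&format!("{},{},{}\n", from, to, csv_field(kind)));
    }
    out
}
```

Keeping the output format independent of the database lets the same indexer serve both the server-side flow above and client-side tools.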
#### Client Side Querying
On the client side, we'll query directly through the Rust crate (and perhaps spin up a local HTTP server as well, similar to how the LSP does it).
## Component Details
By combining a unified parser, efficient repository access, a structured graph representation, and flexible deployment options, this project aims to provide a robust foundation for GitLab's next-generation code intelligence features.
> **Important Note:** We are not trying to replicate the full precision of every language’s compiler. Instead, the goal is a “good‑enough” level of accuracy—rich enough for AI features to reason over code structure, yet fast enough to index very large repositories in just a few seconds on a typical developer laptop.
For details on each one of these architectural components, please see the child epics below:
- https://gitlab.com/groups/gitlab-org/-/epics/17517+
- https://gitlab.com/groups/gitlab-org/-/epics/17518+
- https://gitlab.com/groups/gitlab-org/-/epics/17516+
- https://gitlab.com/groups/gitlab-org/-/epics/17515+
> **Note**: The [Kuzu](https://docs.kuzudb.com/get-started/) database component will be covered by both https://gitlab.com/groups/gitlab-org/-/epics/17517+ and https://gitlab.com/groups/gitlab-org/-/epics/17518+. The client-side indexer (which https://gitlab.com/groups/gitlab-org/-/epics/17517+ encapsulates) will build and [statically link](https://docs.kuzudb.com/installation/#rust) Kuzu’s C++ library from source.