example 2
Prompt:
gitlab-org&17514 What is this epic about? What issues are included in this epic? What was Jan's suggestion for handling knowledge graph incremental updates? Any recent discussions on this epic? Please provide a detailed report. Include a list of references and links at the bottom of the report in markdown format. Be sure to fetch comments.
Knowledge Graph First Iteration: Detailed Analysis Report
Executive Summary
Epic gitlab-org#17514 "Knowledge Graph First Iteration" outlines the development of a comprehensive system to create a structured, queryable representation of code repositories. This system aims to enhance developers' ability to visualize and understand how different components within a repository interconnect, providing crucial context for code analysis, AI features, and developer tools. The Knowledge Graph project employs modern technologies including Rust, tree-sitter, and the KuzuDB graph database to create an efficient, cross-platform solution.
Epic Overview
The Knowledge Graph project addresses a significant limitation in current AI data sources by providing comprehensive context for entire code repositories. It captures key entities such as files, directories, classes, functions, and their relationships to create a navigable graph representation of codebases.
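As a rough illustration of what such a graph representation could look like, the sketch below models entities as typed nodes and relationships as labeled edges. The class names, the `CONTAINS` and `DEFINES` relationship labels, and the example paths are all assumptions made for illustration; the epic does not specify the actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    id: str    # hypothetical identifier, e.g. a path or a qualified name
    kind: str  # "directory", "file", "class", or "function"


@dataclass
class CodeGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    # Each edge is (source id, relationship label, target id).
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, kind: str) -> None:
        self.nodes[node_id] = Node(node_id, kind)

    def add_edge(self, source: str, relationship: str, target: str) -> None:
        # Refuse edges whose endpoints were never registered as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((source, relationship, target))

    def neighbors(self, source: str, relationship: str) -> List[str]:
        # Follow outgoing edges with the given label from one node.
        return [t for s, r, t in self.edges if s == source and r == relationship]


def build_example() -> CodeGraph:
    # Hypothetical repository with one directory, one file, one class,
    # and one method.
    graph = CodeGraph()
    graph.add_node("app", "directory")
    graph.add_node("app/user.rb", "file")
    graph.add_node("User", "class")
    graph.add_node("User#name", "function")
    graph.add_edge("app", "CONTAINS", "app/user.rb")
    graph.add_edge("app/user.rb", "DEFINES", "User")
    graph.add_edge("User", "DEFINES", "User#name")
    return graph
```

Calling `build_example().neighbors("app/user.rb", "DEFINES")` walks one hop from the file to the class it defines, which is the kind of navigation the graph is meant to support.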
Key Components
The epic is structured into four primary child epics, each addressing a critical aspect of the system:
- Unified Parser Project (gitlab-org#17516): A Rust-based parser leveraging tree-sitter and ast-grep for high-performance static analysis
- Client-side Repository Interaction (gitlab-org#17515): "Gitalisk" - A Rust library for efficient cross-platform git operations
- Knowledge Graph Core Project (gitlab-org#17517): Core indexing logic for extracting ASTs and defining graph structures
- Server Architecture (gitlab-org#17518 (closed)): Implementation with an indexer worker and API service
Technical Implementation
The Knowledge Graph system is built on a multi-layered architecture:
- Core Technology Stack: Rust for performance and cross-platform capabilities
- Database Technology: KuzuDB as an embeddable graph database
- Interface Options: CLI, Language Server Integration, and GitLab Server Integration
Included Issues
The epic includes specific issues focused on project implementation and security:
- gitlab-org/gitlab#540414: Project Security (gitalisk, gitlab-code-parser, knowledge-graph)
- gitlab-org/gitlab#536080 (closed): Publish Rust Crate
Jan's Suggestion for Knowledge Graph Incremental Updates
Jan Provaznik (@jprovaznik) proposed a comprehensive approach for handling knowledge graph incremental updates in a comment thread. His suggestion focuses on server-side architecture with a workflow that ensures consistency between client and server implementations.
Jan's approach involves:
- Using a consistent indexer/parser across client and server sides, implemented as a Rust library with bindings for multiple languages
- Handling file changes systematically:
  - For deleted files: Delete all nodes belonging to the file (automatically removing relationships)
  - For added files: Create new nodes and establish relationships using Cypher queries
  - For renamed files: Combination of deletion and addition operations
  - For updated files: Similar to renames - remove old nodes and add updated ones
- Server-side indexing process:
  - GitLab Rails Worker calls the Rust Knowledge Graph indexer via language bindings
  - The indexer processes repository files and outputs graph data in a universal format (CSV/JSON)
  - Data is sent to a Graph API service for storage in KuzuDB databases
Jan illustrated this workflow with a mermaid diagram:
flowchart TD
subgraph GitLab Rails Worker
T1[Reindex Repository X] -->|Use Ruby-FFI to call Go code| K(Load Git repository X)-->|Call indexer| R(Rust Knowledge graph indexer)
end
T1 --> |Send CSV/JSON graph data for repo X| GA1(Graph API)
subgraph graph node 1
GA1 -->K1[Kuzu DB X]
GA1 -->K2[Kuzu DB A]
GA1 -->K3[Kuzu DB B]
end
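The per-file handling in Jan's list above can be sketched as a small planning function that turns a list of file changes into ordered graph operations. This is a minimal sketch, not the project's implementation: the status names, the operation names, the `file_path` property, and the Cypher template are hypothetical and only show how the four cases reduce to "delete nodes" and "index file".

```python
from typing import List, Optional, Tuple

# Illustrative openCypher-style template. The `file_path` property and the
# `$path` parameter are assumptions, not the project's actual schema.
DELETE_FILE_NODES = "MATCH (n {file_path: $path}) DETACH DELETE n"

Change = Tuple[str, Optional[str], Optional[str]]  # (status, old_path, new_path)
Operation = Tuple[str, str]                        # (operation name, path)


def plan_update(changes: List[Change]) -> List[Operation]:
    """Translate file changes into delete/index operations, in order."""
    ops: List[Operation] = []
    for status, old_path, new_path in changes:
        if status == "deleted":
            # Deleting the file's nodes also removes their relationships.
            ops.append(("delete_nodes", old_path))
        elif status == "added":
            ops.append(("index_file", new_path))
        elif status == "renamed":
            # A rename is a deletion of the old path plus an addition.
            ops.append(("delete_nodes", old_path))
            ops.append(("index_file", new_path))
        elif status == "updated":
            # Same as a rename, except the path does not change.
            ops.append(("delete_nodes", old_path))
            ops.append(("index_file", old_path))
        else:
            raise ValueError(f"unknown change status: {status}")
    return ops
```

For example, `plan_update([("renamed", "a.rb", "b.rb")])` yields a `delete_nodes` operation for `a.rb` followed by an `index_file` operation for `b.rb`. A real indexer would execute each `delete_nodes` step with a query along the lines of `DELETE_FILE_NODES` and each `index_file` step by parsing the file and creating nodes and relationships.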
For incremental updates, Jan proposed creating a copy of the KuzuDB database before making changes, then replacing the old database with the updated one when indexing is complete.
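A minimal sketch of that copy-then-replace step is shown below, assuming the database is stored as a directory on a local filesystem. The function name and the `apply_changes` callback are hypothetical, and the sketch uses only standard library file operations rather than any KuzuDB API.

```python
import os
import shutil
import tempfile
from typing import Callable


def update_with_copy_swap(db_dir: str, apply_changes: Callable[[str], None]) -> None:
    """Apply changes to a copy of db_dir, then swap the copy into place.

    Assumes db_dir is a directory. If apply_changes raises, the original
    database is left untouched and the staged copy is discarded.
    """
    parent = os.path.dirname(os.path.abspath(db_dir))
    # Stage next to the original so the renames stay on one filesystem.
    work = tempfile.mkdtemp(prefix="kg-update-", dir=parent)
    staged = os.path.join(work, "db")
    retired = os.path.join(work, "old")
    try:
        shutil.copytree(db_dir, staged)
        apply_changes(staged)
        # Two renames: readers may briefly see no database between them.
        os.rename(db_dir, retired)
        os.rename(staged, db_dir)
    finally:
        # Removes the staged copy on failure, or the retired copy on success.
        shutil.rmtree(work, ignore_errors=True)
```

The appeal of this design is that queries keep reading a consistent database while indexing runs. The cost is a full copy per update, and the swap here is not strictly atomic, so a production version would need to coordinate with readers.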
Recent Discussions and Progress
As of May 2025, significant progress has been made on multiple fronts:
- Project Infrastructure:
  - Creation of dedicated repositories: gitlab-org/rust/knowledge-graph, gitlab-org/rust/gitlab-code-parser, and gitlab-org/rust/gitalisk
  - Establishment of CI/CD pipelines and development workflows
  - Setup of modular Rust workspaces with well-defined crates
- Parser Development:
  - Ruby Parser draft implementation demonstrating the API and initial code structure
  - Successful use of ast-grep for node extraction with experimental assignment tracing
  - Ability to build fully qualified name identifiers using ast-grep + tree-sitter
- Indexer Design:
  - Decision to use Kuzu's bulk import feature from CSV/Parquet for initial indexing
  - Creation of both CLI and API interfaces for accessing the knowledge graph
  - Performance benchmarking showing promising results (e.g., extracting Ruby AST data from the GitLab Monolith in ~12 seconds)
- Integration Approach:
  - Server-side: Exploration of reusing Zoekt's infrastructure for database creation and querying
  - API Design: Synchronization across Knowledge Graph and One Parser teams to ensure consistent interfaces
  - Discussions on appropriate binding strategies for cross-language usage (FFI, NAPI)
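To make the bulk-import decision above more concrete, the sketch below serializes extracted nodes to CSV and builds the statement that would load the file. The column names, the table name, and the helper functions are hypothetical. The `COPY ... FROM` text reflects the general shape of Kuzu's CSV import statement as an assumption and should be checked against the Kuzu documentation for the version in use.

```python
import csv
import io
from typing import Dict, List


def nodes_to_csv(nodes: List[Dict[str, str]]) -> str:
    """Serialize node records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Hypothetical columns; the real schema is not given in the epic.
    writer.writerow(["id", "kind", "file_path"])
    for node in nodes:
        writer.writerow([node["id"], node["kind"], node["file_path"]])
    return buffer.getvalue()


def copy_statement(table: str, csv_path: str) -> str:
    # Assumed form of the bulk-import statement; verify before relying on it.
    return f'COPY {table} FROM "{csv_path}" (header=true)'
```

Writing one CSV file per node or relationship table and loading each in a single statement avoids issuing one query per node, which is the usual reason to prefer bulk import for an initial index.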
Recent sync-up meetings (April 30 - May 2, 2025) have been crucial for aligning implementation approaches across teams, with key decisions made on architecture, API contracts, and development priorities.
Next Steps
The project team has outlined several immediate priorities:
- Complete the Ruby parser implementation and structure the gitlab-code-parser project using idiomatic Rust
- Implement Go integration using FFI and benchmark different performance approaches
- Define API function signatures and binding locations
- Continue direct development on target projects rather than extending proofs of concept
- Refine the incremental update strategy for the Knowledge Graph
References
- Knowledge Graph First Iteration Epic
- Knowledge Graph Core Project Epic
- One Parser (gitlab-code-parser) Epic
- Client-side Repository Interaction Epic
- Server Architecture Epic
- Knowledge Graph Design Document
- Create Knowledge Graph Project Issue
- Publish Rust Crate Issue
- One Parser Proposal Issue
- Knowledge Graph Indexer Sync Recording
- One Parser API Design Sync Recording
- Repository: Knowledge Graph
- Repository: GitLab Code Parser
- Repository: Gitalisk