example 2

Prompt:

gitlab-org&17514 What is this epic about? What issues are included in this epic? What was Jan's suggestion for handling knowledge graph incremental updates? Any recent discussions on this epic? Please provide a detailed report. Include a list of references and links at the bottom of the report in markdown format. Be sure to fetch comments.

Knowledge Graph First Iteration: Detailed Analysis Report

Executive Summary

Epic gitlab-org#17514, "Knowledge Graph First Iteration," covers building a structured, queryable representation of code repositories. The goal is to help developers see how components within a repository connect, and to supply that context to code analysis, AI features, and developer tools. The project is built with Rust, tree-sitter, and the KuzuDB graph database, and targets cross-platform use.

Epic Overview

The Knowledge Graph project addresses a significant limitation in current AI data sources by providing comprehensive context for entire code repositories. It captures key entities such as files, directories, classes, functions, and their relationships to create a navigable graph representation of codebases.
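
As a rough illustration of what such a graph might hold, the sketch below models the entity kinds named above as nodes and connects them with labeled edges. The class names, the `CONTAINS` and `DEFINES` labels, and the sample paths are all hypothetical; the epic's actual schema is not described in this report.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    kind: str  # "Directory", "File", "Class", or "Function"
    id: str    # hypothetical unique key, e.g. a path or qualified name


@dataclass
class Graph:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, label, target) triples

    def add_edge(self, source, label, target):
        # Register both endpoints so nodes never dangle outside the node set.
        self.nodes.update((source, target))
        self.edges.add((source, label, target))

    def neighbors(self, source, label):
        # Follow one relationship type outward from a node.
        return sorted(t.id for s, l, t in self.edges if s == source and l == label)


graph = Graph()
src = Node("Directory", "src")
parser = Node("File", "src/parser.rb")
klass = Node("Class", "Parser")
method = Node("Function", "Parser#parse")
graph.add_edge(src, "CONTAINS", parser)
graph.add_edge(parser, "DEFINES", klass)
graph.add_edge(klass, "DEFINES", method)
```

Navigating the graph then amounts to following edges, for example `graph.neighbors(parser, "DEFINES")` to list what a file defines.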

Key Components

The epic is structured into four primary child epics, each addressing a critical aspect of the system:

  1. Unified Parser Project (gitlab-org#17516): A Rust-based parser leveraging tree-sitter and ast-grep for high-performance static analysis
  2. Client-side Repository Interaction (gitlab-org#17515): "Gitalisk" - A Rust library for efficient cross-platform git operations
  3. Knowledge Graph Core Project (gitlab-org#17517): Core indexing logic for extracting ASTs and defining graph structures
  4. Server Architecture (gitlab-org#17518, closed): Implementation with an indexer worker and API service

Technical Implementation

The Knowledge Graph system is built on a multi-layered architecture:

  • Core Technology Stack: Rust for performance and cross-platform capabilities
  • Database Technology: KuzuDB as an embeddable graph database
  • Interface Options: CLI, Language Server Integration, and GitLab Server Integration

Included Issues

The epic includes specific issues focused on project implementation and security.

Jan's Suggestion for Knowledge Graph Incremental Updates

Jan Provaznik (@jprovaznik) proposed a comprehensive approach for handling knowledge graph incremental updates in a comment thread. His suggestion focuses on server-side architecture with a workflow that ensures consistency between client and server implementations.

Jan's approach involves:

  1. Using a consistent indexer/parser across client and server sides, implemented as a Rust library with bindings for multiple languages

  2. Handling file changes systematically:

    • For deleted files: Delete all nodes belonging to the file (automatically removing relationships)
    • For added files: Create new nodes and establish relationships using Cypher queries
    • For renamed files: Combination of deletion and addition operations
    • For updated files: Similar to renames - remove old nodes and add updated ones
  3. Server-side indexing process:

    • GitLab Rails Worker calls the Rust Knowledge Graph indexer via language bindings
    • The indexer processes repository files and outputs graph data in a universal format (CSV/JSON)
    • Data is sent to a Graph API service for storage in KuzuDB databases

Jan illustrated this workflow with a mermaid diagram:

flowchart TD
    subgraph GitLab Rails Worker
    T1[Reindex Repository X] -->|Use Ruby-FFI to call Go code| K(Load Git repository X)-->|Call indexer| R(Rust Knowledge graph indexer)
    end
    T1 --> |Send CSV/JSON graph data for repo X| GA1(Graph API)
    subgraph graph node 1
    GA1 -->K1[Kuzu DB X]
    GA1 -->K2[Kuzu DB A]
    GA1 -->K3[Kuzu DB B]
    end
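
The per-file handling in step 2 above can be sketched as a small planning function. This is a minimal illustration, not Jan's implementation: the status letters (A, D, M, R) are modeled on `git diff --name-status`, the operation names are made up for the example, and the Cypher string assumes a hypothetical `file_path` property on every node.

```python
def plan_updates(changes):
    """Translate (status, old_path, new_path) tuples into graph operations.

    Renames and updates are both treated as "delete the old nodes, then
    index the new content", as described in the list above.
    """
    ops = []
    for status, old_path, new_path in changes:
        if status in ("D", "M", "R"):
            ops.append(("delete", old_path))
        if status in ("A", "M", "R"):
            ops.append(("index", new_path))
    return ops


# Generic Cypher for the delete step; DETACH DELETE also removes the
# relationships attached to each deleted node. The file_path property
# is an assumption made for this example.
DELETE_FILE_NODES = "MATCH (n {file_path: $path}) DETACH DELETE n"

ops = plan_updates([
    ("R", "a.rb", "b.rb"),   # renamed
    ("A", None, "c.rb"),     # added
    ("D", "d.rb", None),     # deleted
    ("M", "e.rb", "e.rb"),   # updated in place
])
```

Each `("delete", path)` entry would run the delete query with that path, and each `("index", path)` entry would hand the file to the indexer.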

For incremental updates, Jan proposed creating a copy of the KuzuDB database before making changes, then replacing the old database with the updated one when indexing is complete.
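
A minimal sketch of that copy-then-replace step follows, assuming the database is stored as a single directory on disk. The function name, the directory layout, and the `reindex` callback are hypothetical, and the callback only stands in for the real indexer.

```python
import os
import shutil
import tempfile


def update_with_swap(db_path, apply_changes):
    """Apply changes to a copy of db_path, then swap the copy into place."""
    parent = os.path.dirname(os.path.abspath(db_path))
    # Work next to the database so the renames stay on one filesystem.
    work = tempfile.mkdtemp(prefix="kg-update-", dir=parent)
    staged = os.path.join(work, "db")
    retired = os.path.join(work, "old")
    try:
        shutil.copytree(db_path, staged)
        apply_changes(staged)        # a failure here leaves db_path untouched
        os.rename(db_path, retired)  # move the old database aside
        try:
            os.rename(staged, db_path)   # promote the updated copy
        except OSError:
            os.rename(retired, db_path)  # roll back to the old database
            raise
    finally:
        shutil.rmtree(work, ignore_errors=True)


# Demo with a throwaway directory standing in for a database.
root = tempfile.mkdtemp()
db = os.path.join(root, "repo-x")
os.mkdir(db)
with open(os.path.join(db, "data"), "w") as f:
    f.write("v1")


def reindex(path):
    # Stand-in for running the indexer against the copied database.
    with open(os.path.join(path, "data"), "w") as f:
        f.write("v2")


update_with_swap(db, reindex)
```

The two renames are not atomic as a pair, so readers could briefly find no database at `db_path`; a real service would need to account for that window.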

Recent Discussions and Progress

As of May 2025, progress has been made in four areas:

  1. Project Infrastructure:

    • Creation of dedicated repositories: gitlab-org/rust/knowledge-graph, gitlab-org/rust/gitlab-code-parser, and gitlab-org/rust/gitalisk
    • Establishment of CI/CD pipelines and development workflows
    • Setup of modular Rust workspaces with well-defined crates
  2. Parser Development:

    • Ruby Parser draft implementation demonstrating the API and initial code structure
    • Successful use of ast-grep for node extraction with experimental assignment tracing
    • Ability to build fully qualified name identifiers using ast-grep + tree-sitter
  3. Indexer Design:

    • Decision to use Kuzu's bulk import feature from CSV/Parquet for initial indexing
    • Creation of both CLI and API interfaces for accessing the knowledge graph
    • Performance benchmarking showing promising results (e.g., extracting Ruby AST data from the GitLab Monolith in ~12 seconds)
  4. Integration Approach:

    • Server-side: Exploration of reusing Zoekt's infrastructure for database creation and querying
    • API Design: Synchronization across Knowledge Graph and One Parser teams to ensure consistent interfaces
    • Discussions on appropriate binding strategies for cross-language usage (FFI, NAPI)
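
To make the bulk-import decision under Indexer Design more concrete, the sketch below serializes relationship rows to CSV and pairs them with a Kuzu `COPY ... FROM` statement. The `DEFINES` table, the file name `defines.csv`, and the sample rows are hypothetical, and the statement is shown only as a string rather than executed against a database.

```python
import csv
import io


def to_csv(rows):
    """Serialize rows to CSV text, one line per node or relationship."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical "file DEFINES symbol" relationships produced by the indexer.
definitions = [
    ("src/parser.rb", "Parser"),
    ("src/parser.rb", "Parser#parse"),
]
csv_text = to_csv(definitions)

# Kuzu loads a table in bulk with COPY <table> FROM <file>; the table
# would have to be created with a matching schema beforehand.
COPY_STATEMENT = 'COPY DEFINES FROM "defines.csv"'
```

Writing one file per node or relationship table keeps the initial indexing pass a plain export step, with the database doing the loading in bulk.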

Sync-up meetings held April 30 to May 2, 2025 aligned implementation approaches across teams and produced decisions on architecture, API contracts, and development priorities.

Next Steps

The project team has outlined several immediate priorities:

  1. Complete the Ruby parser implementation and structure the gitlab-code-parser project using idiomatic Rust
  2. Implement Go integration using FFI and benchmark different performance approaches
  3. Define API function signatures and binding locations
  4. Continue direct development on target projects rather than extending proofs of concept
  5. Refine the incremental update strategy for the Knowledge Graph

References

  1. Knowledge Graph First Iteration Epic
  2. Knowledge Graph Core Project Epic
  3. One Parser (gitlab-code-parser) Epic
  4. Client-side Repository Interaction Epic
  5. Server Architecture Epic
  6. Knowledge Graph Design Document
  7. Create Knowledge Graph Project Issue
  8. Publish Rust Crate Issue
  9. One Parser Proposal Issue
  10. Knowledge Graph Indexer Sync Recording
  11. One Parser API Design Sync Recording
  12. Repository: Knowledge Graph
  13. Repository: GitLab Code Parser
  14. Repository: Gitalisk