[kg] SPIKE: Contributions Pipeline and Contributor Success
Problem to Solve
The Knowledge Graph (KG) is envisioned not just as an internal tool, but as a foundational framework for code intelligence, similar to popular open-source projects. For the KG to achieve this vision and be widely adopted, we need to establish a clear and robust pipeline for contributions.
Currently, there is no defined pathway for new engineers, whether inside or outside GitLab, to understand the system's architecture, get up to speed, and contribute effectively. The lack of a structured onboarding and contribution process creates a high barrier to entry and risks making the KG a "black box" understood by only a few core developers, which ultimately slows its growth and innovation. Without a deliberate focus on contributor success, we cannot effectively scale the addition of new language support or other extensions, which limits the overall utility and reach of the project.
Proposed Solution
We will treat the Knowledge Graph as a first-class open-source framework and build a comprehensive contributions pipeline designed to empower and upskill engineers to contribute. The solution is centered around four key pillars: extensibility, documentation, a GitLab team member pipeline, and community engagement.
1. Engineer an Extensible Framework
The core of the strategy is to provide clear, powerful extension points that allow developers to add significant value without needing to modify the indexer's core logic.
- Adding New Languages: The indexer is intended to have a language-agnostic architecture, so a primary contribution path will be adding support for new programming languages. This involves creating a new parser using the established gitlab-code-parser pattern and bringing that parser into the indexer.
  - Build DSLs: Leverage something like @michaelusa's [proposal] Spike: Resolving Definition FQNs wit... (gitlab-code-parser#38)
- Custom Extraction with Graph Extractor Language (GEL): We will introduce and document GEL, a custom DSL that allows developers to define their own rules for extracting framework-specific nodes and relationships from the AST. For example, a contributor could write a GEL file to identify all Next.js API routes and create custom api_endpoint nodes in the graph. This makes the KG an extensible framework that the community can adapt to their specific needs.
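As an illustration of what such an extension point could look like, the sketch below defines a hypothetical LanguageParser trait and a ParserRegistry that dispatches on file extension. None of these names come from the indexer or from gitlab-code-parser; they are assumptions made for the example, and the toy parser matches lines of text where a real parser would walk an AST.

```rust
use std::path::Path;

/// A definition found in a source file (hypothetical shape, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct Definition {
    pub name: String,
    pub kind: String,
}

/// Hypothetical extension point: one implementation per supported language.
pub trait LanguageParser {
    /// File extensions this parser handles, without the leading dot.
    fn extensions(&self) -> &'static [&'static str];
    /// Extract definitions from the source text of a single file.
    fn parse(&self, source: &str) -> Vec<Definition>;
}

/// Toy parser: treats every line that starts with `def ` as a function definition.
pub struct ToyPythonParser;

impl LanguageParser for ToyPythonParser {
    fn extensions(&self) -> &'static [&'static str] {
        &["py"]
    }

    fn parse(&self, source: &str) -> Vec<Definition> {
        source
            .lines()
            .filter_map(|line| line.trim_start().strip_prefix("def "))
            .filter_map(|rest| rest.split('(').next())
            .map(|name| Definition {
                name: name.trim().to_string(),
                kind: "function".to_string(),
            })
            .collect()
    }
}

/// Hypothetical registry: core logic only talks to this type, never to a
/// concrete language, so new languages are added by registering a parser.
#[derive(Default)]
pub struct ParserRegistry {
    parsers: Vec<Box<dyn LanguageParser>>,
}

impl ParserRegistry {
    pub fn register(&mut self, parser: Box<dyn LanguageParser>) {
        self.parsers.push(parser);
    }

    /// Returns `None` when no registered parser claims the file's extension.
    pub fn parse_file(&self, path: &str, source: &str) -> Option<Vec<Definition>> {
        let ext = Path::new(path).extension()?.to_str()?;
        self.parsers
            .iter()
            .find(|p| p.extensions().iter().any(|e| *e == ext))
            .map(|p| p.parse(source))
    }
}
```

In this sketch, supporting another language means implementing the trait and calling `register` once; `parse_file` itself does not change, which is the property the extension points above are meant to provide.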
2. Build a Thorough Documentation Site
A dedicated, public-facing documentation site is critical for contributor success. This site will serve as the central hub for learning and contributing.
- Robust Architectural Documentation: The site will feature a detailed architectural overview based on the Knowledge Graph Core Indexer design. It will explain the three-phase pipeline (File Discovery & Parsing, Resolution/Analysis, and Writing), the roles of the different Rust crates, the interaction with the Kuzu database, and the overall data flow.
- Contributor Guides & Tutorials: We can create "how-to" guides and tutorials that walk contributors through common tasks, such as:
  - "How to Add a New Language to the Knowledge Graph"
  - "How to Write Custom Rules with GEL"
- API and Schema Reference: The documentation will include a full reference for the graph schema (nodes and relationships) and the API exposed by the gkg server.
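To give a feel for the kind of walkthrough the architectural documentation could contain, here is a minimal sketch of the three-phase pipeline as three plain functions. Everything in it is an assumption made for illustration: the ParsedFile and Edge types, the function names, and the toy ".toy" grammar ("def NAME" declares a symbol, "call NAME" references one) are hypothetical, and the writing phase only renders text rows instead of talking to Kuzu.

```rust
use std::collections::HashMap;

/// Phase 1 output: the symbols declared and referenced in one file (hypothetical).
#[derive(Debug)]
pub struct ParsedFile {
    pub path: String,
    pub definitions: Vec<String>,
    pub references: Vec<String>,
}

/// Phase 2 output: a reference resolved to the file that defines the symbol.
#[derive(Debug, PartialEq)]
pub struct Edge {
    pub from_file: String,
    pub to_file: String,
    pub symbol: String,
}

/// Phase 1 (File Discovery & Parsing): skip unsupported files, extract symbols.
pub fn discover_and_parse(files: &[(&str, &str)]) -> Vec<ParsedFile> {
    let mut parsed = Vec::new();
    for &(path, source) in files {
        if !path.ends_with(".toy") {
            continue; // discovery: only the toy language is supported here
        }
        let mut definitions = Vec::new();
        let mut references = Vec::new();
        for line in source.lines() {
            let line = line.trim();
            if let Some(name) = line.strip_prefix("def ") {
                definitions.push(name.to_string());
            } else if let Some(name) = line.strip_prefix("call ") {
                references.push(name.to_string());
            }
        }
        parsed.push(ParsedFile { path: path.to_string(), definitions, references });
    }
    parsed
}

/// Phase 2 (Resolution/Analysis): link each reference to its defining file.
pub fn resolve(parsed: &[ParsedFile]) -> Vec<Edge> {
    // Index every definition by name so references can be looked up.
    let mut defined_in: HashMap<&str, &str> = HashMap::new();
    for file in parsed {
        for def in &file.definitions {
            defined_in.insert(def.as_str(), file.path.as_str());
        }
    }
    let mut edges = Vec::new();
    for file in parsed {
        for reference in &file.references {
            // Unresolved references are dropped in this sketch.
            if let Some(target) = defined_in.get(reference.as_str()) {
                edges.push(Edge {
                    from_file: file.path.clone(),
                    to_file: target.to_string(),
                    symbol: reference.clone(),
                });
            }
        }
    }
    edges
}

/// Phase 3 (Writing): a real indexer would persist to the graph database;
/// this stand-in renders one text row per edge.
pub fn write_rows(edges: &[Edge]) -> Vec<String> {
    edges
        .iter()
        .map(|e| format!("{} -[CALLS {}]-> {}", e.from_file, e.symbol, e.to_file))
        .collect()
}

/// Runs the three phases in order.
pub fn run_pipeline(files: &[(&str, &str)]) -> Vec<String> {
    let parsed = discover_and_parse(files);
    let edges = resolve(&parsed);
    write_rows(&edges)
}
```

Keeping each phase a function from one plain data type to the next makes the data flow easy to document and lets each phase be tested in isolation, which is one way the published architecture pages could be kept in step with the code.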
3. Build a GitLab Team Member Pipeline
While we are moving fast, we want to enable other GitLab team members to contribute effectively. This includes maintaining a clear roadmap of "what's next" so that team members can pick up tasks that contribute to upcoming milestone deliverables. We can use a GitLab wiki or similar to outline the high-level efforts that align with the long-term vision of the project.
4. Foster a Contribution Ecosystem
To encourage community involvement, we will establish standard open-source community health files and processes.
- CONTRIBUTING.md: A detailed CONTRIBUTING.md file will outline the development process, coding standards, and how to submit merge requests.
- Open Source Strategy: We will finalize and publish our open-source strategy, clarifying which components of the Knowledge Graph are open source, open core, or closed source to provide transparency to the community.