Commit d8a6afc1 authored by Nitin Kumar Garg

Remove trailing spaces

parent 4524d804
+10 −10
@@ -169,7 +169,7 @@ Cypher allows server-side execution with optimized graph planners, leveraging ad

#### 4. Schema Flexibility and Evolution are Essential

Customers will eventually need to be able to add their own data to the graph. Additionally, the Knowledge Graph’s schema must evolve rapidly as new GitLab SDLC entities (e.g., vulnerabilities, packages, runners) appear.

GraphQL schemas require explicit type registration and backfilling, creating friction for iteration.
Because Cypher is label-based, we can introduce new node or relationship labels without altering existing queries: `MATCH (n:Vulnerability)` simply returns zero rows until nodes with that label exist. We can also use this to add custom data types in the future, which customers have shown strong interest in.
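To illustrate why label-based matching makes schema evolution cheap, here is a minimal in-memory sketch in Rust. It is not the Knowledge Graph's data model: `PropertyGraph`, `match_label`, and the `Project`/`Vulnerability` labels are hypothetical, and `match_label` only mimics the semantics of a single-label `MATCH` pattern.

```rust
use std::collections::HashMap;

/// A node in a hypothetical label-based property graph. Labels are plain
/// strings, so a new entity type needs no up-front schema registration.
struct Node {
    id: u64,
    label: String,
    properties: HashMap<String, String>,
}

#[derive(Default)]
struct PropertyGraph {
    nodes: Vec<Node>,
}

impl PropertyGraph {
    fn add_node(&mut self, id: u64, label: &str, properties: &[(&str, &str)]) {
        self.nodes.push(Node {
            id,
            label: label.to_string(),
            properties: properties
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
        });
    }

    /// Rough analogue of `MATCH (n:<label>) RETURN n`: an unknown label is
    /// not an error, it simply matches nothing.
    fn match_label(&self, label: &str) -> Vec<&Node> {
        self.nodes.iter().filter(|n| n.label == label).collect()
    }
}

fn main() {
    let mut graph = PropertyGraph::default();
    graph.add_node(1, "Project", &[("name", "gitlab")]);

    // No vulnerabilities indexed yet: the query still runs and returns zero rows.
    assert!(graph.match_label("Vulnerability").is_empty());

    // Introducing a new entity type is just inserting nodes with a new label;
    // queries over existing labels are unaffected.
    graph.add_node(2, "Vulnerability", &[("severity", "high")]);
    let vulns = graph.match_label("Vulnerability");
    println!(
        "{} vulnerability node(s); id = {}, severity = {}",
        vulns.len(),
        vulns[0].id,
        vulns[0].properties["severity"]
    );
    assert_eq!(graph.match_label("Project").len(), 1);
}
```

By contrast, a GraphQL schema would need a new `Vulnerability` type registered before any query could even reference it.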
@@ -311,25 +311,25 @@ flowchart TD

```

### Database & Database Ops

As a first iteration, the team aims to build **a [Graph Query Engine](/handbook/engineering/architecture/design-documents/gitlab_knowledge_graph/querying/graph_engine/) on ClickHouse** that translates basic Cypher (GQL-style) queries into SQL that performs multi-hop graph traversals.
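One way to picture this translation is that each hop of a path pattern becomes a pair of joins. The Rust sketch below assumes a hypothetical relational layout (`nodes(id, label)` and `edges(src_id, dst_id, rel_type)`) and hypothetical labels; it handles only fixed-length, directed, single-label paths and is not the Graph Query Engine's actual translator or ClickHouse schema.

```rust
/// One hop of a fixed-length path pattern: `-[:rel_type]->(:target_label)`.
struct Hop<'a> {
    rel_type: &'a str,
    target_label: &'a str,
}

/// Translate `(:start_label)-[:r1]->(:l1)-[:r2]->(:l2)...` into SQL joins over
/// hypothetical tables `nodes(id, label)` and `edges(src_id, dst_id, rel_type)`.
/// Values are inlined for readability; real code should bind parameters instead.
fn path_to_sql(start_label: &str, hops: &[Hop<'_>]) -> String {
    let mut from = String::from("nodes AS n0");
    let mut conditions = vec![format!("n0.label = '{start_label}'")];
    for (i, hop) in hops.iter().enumerate() {
        let next = i + 1;
        // Edge alias e{i} leaves node n{i}; node alias n{next} is its destination.
        from.push_str(&format!(
            "\nJOIN edges AS e{i} ON e{i}.src_id = n{i}.id AND e{i}.rel_type = '{}'",
            hop.rel_type
        ));
        from.push_str(&format!("\nJOIN nodes AS n{next} ON n{next}.id = e{i}.dst_id"));
        conditions.push(format!("n{next}.label = '{}'", hop.target_label));
    }
    let last = hops.len();
    format!(
        "SELECT n0.id AS start_id, n{last}.id AS end_id\nFROM {from}\nWHERE {}",
        conditions.join(" AND ")
    )
}

fn main() {
    // Hypothetical pattern:
    // MATCH (u:User)-[:AUTHORED]->(mr:MergeRequest)-[:CLOSES]->(i:Issue)
    let sql = path_to_sql(
        "User",
        &[
            Hop { rel_type: "AUTHORED", target_label: "MergeRequest" },
            Hop { rel_type: "CLOSES", target_label: "Issue" },
        ],
    );
    println!("{sql}");
}
```

A 3-hop pattern therefore costs three `edges` joins plus three `nodes` joins, which is why join-friendly edge indexes matter; a production engine would likely also handle variable-length paths, parameter binding, and join ordering.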

In October 2025, KuzuDB [was archived](https://www.theregister.com/2025/10/14/kuzudb_abandoned/) by its maintainers. The Knowledge Graph Team spent time validating various database options against both Code Indexing and SDLC indexing, using the SDLC [dataset generator](https://gitlab.com/gitlab-org/rust/knowledge-graph/-/merge_requests/292) and pre-existing Code Index parquet files ([Database Selection Epic](https://gitlab.com/groups/gitlab-org/rust/-/epics/31)). We explored both new databases (Neo4j, FalkorDB, Memgraph, etc.) and already-deployed, approved GitLab databases (PostgreSQL and ClickHouse).

Inspired by [Brahmand](https://www.brahmanddb.com/) and [SQL:2023’s standardization of property graphs](https://www.iso.org/standard/79473.html) (ISO/IEC 9075-16:2023), the team modified the [Demo Instance](https://gitlab.com/gitlab-org/rust/knowledge-graph/-/issues/263), which originally used Kuzu, to run on ClickHouse instead ([demo](https://gitlab.com/gitlab-org/rust/knowledge-graph/-/issues/268#note_2873427090), [code](https://gitlab.com/gitlab-org/rust/knowledge-graph/-/merge_requests/391)). This showed that we can still deliver a functioning product on a ClickHouse/Postgres-backed property graph model. @andrewn also created a [Cypher to Postgres](https://gitlab.com/andrewn/opencypher-to-postgres#project-walkthrough) project that [passes 70%](https://gitlab.com/gitlab-com/gl-infra/sandbox/opencypher-to-postgres/-/merge_requests/20) of the openCypher TCK, which much of the team can leverage.

Kùzu is a columnar system similar to modern read-optimized analytical DBMSs such as ClickHouse. The team conducted [research and benchmarking](https://gitlab.com/gitlab-org/rust/knowledge-graph/-/issues/267) against ClickHouse- and Postgres-backed property graphs, which alleviated our performance concerns. By applying the compressed sparse row (CSR) adjacency list index concepts from [KuzuDB’s whitepaper](https://www.cidrdb.org/cidr2023/papers/p48-jin.pdf), we achieved p95 query latencies under 300 ms for 3-hop traversals on a 20M+ row, 11 GB dataset. There is still much room for improvement, but the research so far gives us confidence in betting on ClickHouse. Postgres will be our backup, building on @andrewn's work.
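For readers unfamiliar with CSR, the following Rust sketch shows the general idea: edges are grouped by source so that each node's out-neighbors occupy one contiguous slice, and a k-hop traversal repeatedly expands a frontier. It is a textbook in-memory version for illustration only, with hypothetical names (`Csr`, `k_hop`); it does not describe how the team mapped the concept onto ClickHouse tables.

```rust
use std::collections::HashSet;

/// Compressed sparse row (CSR) adjacency: the out-neighbors of node `v`
/// are `targets[offsets[v]..offsets[v + 1]]`.
struct Csr {
    offsets: Vec<usize>,
    targets: Vec<u32>,
}

impl Csr {
    /// Build from an edge list over nodes `0..num_nodes` using a counting sort
    /// by source. Panics if an endpoint is >= num_nodes.
    fn from_edges(num_nodes: usize, edges: &[(u32, u32)]) -> Self {
        let mut offsets = vec![0usize; num_nodes + 1];
        for &(src, _) in edges {
            offsets[src as usize + 1] += 1; // out-degree counts
        }
        for i in 0..num_nodes {
            offsets[i + 1] += offsets[i]; // prefix sums give slice boundaries
        }
        let mut cursor = offsets.clone();
        let mut targets = vec![0u32; edges.len()];
        for &(src, dst) in edges {
            let slot = &mut cursor[src as usize];
            targets[*slot] = dst;
            *slot += 1;
        }
        Csr { offsets, targets }
    }

    /// Out-neighbors of `v` as a contiguous slice (no per-node allocation).
    fn neighbors(&self, v: u32) -> &[u32] {
        let v = v as usize;
        &self.targets[self.offsets[v]..self.offsets[v + 1]]
    }

    /// Distinct nodes at the end of some walk of exactly `hops` edges from `start`.
    fn k_hop(&self, start: u32, hops: usize) -> HashSet<u32> {
        let mut frontier: HashSet<u32> = HashSet::from([start]);
        for _ in 0..hops {
            frontier = frontier
                .iter()
                .flat_map(|&v| self.neighbors(v).iter().copied())
                .collect();
        }
        frontier
    }
}

fn main() {
    // Toy graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3, 3 -> 4
    let csr = Csr::from_edges(5, &[(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]);
    assert_eq!(csr.neighbors(0), &[1, 2]);
    let mut three_hop: Vec<u32> = csr.k_hop(0, 3).into_iter().collect();
    three_hop.sort();
    println!("{three_hop:?}"); // prints [4]
}
```

The appeal for 3-hop queries is that each expansion step reads contiguous ranges rather than probing a hash index per edge, which is the property the whitepaper's CSR indexes are designed to exploit.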

#### Why a Graph Query Engine on ClickHouse?

- The **Data Model** (Property Graphs with arbitrary nodes and edges) is the most critical aspect of this product and enables the “Knowledge Graph” capabilities, irrespective of the underlying database.
- GitLab has significantly **more operational experience** with ClickHouse and Postgres than with graph databases (Neo4j, FalkorDB).
- By leveraging our existing stack, we have **one less database to deploy and maintain**, reducing SRE and DBRE costs.
- More **engineering investment goes into ClickHouse** instead of into building an ETL pipeline from ClickHouse to a graph database, which frees the GKG team to help with Siphon and NATS.
- **Faster Time to Market** with this query layer.
- **Two-Way Door**: If we find that the database does not suit our needs, we can still use the components we deploy (Siphon, NATS, ClickHouse) as the foundation for a data pipeline to a new graph database (e.g., Neo4j, FalkorDB, Memgraph).
- **Legal and Procurement Barriers**: Because of their restrictive licenses, any new graph database would have to go through both legal review and ZIP. See the legal section.

View the [Graph Query Engine](/handbook/engineering/architecture/design-documents/gitlab_knowledge_graph/querying/graph_engine/) design document for more details.

@@ -343,7 +343,7 @@ The team evaluated the following databases for our needs:
- FalkorDB (SSPL and EE license)
- Memgraph (BSL and EE license)

After meeting with the legal and procurement teams, we found that proceeding with any of these databases would require purchasing an enterprise edition license from the vendor, on top of the engineering challenges each one introduces. The negotiation and procurement cycle would take at least 30 days.

#### Why not fork Kuzu?

The remaining 93 files changed in this commit contain only whitespace changes.