feat: gitlab sdlc graph performance analysis
This change introduces a performance analysis and database size estimation pipeline for a GitLab-like graph schema using the Kuzu graph database.
The pipeline is designed to enable analysis with large datasets by streaming data generation directly to Parquet files and loading those files into Kuzu.
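As a rough illustration of the load step, the sketch below builds the kind of `COPY ... FROM` statements that would load the generated Parquet files into Kuzu. The helper names are hypothetical and are not taken from this change. The `(from=..., to=...)` option is an assumption based on Kuzu's documented syntax for relationship tables declared over more than one node-label pair, and should be checked against the Kuzu version in use.

```rust
// Hypothetical helpers that build Kuzu `COPY ... FROM` statements as strings.
// Nothing here talks to a database, so the output can be inspected without
// Kuzu installed.

/// Statement for loading a node table from a Parquet file.
fn copy_node_statement(table: &str, parquet_path: &str) -> String {
    format!("COPY {table} FROM '{parquet_path}'")
}

/// Statement for loading one FROM/TO pair of a relationship table.
/// Assumption: the `(from=..., to=...)` option is needed when the relationship
/// table spans several node-label pairs (as BELONGS_TO_PROJECT does here).
fn copy_rel_statement(rel: &str, from: &str, to: &str, parquet_path: &str) -> String {
    format!("COPY {rel} FROM '{parquet_path}' (from='{from}', to='{to}')")
}

fn main() {
    let dir = "gitlab_estimation_data";
    println!(
        "{}",
        copy_node_statement("GitLabUser", &format!("{dir}/gitlab_users.parquet"))
    );
    println!(
        "{}",
        copy_rel_statement(
            "BELONGS_TO_PROJECT",
            "GitLabIssue",
            "GitLabProject",
            &format!("{dir}/gitlabissue_to_gitlabproject_relationships.parquet"),
        )
    );
}
```

Keeping statement construction separate from execution makes it easy to log or review exactly what will be run before a multi-minute import starts.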
Analysis
72M Nodes / 268M Relationships Test Run
We ran a test with a large dataset of 72M nodes and 268M relationships to understand how Kuzu handles a large graph. The configuration included 100,000 users, 60,000 projects, 12M issues, and 60M merge requests.
Here is the dataset configuration:
DatasetConfig {
    num_users: 100_000,
    num_groups: 2_000,
    num_projects: 60_000,
    num_issues_per_project: 200,
    num_mrs_per_project: 1000,
    num_epics_per_group: 100,
    num_milestones_per_project: 4,
    long_description_ratio: 0.05,
    long_description_size_bytes: 1 * 512 * 1024,
}
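The node total reported for this run follows directly from the configuration. The sketch below is a minimal check, assuming the generator simply multiplies the per-project and per-group counts (the row counts in the export log are consistent with that). The struct is a trimmed, hypothetical mirror of the config above, not the type used in this change.

```rust
/// Trimmed, hypothetical mirror of the dataset configuration: only the
/// fields that determine how many nodes are generated.
struct DatasetConfig {
    num_users: u64,
    num_groups: u64,
    num_projects: u64,
    num_issues_per_project: u64,
    num_mrs_per_project: u64,
    num_epics_per_group: u64,
    num_milestones_per_project: u64,
}

impl DatasetConfig {
    /// Total node count, assuming per-project and per-group values are
    /// applied uniformly to every project and group.
    fn total_nodes(&self) -> u64 {
        let issues = self.num_projects * self.num_issues_per_project;
        let mrs = self.num_projects * self.num_mrs_per_project;
        let milestones = self.num_projects * self.num_milestones_per_project;
        let epics = self.num_groups * self.num_epics_per_group;
        self.num_users + self.num_groups + self.num_projects + issues + mrs + milestones + epics
    }
}

/// The values used for the 72M-node test run.
fn test_run_config() -> DatasetConfig {
    DatasetConfig {
        num_users: 100_000,
        num_groups: 2_000,
        num_projects: 60_000,
        num_issues_per_project: 200,
        num_mrs_per_project: 1000,
        num_epics_per_group: 100,
        num_milestones_per_project: 4,
    }
}

fn main() {
    // Matches "Total Records: 72602000" in the run summary.
    println!("total nodes: {}", test_run_config().total_nodes());
}
```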
Data Import Metrics
| Metric | Value | Notes |
|---|---|---|
| Total Records | ~341 million (72.6M nodes, 268.4M relationships) | A very large graph, dominated by merge requests and issues. |
| Parquet Export Time | ~32.4 minutes | Data generation is CPU-bound and streams at ~48.3 MB/s. |
| Bulk Import Time | ~5.5 minutes | Kuzu's COPY FROM loaded all Parquet files in ~330.7 s, even with compression enabled. |
| Import Throughput | ~219,562 records/sec | Computed from the 72.6M node records only; counting all ~341M records gives roughly 1.03M records/sec. Memory utilization never exceeded 1 GB. |
| Parquet Export Size | ~91.6 GB | The raw Parquet data size, including the long issue and MR descriptions. |
| Final Database Size | ~23.5 GB | The on-disk size is ~26% of the raw Parquet data size (91.6 GB). |
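The derived figures in this table can be reproduced from the raw measurements in the run summary. The sketch below is only arithmetic on numbers reported in this description. The relationship total (268,425,000) is the sum of the relationship row counts in the export log, and the export rate assumes 1 GB = 1024 MB, which is what reproduces the ~48.3 MB/s figure.

```rust
/// Records processed per second.
fn records_per_sec(records: u64, seconds: f64) -> f64 {
    records as f64 / seconds
}

/// Fraction `part / whole`.
fn ratio(part: f64, whole: f64) -> f64 {
    part / whole
}

fn main() {
    let nodes: u64 = 72_602_000; // "Total Records" in the run summary
    let rels: u64 = 268_425_000; // sum of relationship row counts in the export log
    let import_secs = 330.666918542; // "Bulk Import"
    let export_secs = 1941.2792955; // "Parquet Export"

    // The reported throughput counts node records only.
    println!("nodes only:   {:.0} records/s", records_per_sec(nodes, import_secs));
    // Counting relationships as well gives a rate roughly 4.7x higher.
    println!("nodes + rels: {:.0} records/s", records_per_sec(nodes + rels, import_secs));
    // On-disk database size relative to the Parquet input.
    println!("db / parquet: {:.1}%", 100.0 * ratio(23.5, 91.6));
    // Average export rate, assuming 1 GB = 1024 MB.
    println!("export rate:  {:.1} MB/s", 91.6 * 1024.0 / export_secs);
}
```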
Performance Observations
- Ingestion is Fast: Kuzu's bulk import took ~5.5 minutes, about 15% of the ~38-minute pipeline run. This is a good indicator for performance in the event we need to re-index a namespaced database.
- Efficient Storage: The final database (~23.5 GB) is roughly a quarter of the size of the Parquet files it was created from (~91.6 GB).
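One detail worth confirming when reading the ingestion log in this description: a few relationship types report the same Parquet output path (for example, AUTHORED_BY and ASSIGNED_TO for GitLabIssue -> GitLabUser), and the import log reports identical file sizes for the types that share a path. This may mean a later export overwrote an earlier one, which would affect the relationship counts actually loaded. The sketch below is a hypothetical check, not part of this change, that flags shared paths. The label and path pairs are copied from the log, with the directory prefix omitted.

```rust
use std::collections::BTreeMap;

/// (relationship label, output file) pairs copied from the export log.
/// Only a subset is listed; the directory prefix is omitted.
const REL_OUTPUTS: &[(&str, &str)] = &[
    ("OWNS_PROJECT", "gitlabuser_to_gitlabproject_relationships.parquet"),
    ("MEMBER_OF_PROJECT", "gitlabuser_to_gitlabproject_relationships.parquet"),
    ("AUTHORED_BY (GitLabIssue)", "gitlabissue_to_gitlabuser_relationships.parquet"),
    ("ASSIGNED_TO (GitLabIssue)", "gitlabissue_to_gitlabuser_relationships.parquet"),
    ("AUTHORED_BY (GitLabMergeRequest)", "gitlabmergerequest_to_gitlabuser_relationships.parquet"),
    ("ASSIGNED_TO (GitLabMergeRequest)", "gitlabmergerequest_to_gitlabuser_relationships.parquet"),
    ("IN_EPIC", "gitlabissue_to_gitlabepic_relationships.parquet"),
];

/// Group labels by output path and keep only paths used by more than one label.
fn shared_paths<'a>(outputs: &[(&'a str, &'a str)]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut by_path: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(rel, path) in outputs {
        by_path.entry(path).or_default().push(rel);
    }
    by_path.retain(|_, rels| rels.len() > 1);
    by_path
}

fn main() {
    for (path, rels) in shared_paths(REL_OUTPUTS) {
        println!("{path} is written by: {}", rels.join(", "));
    }
}
```

If the overlap turns out to be real, including the relationship type in the file name would be one way to keep the outputs distinct.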
Full pipeline ingestion results
=== Phase: Streaming Export (Parquet) ===
[GitLabUser] Writing to gitlab_estimation_data/gitlab_users.parquet | rows: 100,000 | est size: 12.2 MB
[GitLabUser] 10% 10,140 / 100,000 | 1.2 MB of ~12.2 MB | 838239 rows/s, 101.9 MB/s | ETA 00:00
[GitLabUser] 20% 20,280 / 100,000 | 2.5 MB of ~12.2 MB | 722107 rows/s, 87.8 MB/s | ETA 00:00
[GitLabUser] 81% 81,120 / 100,000 | 9.9 MB of ~12.2 MB | 789128 rows/s, 96.0 MB/s | ETA 00:00
[GitLabUser] 91% 91,260 / 100,000 | 11.1 MB of ~12.2 MB | 797528 rows/s, 97.0 MB/s | ETA 00:00
[GitLabUser] 100% 100,000 / 100,000 | 12.2 MB of ~12.2 MB | 807958 rows/s, 98.2 MB/s | ETA 00:00
[GitLabUser] Finalizing file...
[GitLabUser] 100% 100,000 / 100,000 | 4.6 MB of ~12.2 MB | 795627 rows/s, 36.4 MB/s | ETA 00:00
[GitLabUser] Done. Wrote 100,000 rows | file size 4.6 MB | gitlab_estimation_data/gitlab_users.parquet
[GitLabGroup] Writing to gitlab_estimation_data/gitlab_groups.parquet | rows: 2,000 | est size: 63.0 MB
[GitLabGroup] 100% 2,000 / 2,000 | 63.0 MB of ~63.0 MB | 43652 rows/s, 1374.3 MB/s | ETA 00:00
[GitLabGroup] Finalizing file...
[GitLabGroup] 100% 2,000 / 2,000 | 2.4 MB of ~63.0 MB | 40365 rows/s, 48.7 MB/s | ETA 00:00
[GitLabGroup] Done. Wrote 2,000 rows | file size 2.4 MB | gitlab_estimation_data/gitlab_groups.parquet
[GitLabProject] Writing to gitlab_estimation_data/gitlab_projects.parquet | rows: 60,000 | est size: 1.8 GB
[GitLabProject] 16% 10,140 / 60,000 | 319 MB of ~1.8 GB | 44176 rows/s, 1390.0 MB/s | ETA 00:01
[GitLabProject] 25% 15,210 / 60,000 | 479 MB of ~1.8 GB | 43878 rows/s, 1380.6 MB/s | ETA 00:01
[GitLabProject] 92% 55,770 / 60,000 | 64.0 MB of ~1.8 GB | 41617 rows/s, 47.8 MB/s | ETA 00:00
[GitLabProject] 100% 60,000 / 60,000 | 64.0 MB of ~1.8 GB | 41748 rows/s, 44.5 MB/s | ETA 00:00
[GitLabProject] Finalizing file...
[GitLabProject] 100% 60,000 / 60,000 | 74.3 MB of ~1.8 GB | 41229 rows/s, 51.0 MB/s | ETA 00:00
[GitLabProject] Done. Wrote 60,000 rows | file size 74.3 MB | gitlab_estimation_data/gitlab_projects.parquet
[GitLabMilestone] Writing to gitlab_estimation_data/gitlab_milestones.parquet | rows: 240,000 | est size: 7.4 GB
[GitLabMilestone] 10% 25,350 / 240,000 | 799 MB of ~7.4 GB | 42237 rows/s, 1331.6 MB/s | ETA 00:05
[GitLabMilestone] 21% 50,700 / 240,000 | 32.0 MB of ~7.4 GB | 42541 rows/s, 26.8 MB/s | ETA 00:04
[GitLabMilestone] 90% 218,010 / 240,000 | 256 MB of ~7.4 GB | 42777 rows/s, 50.2 MB/s | ETA 00:01
[GitLabMilestone] 100% 240,000 / 240,000 | 288 MB of ~7.4 GB | 42227 rows/s, 50.7 MB/s | ETA 00:00
[GitLabMilestone] Finalizing file...
[GitLabMilestone] 100% 240,000 / 240,000 | 292 MB of ~7.4 GB | 42159 rows/s, 51.3 MB/s | ETA 00:00
[GitLabMilestone] Done. Wrote 240,000 rows | file size 292 MB | gitlab_estimation_data/gitlab_milestones.parquet
[GitLabEpic] Writing to gitlab_estimation_data/gitlab_epics.parquet | rows: 200,000 | est size: 6.2 GB
[GitLabEpic] 10% 20,280 / 200,000 | 642 MB of ~6.2 GB | 43433 rows/s, 1374.3 MB/s | ETA 00:04
[GitLabEpic] 20% 40,560 / 200,000 | 32.0 MB of ~6.2 GB | 40147 rows/s, 31.7 MB/s | ETA 00:04
[GitLabEpic] 91% 182,520 / 200,000 | 224 MB of ~6.2 GB | 38261 rows/s, 47.0 MB/s | ETA 00:00
[GitLabEpic] 100% 200,000 / 200,000 | 224 MB of ~6.2 GB | 38087 rows/s, 42.7 MB/s | ETA 00:00
[GitLabEpic] Finalizing file...
[GitLabEpic] 100% 200,000 / 200,000 | 258 MB of ~6.2 GB | 37982 rows/s, 49.0 MB/s | ETA 00:00
[GitLabEpic] Done. Wrote 200,000 rows | file size 258 MB | gitlab_estimation_data/gitlab_epics.parquet
[GitLabIssue] Writing to gitlab_estimation_data/gitlab_issues.parquet | rows: 12,000,000 | est size: 370 GB
[GitLabIssue] 10% 1,201,590 / 12,000,000 | 1.5 GB of ~370 GB | 39880 rows/s, 49.9 MB/s | ETA 04:31
[GitLabIssue] 20% 2,403,180 / 12,000,000 | 2.9 GB of ~370 GB | 40303 rows/s, 50.4 MB/s | ETA 03:58
[GitLabIssue] 30% 3,604,770 / 12,000,000 | 4.4 GB of ~370 GB | 40364 rows/s, 50.9 MB/s | ETA 03:28
[GitLabIssue] 90% 10,804,170 / 12,000,000 | 13.3 GB of ~370 GB | 39729 rows/s, 50.1 MB/s | ETA 00:30
[GitLabIssue] 100% 12,000,000 / 12,000,000 | 14.8 GB of ~370 GB | 39659 rows/s, 50.0 MB/s | ETA 00:00
[GitLabIssue] Finalizing file...
[GitLabIssue] 100% 12,000,000 / 12,000,000 | 14.8 GB of ~370 GB | 39643 rows/s, 50.0 MB/s | ETA 00:00
[GitLabIssue] Done. Wrote 12,000,000 rows | file size 14.8 GB | gitlab_estimation_data/gitlab_issues.parquet
[GitLabMergeRequest] Writing to gitlab_estimation_data/gitlab_merge_requests.parquet | rows: 60,000,000 | est size: 1.8 TB
[GitLabMergeRequest] 10% 6,002,880 / 60,000,000 | 7.5 GB of ~1.8 TB | 39629 rows/s, 50.5 MB/s | ETA 22:43
[GitLabMergeRequest] 20% 12,000,690 / 60,000,000 | 14.9 GB of ~1.8 TB | 38686 rows/s, 49.3 MB/s | ETA 20:41
[GitLabMergeRequest] 30% 18,003,570 / 60,000,000 | 22.4 GB of ~1.8 TB | 38276 rows/s, 48.8 MB/s | ETA 18:17
[GitLabMergeRequest] 80% 48,002,760 / 60,000,000 | 59.7 GB of ~1.8 TB | 37319 rows/s, 47.6 MB/s | ETA 05:21
[GitLabMergeRequest] 90% 54,000,570 / 60,000,000 | 67.2 GB of ~1.8 TB | 37031 rows/s, 47.2 MB/s | ETA 02:42
[GitLabMergeRequest] 100% 60,000,000 / 60,000,000 | 74.7 GB of ~1.8 TB | 37074 rows/s, 47.3 MB/s | ETA 00:00
[GitLabMergeRequest] Finalizing file...
[GitLabMergeRequest] 100% 60,000,000 / 60,000,000 | 74.7 GB of ~1.8 TB | 37057 rows/s, 47.3 MB/s | ETA 00:00
[GitLabMergeRequest] Done. Wrote 60,000,000 rows | file size 74.7 GB | gitlab_estimation_data/gitlab_merge_requests.parquet
[OWNS_PROJECT (GitLabUser -> GitLabProject)] Writing to gitlab_estimation_data/gitlabuser_to_gitlabproject_relationships.parquet | rows: 60,000 | est size: 7.3 MB
[OWNS_PROJECT (GitLabUser -> GitLabProject)] 16% 10,140 / 60,000 | 1.2 MB of ~7.3 MB | 5659534 rows/s, 690.9 MB/s | ETA 00:00
[OWNS_PROJECT (GitLabUser -> GitLabProject)] 25% 15,210 / 60,000 | 1.9 MB of ~7.3 MB | 8231076 rows/s, 1004.8 MB/s
[OWNS_PROJECT (GitLabUser -> GitLabProject)] 100% 60,000 / 60,000 | 7.3 MB of ~7.3 MB | 19237706 rows/s, 2348.4 MB/s | ETA 00:00
[OWNS_PROJECT (GitLabUser -> GitLabProject)] 100% 60,000 / 60,000 | 473 KB of ~7.3 MB | 14427846 rows/s, 111.0 MB/s | ETA 00:00
[CONTAINS_PROJECT (GitLabGroup -> GitLabProject)] Writing to gitlab_estimation_data/gitlabgroup_to_gitlabproject_relationships.parquet | rows: 60,000 | est size: 7.3 MB
[CONTAINS_PROJECT (GitLabGroup -> GitLabProject)] 16% 10,140 / 60,000 | 1.2 MB of ~7.3 MB | 110018879 rows/s, 13430.0 MB/s | ETA 00:00
[CONTAINS_PROJECT (GitLabGroup -> GitLabProject)] 25% 15,210 / 60,000 | 1.9 MB of ~7.3 MB | 112771084 rows/s, 13766.0 MB/s | ETA 00:00
[CONTAINS_PROJECT (GitLabGroup -> GitLabProject)] 33% 20,280 / 60,000 | 2.5 MB of ~7.3 MB | 26843150 rows/s, 3276.8 MB/s | ETA 00:00
[CONTAINS_PROJECT (GitLabGroup -> GitLabProject)] 42% 25,350 / 60,000 | 3.1 MB of ~7.3 MB | 31613406 rows/s, 3859.1 MB/s | ETA 00:00
[CONTAINS_PROJECT (GitLabGroup -> GitLabProject)] 92% 55,770 / 60,000 | 6.8 MB of ~7.3 MB | 38823529 rows/s, 4739.2 MB/s | ETA 00:00
[CONTAINS_PROJECT (GitLabGroup -> GitLabProject)] 100% 60,000 / 60,000 | 7.3 MB of ~7.3 MB | 41042030 rows/s, 5010.0 MB/s | ETA 00:00
[CONTAINS_PROJECT (GitLabGroup -> GitLabProject)] 100% 60,000 / 60,000 | 344 KB of ~7.3 MB | 27184181 rows/s, 152.1 MB/s | ETA 00:00
[BELONGS_TO_PROJECT (GitLabIssue -> GitLabProject)] Writing to gitlab_estimation_data/gitlabissue_to_gitlabproject_relationships.parquet | rows: 12,000,000 | est size: 1.4 GB
[BELONGS_TO_PROJECT (GitLabIssue -> GitLabProject)] 10% 1,201,590 / 12,000,000 | 147 MB of ~1.4 GB | 28529105 rows/s, 3482.6 MB/s | ETA 00:00
[BELONGS_TO_PROJECT (GitLabIssue -> GitLabProject)] 20% 2,403,180 / 12,000,000 | 293 MB of ~1.4 GB | 30550484 rows/s, 3729.3 MB/s | ETA 00:00
[BELONGS_TO_PROJECT (GitLabIssue -> GitLabProject)] ... | 32.0 MB of ~1.4 GB | 38832783 rows/s, 115.0 MB/s | ETA 00:00
[BELONGS_TO_PROJECT (GitLabIssue -> GitLabProject)] 100% 12,000,000 / 12,000,000 | 32.0 MB of ~1.4 GB | 38695650 rows/s, 103.2 MB/s | ETA 00:00
[BELONGS_TO_PROJECT (GitLabIssue -> GitLabProject)] 100% 12,000,000 / 12,000,000 | 48.9 MB of ~1.4 GB | 35666098 rows/s, 145.4 MB/s | ETA 00:00
[BELONGS_TO_PROJECT (GitLabMergeRequest -> GitLabProject)] Writing to gitlab_estimation_data/gitlabmergerequest_to_gitlabproject_relationships.parquet | rows: 60,000,000 | est size: 7.2 GB
[BELONGS_TO_PROJECT (GitLabMergeRequest -> GitLabProject)] 10% 6,002,880 / 60,000,000 | 733 MB of ~7.2 GB | 47649342 rows/s, 5816.6 MB/s | ETA 00:01
[BELONGS_TO_PROJECT (GitLabMergeRequest -> GitLabProject)] 20% 12,000,690 / 60,000,000 | 32.0 MB of ~7.2 GB | 50854368 rows/s, 135.6 MB/s | ETA 00:01
[BELONGS_TO_PROJECT (GitLabMergeRequest -> GitLabProject)] 30% 18,003,570 / 60,000,000 | 64.0 MB of ~7.2 GB | 60851858 rows/s, 216.3 MB/s | ETA 00:01
[BELONGS_TO_PROJECT (GitLabMergeRequest -> GitLabProject)] 40% 24,001,380 / 60,000,000 | 96.0 MB of ~7.2 GB
[BELONGS_TO_PROJECT (GitLabMergeRequest -> GitLabProject)] 90% 54,000,570 / 60,000,000 | 192 MB of ~7.2 GB | 77780695 rows/s, 276.5 MB/s | ETA 00:00
[BELONGS_TO_PROJECT (GitLabMergeRequest -> GitLabProject)] 100% 60,000,000 / 60,000,000 | 224 MB of ~7.2 GB | 80325004 rows/s, 299.9 MB/s | ETA 00:00
[BELONGS_TO_PROJECT (GitLabMergeRequest -> GitLabProject)] 100% 60,000,000 / 60,000,000 | 243 MB of ~7.2 GB | 74063696 rows/s, 300.5 MB/s | ETA 00:00
[BELONGS_TO_PROJECT (GitLabMilestone -> GitLabProject)] Writing to gitlab_estimation_data/gitlabmilestone_to_gitlabproject_relationships.parquet | rows: 240,000 | est size: 29.3 MB
[BELONGS_TO_PROJECT (GitLabMilestone -> GitLabProject)] 10% 25,350 / 240,000 | 3.1 MB of ~29.3 MB | 12935601 rows/s, 1579.1 MB/s | ETA 00:00
[BELONGS_TO_PROJECT (GitLabMilestone -> GitLabProject)] 21% 50,700 / 240,000 | 6.2 MB of ~29.3 MB | 18445868 rows/s, 2251.7 MB/s | ETA 00:00
[BELONGS_TO_PROJECT (GitLabMilestone -> GitLabProject)] 31% 76,050 / 240,000 | 9.3 MB of ~29.3 MB | 22755835 rows/s, 2777.8 MB/s | ETA 00:00
[BELONGS_TO_PROJECT (GitLabMilestone -> GitLabProject)] 100% 240,000 / 240,000 | 1.3 MB of ~29.3 MB | 28848900 rows/s, 159.9 MB/s | ETA 00:00
[AUTHORED_BY (GitLabIssue -> GitLabUser)] Writing to gitlab_estimation_data/gitlabissue_to_gitlabuser_relationships.parquet | rows: 12,000,000 | est size: 1.4 GB
[AUTHORED_BY (GitLabIssue -> GitLabUser)] 10% 1,201,590 / 12,000,000 | 147 MB of ~1.4 GB | 28908581 rows/s, 3528.9 MB/s | ETA 00:00
[AUTHORED_BY (GitLabIssue -> GitLabUser)] 20% 2,403,180 / 12,000,000 | 293 MB of ~1.4 GB | 26505710 rows/s, 3235.6 MB/s | ETA 00:00
[AUTHORED_BY (GitLabIssue -> GitLabUser)] 30% 3,604,770 / 12,000,000 | 440 MB of ~1.4 GB | 27718259 rows/s, 3383.6 MB/s
[AUTHORED_BY (GitLabIssue -> GitLabUser)] 90% 10,804,170 / 12,000,000 | 64.0 MB of ~1.4 GB | 29096309 rows/s, 172.3 MB/s | ETA 00:00
[AUTHORED_BY (GitLabIssue -> GitLabUser)] 100% 12,000,000 / 12,000,000 | 64.0 MB of ~1.4 GB | 29954600 rows/s, 159.7 MB/s | ETA 00:00
[AUTHORED_BY (GitLabIssue -> GitLabUser)] 100% 12,000,000 / 12,000,000 | 92.2 MB of ~1.4 GB | 28675739 rows/s, 220.4 MB/s | ETA 00:00
[AUTHORED_BY (GitLabMergeRequest -> GitLabUser)] Writing to gitlab_estimation_data/gitlabmergerequest_to_gitlabuser_relationships.parquet | rows: 60,000,000 | est size: 7.2 GB
[AUTHORED_BY (GitLabMergeRequest -> GitLabUser)] 10% 6,002,880 / 60,000,000 | 32.0 MB of ~7.2 GB | 38047343 rows/s, 202.8 MB/s | ETA 00:01
[AUTHORED_BY (GitLabMergeRequest -> GitLabUser)] 20% 12,000,690 / 60,000,000 | 64.0 MB of ~7.2 GB | 39307110 rows/s, 209.6 MB/s | ETA 00:01
[AUTHORED_BY (GitLabMergeRequest -> GitLabUser)] 100% 60,000,000 / 60,000,000 | 461 MB of ~7.2 GB | 43647283 rows/s, 335.6 MB/s | ETA 00:00
[AUTHORED_BY (GitLabEpic -> GitLabUser)] Writing to gitlab_estimation_data/gitlabepic_to_gitlabuser_relationships.parquet | rows: 200,000 | est size: 24.4 MB
[AUTHORED_BY (GitLabEpic -> GitLabUser)] 10% 20,280 / 200,000 | 2.5 MB of ~24.4 MB | 5078358 rows/s, 619.9 MB/s | ETA 00:00
[AUTHORED_BY (GitLabEpic -> GitLabUser)] 20% 40,560 / 200,000 | 5.0 MB of ~24.4 MB | 9001831 rows/s, 1098.9 MB/s | ETA 00:00
[AUTHORED_BY (GitLabEpic -> GitLabUser)] 30% 60,840 / 200,000 | 7.4 MB of ~24.4 MB | 12955823 rows/s, 1581.5 MB/s | ETA 00:00
[AUTHORED_BY (GitLabEpic -> GitLabUser)] 40% 81,120 / 200,000 | 9.9 MB of ~24.4 MB | 12665436 rows/s, 1546.1 MB/s | ETA 00:00
[AUTHORED_BY (GitLabEpic -> GitLabUser)] 100% 200,000 / 200,000 | 1.5 MB of ~24.4 MB | 19558707 rows/s, 150.4 MB/s | ETA 00:00
[ASSIGNED_TO (GitLabIssue -> GitLabUser)] Writing to gitlab_estimation_data/gitlabissue_to_gitlabuser_relationships.parquet | rows: 4,020,000 | est size: 491 MB
[ASSIGNED_TO (GitLabIssue -> GitLabUser)] 10% 405,600 / 4,020,000 | 49.5 MB of ~491 MB | 15567891 rows/s, 1900.4 MB/s | ETA 00:00
[ASSIGNED_TO (GitLabIssue -> GitLabUser)] 20% 806,130 / 4,020,000 | 98.4 MB of ~491 MB | 16601598 rows/s, 2026.6 MB/s | ETA 00:00
[ASSIGNED_TO (GitLabIssue -> GitLabUser)] 30% 1,206,660 / 4,020,000 | 147 MB of ~491 MB | 16079867 rows/s, 1962.9 MB/s | ETA 00:00
[ASSIGNED_TO (GitLabIssue -> GitLabUser)] 40% 1,612,260 / 4,020,000 | 197 MB of ~491 MB | 16220198 rows/s, 1980.0 MB/s | ETA 00:00
[ASSIGNED_TO (GitLabIssue -> GitLabUser)] 90% 3,619,980 / 4,020,000 | 442 MB of ~491 MB | 16663576 rows/s, 2034.1 MB/s | ETA 00:00
[ASSIGNED_TO (GitLabIssue -> GitLabUser)] 100% 4,020,000 / 4,020,000 | 491 MB of ~491 MB | 16880023 rows/s, 2060.5 MB/s | ETA 00:00
[ASSIGNED_TO (GitLabIssue -> GitLabUser)] 100% 4,020,000 / 4,020,000 | 30.9 MB of ~491 MB | 15845823 rows/s, 121.8 MB/s | ETA 00:00
[ASSIGNED_TO (GitLabMergeRequest -> GitLabUser)] Writing to gitlab_estimation_data/gitlabmergerequest_to_gitlabuser_relationships.parquet | rows: 30,000,000 | est size: 3.6 GB
[ASSIGNED_TO (GitLabMergeRequest -> GitLabUser)] 10% 3,001,440 / 30,000,000 | 366 MB of ~3.6 GB | 22746094 rows/s, 2776.6 MB/s | ETA 00:01
[ASSIGNED_TO (GitLabMergeRequest -> GitLabUser)] 20% 6,002,880 / 30,000,000 | 32.0 MB of ~3.6 GB | 19597319 rows/s, 104.5 MB/s | ETA 00:01
[ASSIGNED_TO (GitLabMergeRequest -> GitLabUser)] 30% 9,004,320 / 30,000,000 | 64.0 MB of ~3.6 GB | 17827589 rows/s, 126.7 MB/s | ETA 00:01
[ASSIGNED_TO (GitLabMergeRequest -> GitLabUser)] 100% 30,000,000 / 30,000,000 | 224 MB of ~3.6 GB | 20991349 rows/s, 156.7 MB/s | ETA 00:00
[ASSIGNED_TO (GitLabMergeRequest -> GitLabUser)] 100% 30,000,000 / 30,000,000 | 231 MB of ~3.6 GB | 20574176 rows/s, 158.2 MB/s | ETA 00:00
[IN_MILESTONE (GitLabIssue -> GitLabMilestone)] Writing to gitlab_estimation_data/gitlabissue_to_gitlabmilestone_relationships.parquet | rows: 3,000,000 | est size: 366 MB
[IN_MILESTONE (GitLabIssue -> GitLabMilestone)] 10% 304,200 / 3,000,000 | 37.1 MB of ~366 MB | 34691542 rows/s, 4234.8 MB/s | ETA 00:00
[IN_MILESTONE (GitLabIssue -> GitLabMilestone)] 20% 603,330 / 3,000,000 | 73.6 MB of ~366 MB | 36691652 rows/s, 4479.0 MB/s | ETA 00:00
[IN_MILESTONE (GitLabIssue -> GitLabMilestone)] 30% 902,460 / 3,000,000 | 110 MB of ~366 MB | 33868021 rows/s, 4134.3 MB/s | ETA 00:00
[IN_MILESTONE (GitLabIssue -> GitLabMilestone)] 90% 2,702,310 / 3,000,000 | 330 MB of ~366 MB | 33542436 rows/s, 4094.5 MB/s | ETA 00:00
[IN_MILESTONE (GitLabIssue -> GitLabMilestone)] 100% 3,000,000 / 3,000,000 | 366 MB of ~366 MB | 33878511 rows/s, 4135.6 MB/s | ETA 00:00
[IN_MILESTONE (GitLabIssue -> GitLabMilestone)] 100% 3,000,000 / 3,000,000 | 12.5 MB of ~366 MB | 31993729 rows/s, 133.0 MB/s | ETA 00:00
[IN_EPIC (GitLabIssue -> GitLabEpic)] Writing to gitlab_estimation_data/gitlabissue_to_gitlabepic_relationships.parquet | rows: 2,400,000 | est size: 293 MB
[IN_EPIC (GitLabIssue -> GitLabEpic)] 10% 243,360 / 2,400,000 | 29.7 MB of ~293 MB | 46323774 rows/s, 5654.8 MB/s | ETA 00:00
[IN_EPIC (GitLabIssue -> GitLabEpic)] 20% 481,650 / 2,400,000 | 58.8 MB of ~293 MB | 39815657 rows/s, 4860.3 MB/s | ETA 00:00
[IN_EPIC (GitLabIssue -> GitLabEpic)] 30% 725,010 / 2,400,000 | 88.5 MB of ~293 MB | 38465471 rows/s, 4695.5 MB/s | ETA 00:00
[IN_EPIC (GitLabIssue -> GitLabEpic)] 40% 963,300 / 2,400,000 | 118 MB of ~293 MB | 31513476 rows/s, 3846.9 MB/s | ETA 00:00
[IN_EPIC (GitLabIssue -> GitLabEpic)] 50% 1,201,590 / 2,400,000 | 147 MB of ~293 MB | 31083107 rows/s, 3794.3 MB/s | ETA 00:00
[IN_EPIC (GitLabIssue -> GitLabEpic)] 60% 1,444,950 / 2,400,000 | 176 MB of ~293 MB | 30799454 rows/s, 3759.7 MB/s | ETA 00:00
[IN_EPIC (GitLabIssue -> GitLabEpic)] 70% 1,683,240 / 2,400,000 | 205 MB of ~293 MB | 29560065 rows/s, 3608.4 MB/s | ETA 00:00
[IN_EPIC (GitLabIssue -> GitLabEpic)] 80% 1,921,530 / 2,400,000 | 235 MB of ~293 MB | 30679583 rows/s, 3745.1 MB/s | ETA 00:00
[IN_EPIC (GitLabIssue -> GitLabEpic)] 90% 2,164,890 / 2,400,000 | 264 MB of ~293 MB | 31666546 rows/s, 3865.5 MB/s | ETA 00:00
[IN_EPIC (GitLabIssue -> GitLabEpic)] 100% 2,400,000 / 2,400,000 | 293 MB of ~293 MB | 30596992 rows/s, 3735.0 MB/s | ETA 00:00
[IN_EPIC (GitLabIssue -> GitLabEpic)] 100% 2,400,000 / 2,400,000 | 14.3 MB of ~293 MB | 28029784 rows/s, 167.6 MB/s | ETA 00:00
[CLOSES_ISSUE (GitLabMergeRequest -> GitLabIssue)] Writing to gitlab_estimation_data/gitlabmergerequest_to_gitlabissue_relationships.parquet | rows: 12,000,000 | est size: 1.4 GB
[CLOSES_ISSUE (GitLabMergeRequest -> GitLabIssue)] 10% 1,201,590 / 12,000,000 | 147 MB of ~1.4 GB | 37754584 rows/s, 4608.7 MB/s | ETA 00:00
[CLOSES_ISSUE (GitLabMergeRequest -> GitLabIssue)] 20% 2,403,180 / 12,000,000 | 293 MB of ~1.4 GB | 37589087 rows/s, 4588.5 MB/s | ETA 00:00
[CLOSES_ISSUE (GitLabMergeRequest -> GitLabIssue)] 30% 3,604,770 / 12,000,000 | 440 MB of ~1.4 GB | 39434355 rows/s, 4813.8 MB/s | ETA 00:00
[CLOSES_ISSUE (GitLabMergeRequest -> GitLabIssue)] 40% 4,801,290 / 12,000,000 | 32.0 MB of ~1.4 GB | 37646202 rows/s, 250.9 MB/s | ETA 00:00
[CLOSES_ISSUE (GitLabMergeRequest -> GitLabIssue)] 50% 6,002,880 / 12,000,000 | 32.0 MB of ~1.4 GB | 37988754 rows/s, 202.5 MB/s | ETA 00:00
[CLOSES_ISSUE (GitLabMergeRequest -> GitLabIssue)] 60% 7,204,470 / 12,000,000 | 32.0 MB of ~1.4 GB | 38702264 rows/s, 171.9 MB/s | ETA 00:00
[CLOSES_ISSUE (GitLabMergeRequest -> GitLabIssue)] 70% 8,400,990 / 12,000,000 | 64.0 MB of ~1.4 GB | 37249239 rows/s, 283.7 MB/s | ETA 00:00
[CLOSES_ISSUE (GitLabMergeRequest -> GitLabIssue)] 80% 9,602,580 / 12,000,000 | 64.0 MB of ~1.4 GB | 37665608 rows/s, 251.0 MB/s | ETA 00:00
[CLOSES_ISSUE (GitLabMergeRequest -> GitLabIssue)] 90% 10,804,170 / 12,000,000 | 64.0 MB of ~1.4 GB | 36506684 rows/s, 216.2 MB/s | ETA 00:00
[CLOSES_ISSUE (GitLabMergeRequest -> GitLabIssue)] 100% 12,000,000 / 12,000,000 | 64.0 MB of ~1.4 GB | 36407394 rows/s, 194.1 MB/s | ETA 00:00
[CLOSES_ISSUE (GitLabMergeRequest -> GitLabIssue)] 100% 12,000,000 / 12,000,000 | 92.2 MB of ~1.4 GB | 34574423 rows/s, 265.8 MB/s | ETA 00:00
[BELONGS_TO_GROUP (GitLabEpic -> GitLabGroup)] Writing to gitlab_estimation_data/gitlabepic_to_gitlabgroup_relationships.parquet | rows: 200,000 | est size: 24.4 MB
[BELONGS_TO_GROUP (GitLabEpic -> GitLabGroup)] 10% 20,280 / 200,000 | 2.5 MB of ~24.4 MB | 11453848 rows/s, 1398.2 MB/s | ETA 00:00
[BELONGS_TO_GROUP (GitLabEpic -> GitLabGroup)] 20% 40,560 / 200,000 | 5.0 MB of ~24.4 MB | 15463947 rows/s, 1887.7 MB/s | ETA 00:00
[BELONGS_TO_GROUP (GitLabEpic -> GitLabGroup)] 30% 60,840 / 200,000 | 7.4 MB of ~24.4 MB | 19303047 rows/s, 2356.3 MB/s | ETA 00:00
[BELONGS_TO_GROUP (GitLabEpic -> GitLabGroup)] 40% 81,120 / 200,000 | 9.9 MB of ~24.4 MB | 22235568 rows/s, 2714.3 MB/s | ETA 00:00
[BELONGS_TO_GROUP (GitLabEpic -> GitLabGroup)] 50% 101,400 / 200,000 | 12.4 MB of ~24.4 MB | 24328215 rows/s, 2969.8 MB/s | ETA 00:00
[BELONGS_TO_GROUP (GitLabEpic -> GitLabGroup)] 60% 121,680 / 200,000 | 14.9 MB of ~24.4 MB | 26158364 rows/s, 3193.2 MB/s | ETA 00:00
[BELONGS_TO_GROUP (GitLabEpic -> GitLabGroup)] 70% 141,960 / 200,000 | 17.3 MB of ~24.4 MB | 27502967 rows/s, 3357.3 MB/s | ETA 00:00
[BELONGS_TO_GROUP (GitLabEpic -> GitLabGroup)] 81% 162,240 / 200,000 | 19.8 MB of ~24.4 MB | 28775947 rows/s, 3512.7 MB/s | ETA 00:00
[BELONGS_TO_GROUP (GitLabEpic -> GitLabGroup)] 91% 182,520 / 200,000 | 22.3 MB of ~24.4 MB | 29691523 rows/s, 3624.5 MB/s | ETA 00:00
[BELONGS_TO_GROUP (GitLabEpic -> GitLabGroup)] 100% 200,000 / 200,000 | 24.4 MB of ~24.4 MB | 31917439 rows/s, 3896.2 MB/s | ETA 00:00
[BELONGS_TO_GROUP (GitLabEpic -> GitLabGroup)] 100% 200,000 / 200,000 | 842 KB of ~24.4 MB | 28166201 rows/s, 115.8 MB/s | ETA 00:00
[MEMBER_OF_GROUP (GitLabUser -> GitLabGroup)] Writing to gitlab_estimation_data/gitlabuser_to_gitlabgroup_relationships.parquet | rows: 125,000 | est size: 15.3 MB
[MEMBER_OF_GROUP (GitLabUser -> GitLabGroup)] 12% 15,210 / 125,000 | 1.9 MB of ~15.3 MB | 17086681 rows/s, 2085.8 MB/s | ETA 00:00
[MEMBER_OF_GROUP (GitLabUser -> GitLabGroup)] 20% 25,350 / 125,000 | 3.1 MB of ~15.3 MB | 18678037 rows/s, 2280.0 MB/s | ETA 00:00
[MEMBER_OF_GROUP (GitLabUser -> GitLabGroup)] 32% 40,560 / 125,000 | 5.0 MB of ~15.3 MB | 12048268 rows/s, 1470.7 MB/s | ETA 00:00
[MEMBER_OF_GROUP (GitLabUser -> GitLabGroup)] 40% 50,700 / 125,000 | 6.2 MB of ~15.3 MB | 9636339 rows/s, 1176.3 MB/s | ETA 00:00
[MEMBER_OF_GROUP (GitLabUser -> GitLabGroup)] 52% 65,910 / 125,000 | 8.0 MB of ~15.3 MB | 12085630 rows/s, 1475.3 MB/s | ETA 00:00
[MEMBER_OF_GROUP (GitLabUser -> GitLabGroup)] 60% 76,050 / 125,000 | 9.3 MB of ~15.3 MB | 12340020 rows/s, 1506.4 MB/s | ETA 00:00
[MEMBER_OF_GROUP (GitLabUser -> GitLabGroup)] 73% 91,260 / 125,000 | 11.1 MB of ~15.3 MB | 13553381 rows/s, 1654.5 MB/s | ETA 00:00
[MEMBER_OF_GROUP (GitLabUser -> GitLabGroup)] 81% 101,400 / 125,000 | 12.4 MB of ~15.3 MB | 14785201 rows/s, 1804.8 MB/s | ETA 00:00
[MEMBER_OF_GROUP (GitLabUser -> GitLabGroup)] 93% 116,610 / 125,000 | 14.2 MB of ~15.3 MB | 15758728 rows/s, 1923.7 MB/s | ETA 00:00
[MEMBER_OF_GROUP (GitLabUser -> GitLabGroup)] 100% 125,000 / 125,000 | 15.3 MB of ~15.3 MB | 16720451 rows/s, 2041.1 MB/s | ETA 00:00
[MEMBER_OF_GROUP (GitLabUser -> GitLabGroup)] 100% 125,000 / 125,000 | 608 KB of ~15.3 MB | 14470314 rows/s, 68.7 MB/s | ETA 00:00
[MEMBER_OF_PROJECT (GitLabUser -> GitLabProject)] Writing to gitlab_estimation_data/gitlabuser_to_gitlabproject_relationships.parquet | rows: 120,000 | est size: 14.6 MB
[MEMBER_OF_PROJECT (GitLabUser -> GitLabProject)] 12% 15,210 / 120,000 | 1.9 MB of ~14.6 MB | 63507307 rows/s, 7752.4 MB/s | ETA 00:00
[MEMBER_OF_PROJECT (GitLabUser -> GitLabProject)] 21% 25,350 / 120,000 | 3.1 MB of ~14.6 MB | 12224479 rows/s, 1492.2 MB/s | ETA 00:00
[MEMBER_OF_PROJECT (GitLabUser -> GitLabProject)] 33% 40,560 / 120,000 | 5.0 MB of ~14.6 MB | 17010155 rows/s, 2076.4 MB/s | ETA 00:00
[MEMBER_OF_PROJECT (GitLabUser -> GitLabProject)] 42% 50,700 / 120,000 | 6.2 MB of ~14.6 MB | 20172079 rows/s, 2462.4 MB/s | ETA 00:00
[MEMBER_OF_PROJECT (GitLabUser -> GitLabProject)] 50% 60,840 / 120,000 | 7.4 MB of ~14.6 MB | 16389720 rows/s, 2000.7 MB/s | ETA 00:00
[MEMBER_OF_PROJECT (GitLabUser -> GitLabProject)] 63% 76,050 / 120,000 | 9.3 MB of ~14.6 MB | 16198804 rows/s, 1977.4 MB/s | ETA 00:00
[MEMBER_OF_PROJECT (GitLabUser -> GitLabProject)] 71% 86,190 / 120,000 | 10.5 MB of ~14.6 MB | 17763047 rows/s, 2168.3 MB/s | ETA 00:00
[MEMBER_OF_PROJECT (GitLabUser -> GitLabProject)] 80% 96,330 / 120,000 | 11.8 MB of ~14.6 MB | 19315738 rows/s, 2357.9 MB/s | ETA 00:00
[MEMBER_OF_PROJECT (GitLabUser -> GitLabProject)] 92% 111,540 / 120,000 | 13.6 MB of ~14.6 MB | 16594614 rows/s, 2025.7 MB/s | ETA 00:00
[MEMBER_OF_PROJECT (GitLabUser -> GitLabProject)] 100% 120,000 / 120,000 | 14.6 MB of ~14.6 MB | 17628805 rows/s, 2152.0 MB/s | ETA 00:00
[MEMBER_OF_PROJECT (GitLabUser -> GitLabProject)] 100% 120,000 / 120,000 | 827 KB of ~14.6 MB | 15841584 rows/s, 106.7 MB/s | ETA 00:00
[RELATED_TO_ISSUE (GitLabIssue -> GitLabIssue)] Writing to gitlab_estimation_data/gitlabissue_to_gitlabissue_relationships.parquet | rows: 12,000,000 | est size: 1.4 GB
[RELATED_TO_ISSUE (GitLabIssue -> GitLabIssue)] 10% 1,201,590 / 12,000,000 | 147 MB of ~1.4 GB | 30077535 rows/s, 3671.6 MB/s | ETA 00:00
[RELATED_TO_ISSUE (GitLabIssue -> GitLabIssue)] 20% 2,403,180 / 12,000,000 | 293 MB of ~1.4 GB | 33131716 rows/s, 4044.4 MB/s | ETA 00:00
[RELATED_TO_ISSUE (GitLabIssue -> GitLabIssue)] 80% 9,602,580 / 12,000,000 | 64.0 MB of ~1.4 GB | 31308043 rows/s, 208.6 MB/s | ETA 00:00
[RELATED_TO_ISSUE (GitLabIssue -> GitLabIssue)] 90% 10,804,170 / 12,000,000 | 64.0 MB of ~1.4 GB | 33769517 rows/s, 200.0 MB/s | ETA 00:00
[RELATED_TO_ISSUE (GitLabIssue -> GitLabIssue)] 100% 12,000,000 / 12,000,000 | 64.0 MB of ~1.4 GB | 34204653 rows/s, 182.4 MB/s | ETA 00:00
[RELATED_TO_ISSUE (GitLabIssue -> GitLabIssue)] 100% 12,000,000 / 12,000,000 | 92.2 MB of ~1.4 GB | 32119775 rows/s, 246.9 MB/s | ETA 00:00
[RELATED_TO_MERGE_REQUEST (GitLabMergeRequest -> GitLabMergeRequest)] Writing to gitlab_estimation_data/gitlabmergerequest_to_gitlabmergerequest_relationships.parquet | rows: 60,000,000 | est size: 7.2 GB
[RELATED_TO_MERGE_REQUEST (GitLabMergeRequest -> GitLabMergeRequest)] 10% 6,002,880 / 60,000,000 | 32.0 MB of ~7.2 GB | 43801203 rows/s, 233.5 MB/s | ETA 00:01
[RELATED_TO_MERGE_REQUEST (GitLabMergeRequest -> GitLabMergeRequest)] 20% 12,000,690 / 60,000,000 | 64.0 MB of ~7.2 GB | 40874731 rows/s, 217.9 MB/s | ETA 00:01
[RELATED_TO_MERGE_REQUEST (GitLabMergeRequest -> GitLabMergeRequest)] 90% 54,000,570 / 60,000,000 | 384 MB of ~7.2 GB | 49782116 rows/s, 353.9 MB/s | ETA 00:00
[RELATED_TO_MERGE_REQUEST (GitLabMergeRequest -> GitLabMergeRequest)] 100% 60,000,000 / 60,000,000 | 448 MB of ~7.2 GB | 49975732 rows/s, 373.1 MB/s | ETA 00:00
[RELATED_TO_MERGE_REQUEST (GitLabMergeRequest -> GitLabMergeRequest)] 100% 60,000,000 / 60,000,000 | 461 MB of ~7.2 GB | 47642374 rows/s, 366.3 MB/s | ETA 00:00
Streaming export complete to gitlab_estimation_data
=== Streaming Export Summary ===
Parquet export completed in 1941.2792955s
=== Phase: Schema Creation ===
[Schema][Node] Created GitLabProject in 12.902042ms
[Schema][Node] Created GitLabUser in 17.018333ms
[Schema][Node] Created GitLabIssue in 9.940042ms
[Schema][Node] Created GitLabMergeRequest in 9.031959ms
[Schema][Node] Created GitLabEpic in 8.938667ms
[Schema][Node] Created GitLabMilestone in 8.99825ms
[Schema][Node] Created GitLabGroup in 8.9985ms
[Schema][Rel ] Created OWNS_PROJECT in 10.082667ms
[Schema][Rel ] Created CONTAINS_PROJECT in 7.8875ms
[Schema][Rel ] Created BELONGS_TO_PROJECT in 8.074042ms
[Schema][Rel ] Created AUTHORED_BY in 8.92225ms
[Schema][Rel ] Created ASSIGNED_TO in 7.924042ms
[Schema][Rel ] Created IN_MILESTONE in 9.043875ms
[Schema][Rel ] Created IN_EPIC in 9.99375ms
[Schema][Rel ] Created CLOSES_ISSUE in 7.937625ms
[Schema][Rel ] Created BELONGS_TO_GROUP in 9.014417ms
[Schema][Rel ] Created MEMBER_OF_GROUP in 8.029292ms
[Schema][Rel ] Created MEMBER_OF_PROJECT in 7.921584ms
[Schema][Rel ] Created RELATED_TO_ISSUE in 8ms
[Schema][Rel ] Created RELATED_TO_MERGE_REQUEST in 8.122958ms
=== Phase: Bulk Import into Kuzu ===
[Import][Node] GitLabProject | 74.3 MB | 709.71875ms
[Import][Node] GitLabUser | 4.6 MB | 270.863917ms
[Import][Node] GitLabIssue | 14.8 GB | 51.669624417s
[Import][Node] GitLabMergeRequest | 74.7 GB | 241.606397667s
[Import][Node] GitLabEpic | 258 MB | 1.777740417s
[Import][Node] GitLabMilestone | 292 MB | 2.050748416s
[Import][Node] GitLabGroup | 2.4 MB | 190.951042ms
[Import][Rel ] OWNS_PROJECT (GitLabUser -> GitLabProject) | 827 KB | 69.832167ms
[Import][Rel ] CONTAINS_PROJECT (GitLabGroup -> GitLabProject) | 344 KB | 69.281584ms
[Import][Rel ] BELONGS_TO_PROJECT (GitLabIssue -> GitLabProject) | 48.9 MB | 1.4189395s
[Import][Rel ] BELONGS_TO_PROJECT (GitLabMergeRequest -> GitLabProject) | 243 MB | 6.903312583s
[Import][Rel ] BELONGS_TO_PROJECT (GitLabMilestone -> GitLabProject) | 1.3 MB | 89.078458ms
[Import][Rel ] AUTHORED_BY (GitLabIssue -> GitLabUser) | 30.9 MB | 662.613333ms
[Import][Rel ] AUTHORED_BY (GitLabMergeRequest -> GitLabUser) | 231 MB | 4.38984025s
[Import][Rel ] AUTHORED_BY (GitLabEpic -> GitLabUser) | 1.5 MB | 77.402166ms
[Import][Rel ] ASSIGNED_TO (GitLabIssue -> GitLabUser) | 30.9 MB | 670.163334ms
[Import][Rel ] ASSIGNED_TO (GitLabMergeRequest -> GitLabUser) | 231 MB | 4.445680875s
[Import][Rel ] IN_MILESTONE (GitLabIssue -> GitLabMilestone) | 12.5 MB | 437.48475ms
[Import][Rel ] IN_EPIC (GitLabIssue -> GitLabEpic) | 14.3 MB | 435.90675ms
[Import][Rel ] CLOSES_ISSUE (GitLabMergeRequest -> GitLabIssue) | 92.2 MB | 2.00126475s
[Import][Rel ] BELONGS_TO_GROUP (GitLabEpic -> GitLabGroup) | 842 KB | 80.418667ms
[Import][Rel ] MEMBER_OF_GROUP (GitLabUser -> GitLabGroup) | 608 KB | 65.392792ms
[Import][Rel ] MEMBER_OF_PROJECT (GitLabUser -> GitLabProject) | 827 KB | 65.2975ms
[Import][Rel ] RELATED_TO_ISSUE (GitLabIssue -> GitLabIssue) | 92.2 MB | 1.719923125s
[Import][Rel ] RELATED_TO_MERGE_REQUEST (GitLabMergeRequest -> GitLabMergeRequest) | 461 MB | 8.787665042s
--- Suite run 0 ---
=== GitLab Performance Test Results ===
Total Records: 72602000
Data Generation: 0ns (0 records/sec)
Parquet Export: 1941.2792955s
Schema Creation: 215.82675ms
Bulk Import: 330.666918542s (219562 records/sec)
Query Execution: 694.206875ms
=== Parquet Files: Nodes ===
GitLabProject: rows 60,000 | size 74.3 MB | ~1.3 KB/row | gitlab_estimation_data/gitlab_projects.parquet
GitLabUser: rows 100,000 | size 4.6 MB | ~47.0 B/row | gitlab_estimation_data/gitlab_users.parquet
GitLabIssue: rows 12,000,000 | size 14.8 GB | ~1.3 KB/row | gitlab_estimation_data/gitlab_issues.parquet
GitLabMergeRequest: rows 60,000,000 | size 74.7 GB | ~1.3 KB/row | gitlab_estimation_data/gitlab_merge_requests.parquet
GitLabEpic: rows 200,000 | size 258 MB | ~1.3 KB/row | gitlab_estimation_data/gitlab_epics.parquet
GitLabMilestone: rows 240,000 | size 292 MB | ~1.2 KB/row | gitlab_estimation_data/gitlab_milestones.parquet
GitLabGroup: rows 2,000 | size 2.4 MB | ~1.2 KB/row | gitlab_estimation_data/gitlab_groups.parquet
=== Parquet Files: Relationships ===
OWNS_PROJECT (GitLabUser -> GitLabProject): rows 60,000 | size 827 KB | ~14.0 B/row | gitlab_estimation_data/gitlabuser_to_gitlabproject_relationships.parquet
CONTAINS_PROJECT (GitLabGroup -> GitLabProject): rows 60,000 | size 344 KB | ~5.0 B/row | gitlab_estimation_data/gitlabgroup_to_gitlabproject_relationships.parquet
BELONGS_TO_PROJECT (GitLabIssue -> GitLabProject): rows 12,000,000 | size 48.9 MB | ~4.0 B/row | gitlab_estimation_data/gitlabissue_to_gitlabproject_relationships.parquet
BELONGS_TO_PROJECT (GitLabMergeRequest -> GitLabProject): rows 60,000,000 | size 243 MB | ~4.0 B/row | gitlab_estimation_data/gitlabmergerequest_to_gitlabproject_relationships.parquet
BELONGS_TO_PROJECT (GitLabMilestone -> GitLabProject): rows 240,000 | size 1.3 MB | ~5.0 B/row | gitlab_estimation_data/gitlabmilestone_to_gitlabproject_relationships.parquet
AUTHORED_BY (GitLabIssue -> GitLabUser): rows 12,000,000 | size 30.9 MB | ~2.0 B/row | gitlab_estimation_data/gitlabissue_to_gitlabuser_relationships.parquet
AUTHORED_BY (GitLabMergeRequest -> GitLabUser): rows 60,000,000 | size 231 MB | ~4.0 B/row | gitlab_estimation_data/gitlabmergerequest_to_gitlabuser_relationships.parquet
AUTHORED_BY (GitLabEpic -> GitLabUser): rows 200,000 | size 1.5 MB | ~8.0 B/row | gitlab_estimation_data/gitlabepic_to_gitlabuser_relationships.parquet
ASSIGNED_TO (GitLabIssue -> GitLabUser): rows 4,020,000 | size 30.9 MB | ~8.0 B/row | gitlab_estimation_data/gitlabissue_to_gitlabuser_relationships.parquet
ASSIGNED_TO (GitLabMergeRequest -> GitLabUser): rows 30,000,000 | size 231 MB | ~8.0 B/row | gitlab_estimation_data/gitlabmergerequest_to_gitlabuser_relationships.parquet
IN_MILESTONE (GitLabIssue -> GitLabMilestone): rows 3,000,000 | size 12.5 MB | ~4.0 B/row | gitlab_estimation_data/gitlabissue_to_gitlabmilestone_relationships.parquet
IN_EPIC (GitLabIssue -> GitLabEpic): rows 2,400,000 | size 14.3 MB | ~6.0 B/row | gitlab_estimation_data/gitlabissue_to_gitlabepic_relationships.parquet
CLOSES_ISSUE (GitLabMergeRequest -> GitLabIssue): rows 12,000,000 | size 92.2 MB | ~8.0 B/row | gitlab_estimation_data/gitlabmergerequest_to_gitlabissue_relationships.parquet
BELONGS_TO_GROUP (GitLabEpic -> GitLabGroup): rows 200,000 | size 842 KB | ~4.0 B/row | gitlab_estimation_data/gitlabepic_to_gitlabgroup_relationships.parquet
MEMBER_OF_GROUP (GitLabUser -> GitLabGroup): rows 125,000 | size 608 KB | ~4.0 B/row | gitlab_estimation_data/gitlabuser_to_gitlabgroup_relationships.parquet
MEMBER_OF_PROJECT (GitLabUser -> GitLabProject): rows 120,000 | size 827 KB | ~7.0 B/row | gitlab_estimation_data/gitlabuser_to_gitlabproject_relationships.parquet
RELATED_TO_ISSUE (GitLabIssue -> GitLabIssue): rows 12,000,000 | size 92.2 MB | ~8.0 B/row | gitlab_estimation_data/gitlabissue_to_gitlabissue_relationships.parquet
RELATED_TO_MERGE_REQUEST (GitLabMergeRequest -> GitLabMergeRequest): rows 60,000,000 | size 461 MB | ~8.0 B/row | gitlab_estimation_data/gitlabmergerequest_to_gitlabmergerequest_relationships.parquet
Single-Threaded Query Performance
Query Description | Execution Time |
---|---|
Count all users | ~10.12 ms |
Count all projects | ~0.58 ms |
Count all issues | ~1.34 ms |
Count all merge requests | ~4.88 ms |
Issues per project (top 10) | ~215.28 ms |
Most active users by issues authored (top 10) | ~245.89 ms |
Issues with assignees (count) | ~64.30 ms |
Merge requests that close issues (count) | ~63.34 ms |
Issues in milestones (top 5) | ~30.01 ms |
Epic to issues relationship (top 5) | ~58.35 ms |
- Query Performance:
  - Simple count queries completed in roughly 0.6 to 10 ms.
  - Relationship counts and targeted aggregations took about 30 to 65 ms, while the heavier aggregations with ordering (issues per project, most active users) took about 215 to 246 ms.
  - This suggests good performance for both simple lookups and more complex analytical queries at this scale.
Query Results
Parquet Totals: nodes 72,602,000 rows, rels 268,425,000 rows, size 91.6 GB (nodes 90.2 GB + rels 1.5 GB)
Export Throughput: 93809.6 MB in 1941.2792955s (48.3 MB/s)
Database Size: 23483.6 MB
=== Query Results ===
Count all users: 10.12175ms (1 rows)
Count all projects: 580.083µs (1 rows)
Count all issues: 1.338375ms (1 rows)
Count all merge requests: 4.877375ms (1 rows)
Issues per project: 215.28175ms (10 rows)
Most active users by issues authored: 245.893458ms (10 rows)
Issues with assignees: 64.297542ms (1 rows)
Merge requests that close issues: 63.339125ms (1 rows)
Issues in milestones: 30.014167ms (4 rows)
Epic to issues relationship: 58.350917ms (5 rows)
Total Test Time: 2272.856247667s
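The derived figures in the report can be reproduced from the raw log values. The snippet below is illustrative arithmetic only, and `per_second` is a name made up for this example. One thing it makes visible: the logged import throughput equals "Total Records" (72,602,000, which is the node row count) divided by the bulk import time, so it appears not to count relationship rows.

```rust
/// Amount (records or megabytes) processed per second.
fn per_second(amount: f64, seconds: f64) -> f64 {
    amount / seconds
}

fn main() {
    // Values copied from the log output above.
    let total_records = 72_602_000.0; // "Total Records" (same as the node row count)
    let bulk_import_secs = 330.666918542;
    let parquet_mb = 93_809.6;
    let export_secs = 1941.2792955;
    let database_mb = 23_483.6;

    let import_rate = per_second(total_records, bulk_import_secs);
    let export_rate = per_second(parquet_mb, export_secs);
    let size_ratio = database_mb / parquet_mb;

    println!("import: {:.0} records/sec", import_rate);
    println!("export: {:.1} MB/s", export_rate);
    println!("database / parquet: {:.1}%", size_ratio * 100.0);
}
```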
Concurrent Query Benchmark
The concurrent query benchmark provides a more detailed look at performance under load.
Query Description | Avg. Latency (ms) | p95 Latency (ms) | Notes |
---|---|---|---|
Issues per project (top 10) | 645.28 | 655.06 | High-level aggregation. |
Most active users by authored issues | 434.22 | 442.14 | Aggregation with `ORDER BY`. |
Assigned issues count | 135.67 | 139.41 | Simple relationship count. |
Merge requests closing issues | 157.36 | 159.62 | Simple relationship count. |
Issues in milestones (top 5) | 27.33 | 27.41 | Fast, targeted aggregation. |
Epic to issues relationship (top 5) | 44.21 | 44.27 | Fast, targeted aggregation. |
Project -> Issue -> Assignee chain | 440.48 | 440.71 | 3-hop traversal with property access. |
Issue neighbors within 2 hops | 3951.06 | 4058.66 | Variable-length path query, more expensive. |
Chain Query (Specific Fields) | 100.22 | 100.89 | Returns only small, specific string properties. |
Chain Query (Full Nodes) | 18177.66 | 18736.37 | Returns full node objects, including large descriptions. |
Full results
⋊> ~/g/k/knowledge-graph-worktree on size-test ◦ cargo run -p database --release --bin gitlab_estimation --features gitlab-estimation -- query gitlab_estimation.db_suite_0.db
db_path: gitlab_estimation.db_suite_0.db
Running query: Issues per project (top 10)
Running query: Issues per project (top 10), run: 0
Running query: Issues per project (top 10), run: 0
Running query: Issues per project (top 10), run: 0
Running query: Most active users by authored issues
Running query: Most active users by authored issues, run: 0
Running query: Most active users by authored issues, run: 0
Running query: Most active users by authored issues, run: 0
Running query: Assigned issues count
Running query: Assigned issues count, run: 0
Running query: Assigned issues count, run: 0
Running query: Assigned issues count, run: 0
Running query: Merge requests closing issues
Running query: Merge requests closing issues, run: 0
Running query: Merge requests closing issues, run: 0
Running query: Merge requests closing issues, run: 0
Running query: Issues in milestones (top 5)
Running query: Issues in milestones (top 5), run: 0
Running query: Issues in milestones (top 5), run: 0
Running query: Issues in milestones (top 5), run: 0
Running query: Epic to issues relationship (top 5)
Running query: Epic to issues relationship (top 5), run: 0
Running query: Epic to issues relationship (top 5), run: 0
Running query: Epic to issues relationship (top 5), run: 0
Running query: Project -> Issue -> Assignee chain
Running query: Project -> Issue -> Assignee chain, run: 0
Running query: Project -> Issue -> Assignee chain, run: 0
Running query: Project -> Issue -> Assignee chain, run: 0
Running query: Issue neighbors within 2 hops
Running query: Issue neighbors within 2 hops, run: 0
Running query: Issue neighbors within 2 hops, run: 0
Running query: Issue neighbors within 2 hops, run: 0
Running query: Issue -> Issue -> Assignee -> Group chain
Running query: Issue -> Issue -> Assignee -> Group chain, run: 0
Running query: Issue -> Issue -> Assignee -> Group chain, run: 0
Running query: Issue -> Issue -> Assignee -> Group chain, run: 0
=== Concurrent Query Benchmark Results ===
- Issues per project (top 10)
Query: MATCH (i:GitLabIssue)-[:BELONGS_TO_PROJECT]->(p:GitLabProject) RETURN p.name, count(i) as issue_count ORDER BY issue_count DESC LIMIT 10
Runs: 3 (ok 3, fail 0)
Rows: min 10, max 10
Latency (ms): min 583.26, p50 583.26, avg 587.14, p95 594.90, max 594.90
- Most active users by authored issues
Query: MATCH (i:GitLabIssue)-[:AUTHORED_BY]->(u:GitLabUser) RETURN u.username, count(i) as issues_authored ORDER BY issues_authored DESC LIMIT 10
Runs: 3 (ok 3, fail 0)
Rows: min 10, max 10
Latency (ms): min 470.77, p50 472.54, avg 473.82, p95 478.15, max 478.15
- Assigned issues count
Query: MATCH (i:GitLabIssue)-[:ASSIGNED_TO]->(u:GitLabUser) RETURN count(i) as assigned_issues_count
Runs: 3 (ok 3, fail 0)
Rows: min 1, max 1
Latency (ms): min 118.64, p50 118.67, avg 120.40, p95 123.89, max 123.89
- Merge requests closing issues
Query: MATCH (mr:GitLabMergeRequest)-[:CLOSES_ISSUE]->(i:GitLabIssue) RETURN count(mr) as mrs_closing_issues
Runs: 3 (ok 3, fail 0)
Rows: min 1, max 1
Latency (ms): min 146.74, p50 147.30, avg 147.52, p95 148.53, max 148.53
- Issues in milestones (top 5)
Query: MATCH (i:GitLabIssue)-[:IN_MILESTONE]->(m:GitLabMilestone) RETURN m.title, count(i) as issues_in_milestone ORDER BY issues_in_milestone DESC LIMIT 5
Runs: 3 (ok 3, fail 0)
Rows: min 4, max 4
Latency (ms): min 32.97, p50 33.01, avg 33.02, p95 33.09, max 33.09
- Epic to issues relationship (top 5)
Query: MATCH (i:GitLabIssue)-[:IN_EPIC]->(e:GitLabEpic) RETURN e.title, count(i) as issues_in_epic ORDER BY issues_in_epic DESC LIMIT 5
Runs: 3 (ok 3, fail 0)
Rows: min 5, max 5
Latency (ms): min 56.73, p50 56.79, avg 56.80, p95 56.89, max 56.89
- Project -> Issue -> Assignee chain
Query: MATCH (p:GitLabProject {id: 100})<-[btp:BELONGS_TO_PROJECT]-(i:GitLabIssue)-[at:ASSIGNED_TO]->(u:GitLabUser) RETURN p, btp, i, at, u
Runs: 3 (ok 3, fail 0)
Rows: min 67, max 67
Latency (ms): min 481.04, p50 481.06, avg 481.10, p95 481.21, max 481.21
- Issue neighbors within 2 hops
Query: MATCH (issue:GitLabIssue {id: 100})-[r*1..2]-(connectedNode) RETURN issue.id, issue.title, r, connectedNode.id LIMIT 100
Runs: 3 (ok 3, fail 0)
Rows: min 100, max 100
Latency (ms): min 3711.18, p50 3838.28, avg 3833.40, p95 3950.75, max 3950.75
- Issue -> Issue -> Assignee -> Group chain
Query: MATCH (issue:GitLabIssue {id: 1000})-[r1:RELATED_TO_ISSUE]-(issue2:GitLabIssue)-[r2:ASSIGNED_TO]-(user:GitLabUser)-[r3:MEMBER_OF_GROUP]-(g:GitlabGroup) RETURN issue.title, issue2.title, user.username, g.name
Runs: 3 (ok 3, fail 0)
Rows: min 3, max 3
Latency (ms): min 99.13, p50 100.65, avg 100.22, p95 100.89, max 100.89
⋊> ~/g/k/knowledge-graph-worktree on size-test ◦ 17:04:56
Observations
- Impact of Large Properties: The selection of returned properties has a significant impact on latency. A query returning full node objects, which include a large `description` field, was far slower than an otherwise identical query returning only specific, smaller fields (e.g., `title`, `username`).
  - Specific Fields (e.g., `issue.title`): Average latency was ~100ms.
  - Full Nodes (e.g., `issue`): Average latency increased to ~18,200ms, roughly 180x slower.
  - This highlights a consideration for query speed: we should think about the type of data stored in the database, and clients should request only the specific properties they need, avoiding the retrieval of large text fields unless absolutely necessary. It may make more sense to retrieve the full node from Rails when returning results to the client. Note: this is with no optimizations in the code; we may gain more performance on these queries with higher memory limits and/or better query plan optimization.
- Performant Neighbors Query: As shown in the video, the underlying query that Kuzu uses in its implementation of the "node click" functionality retrieves neighbors in under 200ms on average.
How the Pipeline Works
The process is managed by a CLI and broken down into stages:
graph LR
subgraph "Input"
A[DatasetConfig]
end
subgraph "Pipeline Stages"
B(Data Generation & Streaming Export) -- Uses --> A;
B -- Generates --> C[Parquet Files];
C -- Imported into --> D(Kuzu Schema Init & Bulk Import);
D -- Populates --> E[Kuzu Database];
E -- Is queried by --> F(Query Benchmarking);
end
subgraph "Output"
F -- Produces --> G[Performance Report];
end
1. Data Generation and Export
The CLI generates synthetic data that mimics a GitLab instance and exports it to a set of Parquet files. Important things to note:
- Configuration: The scale and characteristics of the generated data are controlled by the `DatasetConfig` struct. This allows for customization of the number of users, projects, groups, issues, merge requests, and more, as well as the size of text fields (such as descriptions).
- Streaming Export: To handle datasets that are too large to fit in memory, the `export_dataset_streaming` function is used. It generates data in chunks and writes each chunk directly to Parquet files. This ensures that memory usage remains bounded regardless of the total dataset size.
- Two-Pass Generation: The data is generated in two passes to create relationships without holding all node IDs in memory simultaneously:
  - Node Generation: All node tables (`GitLabUser`, `GitLabProject`, etc.) are generated and written to their respective Parquet files.
  - Relationship Generation: A second pass iterates through the configured entity counts to generate relationship data, linking nodes based on deterministic rules. This relationship data is also streamed to Parquet files.
- Parquet File Structure: The output is a directory containing multiple Parquet files. There is one file for each node type (e.g., `gitlab_users.parquet`) and one file for each pair of connected node types in a relationship (e.g., `gitlabissue_to_gitlabuser_relationships.parquet` for both `AUTHORED_BY` and `ASSIGNED_TO`).
2. Schema Initialization
Before data can be imported, the database schema must be created in Kuzu.
- Schema Definition: The graph schema is defined statically in `crates/database/src/gitlab_estimation/schema.rs`. It includes node tables (e.g., `GitLabProject`) with their properties and a primary key, as well as relationship tables (e.g., `OWNS_PROJECT`) that define connections between node tables.
- Table Creation: The `initialize_gitlab_schema` function connects to the Kuzu database, and for each node and relationship table defined in the schema, it executes a `CREATE TABLE` query. This prepares the database to receive data.
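For reference, Kuzu's DDL distinguishes `CREATE NODE TABLE` from `CREATE REL TABLE`. The sketch below builds such statements as strings; the helper names and the column types are assumptions for illustration, and the real definitions live in `schema.rs`.

```rust
/// Builds a Kuzu `CREATE NODE TABLE` statement from (column, type) pairs.
fn create_node_table(name: &str, columns: &[(&str, &str)], primary_key: &str) -> String {
    let cols: Vec<String> = columns
        .iter()
        .map(|(col, ty)| format!("{} {}", col, ty))
        .collect();
    format!(
        "CREATE NODE TABLE {}({}, PRIMARY KEY ({}))",
        name,
        cols.join(", "),
        primary_key
    )
}

/// Builds a Kuzu `CREATE REL TABLE` statement for a single FROM/TO pair.
fn create_rel_table(name: &str, from: &str, to: &str) -> String {
    format!("CREATE REL TABLE {}(FROM {} TO {})", name, from, to)
}

fn main() {
    // Column types are assumed here, not taken from the real schema.
    let users = create_node_table(
        "GitLabUser",
        &[("id", "INT64"), ("username", "STRING")],
        "id",
    );
    let owns = create_rel_table("OWNS_PROJECT", "GitLabUser", "GitLabProject");
    println!("{}", users);
    println!("{}", owns);
}
```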
3. Data Ingestion (Bulk Import)
With the schema in place and the data available in Parquet files, this stage loads the data into the database. The `bulk_import_gitlab_data` function uses Kuzu's optimized `COPY FROM` command to perform a bulk import from the Parquet files generated in the first stage.
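As a rough illustration, a node-table import statement has the shape `COPY <table> FROM '<file>'`. The helper below is hypothetical and is not how `bulk_import_gitlab_data` is written; the file paths are the ones that appear in the log output.

```rust
/// Builds a `COPY <table> FROM '<path>'` statement for a Parquet file.
/// Hypothetical helper: assumes the path contains no quote characters.
fn copy_from_parquet(table: &str, parquet_path: &str) -> String {
    format!("COPY {} FROM '{}'", table, parquet_path)
}

fn main() {
    // Node tables are loaded first so relationship rows can resolve their endpoints.
    let node_files = [
        ("GitLabUser", "gitlab_estimation_data/gitlab_users.parquet"),
        ("GitLabProject", "gitlab_estimation_data/gitlab_projects.parquet"),
    ];
    for &(table, path) in &node_files {
        println!("{}", copy_from_parquet(table, path));
    }
}
```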
Data Schema and Generation Details
Dataset Configuration
The `DatasetConfig` struct provides fine-grained control over the size and shape of the generated dataset. Its fields include:
- `num_users`: The total number of `GitLabUser` nodes to generate.
- `num_groups`: The total number of `GitLabGroup` nodes.
- `num_projects`: The total number of `GitLabProject` nodes.
- `num_issues_per_project`: The number of `GitLabIssue` nodes to generate for each project.
- `num_mrs_per_project`: The number of `GitLabMergeRequest` nodes to generate for each project.
- `num_epics_per_group`: The number of `GitLabEpic` nodes to generate for each group.
- `num_milestones_per_project`: The number of `GitLabMilestone` nodes to generate for each project.
- `long_description_ratio`: A float between `0.0` and `1.0` representing the fraction of nodes that should receive a long description.
- `long_description_size_bytes`: The size, in bytes, of the long descriptions to be generated.
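To show how the per-project and per-group fields multiply out, here is a small sketch that mirrors the fields listed above and recomputes the node total for the large test run. The field types, `total_nodes`, and `large_test_config` are assumptions made for this example, not the crate's actual definitions.

```rust
/// Mirror of the configuration fields described above.
/// Field types are assumptions made for this sketch.
#[derive(Debug, Clone)]
struct DatasetConfig {
    num_users: u64,
    num_groups: u64,
    num_projects: u64,
    num_issues_per_project: u64,
    num_mrs_per_project: u64,
    num_epics_per_group: u64,
    num_milestones_per_project: u64,
    long_description_ratio: f64,
    long_description_size_bytes: usize,
}

impl DatasetConfig {
    /// Total node rows implied by the configuration (hypothetical helper).
    fn total_nodes(&self) -> u64 {
        let per_project = self.num_issues_per_project
            + self.num_mrs_per_project
            + self.num_milestones_per_project;
        self.num_users
            + self.num_groups
            + self.num_projects
            + self.num_projects * per_project
            + self.num_groups * self.num_epics_per_group
    }
}

/// The configuration used for the 72M-node test run.
fn large_test_config() -> DatasetConfig {
    DatasetConfig {
        num_users: 100_000,
        num_groups: 2_000,
        num_projects: 60_000,
        num_issues_per_project: 200,
        num_mrs_per_project: 1000,
        num_epics_per_group: 100,
        num_milestones_per_project: 4,
        long_description_ratio: 0.05,
        long_description_size_bytes: 512 * 1024,
    }
}

fn main() {
    let cfg = large_test_config();
    println!("{:?}", cfg);
    println!("total nodes: {}", cfg.total_nodes());
}
```

The result matches the "Total Records: 72602000" line in the test output.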
Node Types and Fields
The generated data populates a graph with the following node types, each with its own set of fields:
Node Type | Fields |
---|---|
GitLabUser | `id`, `username`, `name`, `email`, `created_at`, `is_admin`, `is_bot` |
GitLabGroup | `id`, `name`, `path`, `description`, `visibility`, `created_at`, `updated_at`, `projects_count`, `members_count` |
GitLabProject | `id`, `name`, `description`, `visibility`, `created_at`, `updated_at`, `stars_count`, `forks_count`, `issues_count`, `merge_requests_count` |
GitLabIssue | `id`, `iid`, `project_id`, `author_id`, `assignee_id`, `title`, `description`, `state`, `created_at`, `updated_at`, `closed_at`, `labels`, `weight`, `milestone_id`, `epic_id` |
GitLabMergeRequest | `id`, `iid`, `project_id`, `author_id`, `assignee_id`, `title`, `description`, `state`, `created_at`, `updated_at`, `merged_at`, `closed_at`, `source_branch`, `target_branch`, `labels`, `draft`, `changes_count`, `additions`, `deletions`, `commits_count` |
GitLabMilestone | `id`, `project_id`, `title`, `description`, `state`, `created_at`, `updated_at`, `due_date`, `start_date` |
GitLabEpic | `id`, `iid`, `group_id`, `author_id`, `title`, `description`, `state`, `created_at`, `updated_at`, `closed_at`, `start_date`, `due_date`, `labels` |
Relationship Generation
Relationships between nodes are not random. They are created using a set of deterministic rules to ensure that the graph is well-connected and that test runs are repeatable.
- Deterministic Rules: The CLI uses modulo arithmetic based on the nodes' primary IDs. For example, a project's parent group is chosen by calculating `(project_id % number_of_groups) + 1`. This approach guarantees that each project is consistently assigned to the same group across different test runs with the same configuration. This principle is applied to create a variety of relationships, such as authors for issues, assignees for merge requests, and project ownership.
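The group-assignment rule quoted above translates directly into code. In this sketch the function name is made up, and IDs are assumed to start at 1 (which the `+ 1` in the formula suggests).

```rust
/// Parent group for a project: `(project_id % number_of_groups) + 1`.
/// The result is always in the range 1..=num_groups.
fn group_for_project(project_id: u64, num_groups: u64) -> u64 {
    (project_id % num_groups) + 1
}

fn main() {
    let num_groups = 2_000;
    for project_id in [1u64, 1_999, 2_000, 60_000] {
        println!(
            "project {} -> group {}",
            project_id,
            group_for_project(project_id, num_groups)
        );
    }

    // With 60,000 projects and 2,000 groups, the rule spreads projects evenly.
    let in_group_one = (1..=60_000u64)
        .filter(|id| group_for_project(*id, num_groups) == 1)
        .count();
    println!("projects in group 1: {}", in_group_one);
}
```

Since the mapping depends only on the ID and the configured count, no lookup table of generated nodes is needed during the relationship pass.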
Description Field Generation
The `description` fields in nodes like `GitLabIssue` and `GitLabProject` can be configured to test the database's handling of large text fields.
- Standard Descriptions: By default, descriptions are short, realistic-looking sentences and paragraphs generated using the `fake` crate.
- Long Descriptions: The `DatasetConfig` allows for specifying a `long_description_ratio` and `long_description_size_bytes`. For a fraction of the nodes determined by the ratio, the generator will create a long string of a specified size (e.g., 512 KB). This is used to simulate issues or MRs with extensive descriptions and to analyze their impact on database size and query performance.
4. Query Benchmarking
Once the database is populated, its query performance is tested. Two types of benchmarks are available:
- Single-Threaded Performance Queries: The `run_performance_queries` function executes a predefined set of Cypher queries against the database. These queries are designed to test various access patterns, including:
  - Simple node counts.
  - Aggregations (e.g., counting issues per project).
  - Joins across multiple relationships.
  - Complex path traversals.

  The execution time and row count for each query are recorded.
- Concurrent Query Benchmark: The `run_concurrent_query_benchmark` function is designed to test the database's performance under concurrent load. It runs a suite of queries in parallel across multiple threads and collects detailed latency statistics (min, max, p50, p95, average).
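The latency summary can be illustrated with a small sketch. It assumes nearest-rank percentiles, which is consistent with the three-run output in the logs (p50 is the middle value and p95 equals the max) but is not confirmed to be what `run_concurrent_query_benchmark` does. `LatencyStats`, `percentile`, and `summarize` are names invented for this example, and the sample inputs are chosen to agree with the rounded values logged for "Issues per project (top 10)".

```rust
#[derive(Debug, PartialEq)]
struct LatencyStats {
    min: f64,
    p50: f64,
    avg: f64,
    p95: f64,
    max: f64,
}

/// Nearest-rank percentile over an ascending-sorted, non-empty slice.
fn percentile(sorted: &[f64], pct: f64) -> f64 {
    let rank = ((pct / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}

/// Summarizes latencies in milliseconds; returns `None` when there are no samples.
fn summarize(latencies_ms: &[f64]) -> Option<LatencyStats> {
    if latencies_ms.is_empty() {
        return None;
    }
    let mut sorted = latencies_ms.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let avg = sorted.iter().sum::<f64>() / sorted.len() as f64;
    Some(LatencyStats {
        min: sorted[0],
        p50: percentile(&sorted, 50.0),
        avg,
        p95: percentile(&sorted, 95.0),
        max: sorted[sorted.len() - 1],
    })
}

fn main() {
    // Sample inputs consistent with the rounded log values for one query.
    let runs = [594.90, 583.26, 583.26];
    if let Some(stats) = summarize(&runs) {
        println!(
            "min {:.2}, p50 {:.2}, avg {:.2}, p95 {:.2}, max {:.2}",
            stats.min, stats.p50, stats.avg, stats.p95, stats.max
        );
    }
}
```

With only three runs per query, p95 and max are the same sample, so the p95 column in the table is best read as a worst-case figure.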
Command-Line Interface
A new binary, `gitlab_estimation`, is provided to run the pipeline. It accepts several commands:
- `generate`: Only performs the data generation and export to Parquet files.
- `quick`: Runs the full pipeline (generate, import, query) with a small, predefined `DatasetConfig`.
- `comprehensive`: Runs the full pipeline with a large `DatasetConfig`.
- `suite`: Runs the full pipeline with a series of different `DatasetConfig`s to test performance at various scales.
- `query`: Runs only the concurrent query benchmark on an existing database.
Here are some example commands:
# Run the full benchmark suite
cargo run -p database --release --bin gitlab_estimation --features gitlab-estimation -- suite
# Run the concurrent query benchmark on a specific database file
cargo run -p database --release --bin gitlab_estimation --features gitlab-estimation -- query gitlab_estimation.db_suite_0.db