Orbit Production Cluster Perf Test — 2026-04-29 ($5986918) · Snippets · GitLab.org / orbit / GitLab Orbit

DDL Schema v20 — Audit Against Perf Test Findings

Audit: What's missing or misaligned with perf findings

1. `gl_edge` has no denormalized node properties — this is the !1095 (closed) gap

The edge table carries only structural columns (source_id, source_kind, relationship_kind, target_id, target_kind, traversal_path). Every cascade CTE that narrows by edge existence still requires a JOIN back to the node table to apply property filters like status = 'failed' or state = 'merged'.

What to add (high-value candidates based on our test corpus):

Edge relationship	Source kind	Denormalize from source	Why
`IN_PROJECT`	Pipeline	`status`, `source`	Q12 (TIMEOUT), Q24 (TIMEOUT), Q30 (15s), E1 (11.6s)
`IN_PROJECT`	Job	`status`, `failure_reason`	Q24 (TIMEOUT), I1 (17s)
`IN_PROJECT`	MergeRequest	`state`	Q4 (8.5s), F2 (9.8s), I5 (11.2s)
`IN_PROJECT`	Note	`noteable_type`	A1 (11.4s), I5 (11.2s)
`HAS_STAGE`	Pipeline (target: Stage)	`status` on Stage	A2 (TIMEOUT)
`HAS_LABEL`	WorkItem	`state`, `work_item_type`	I6 (9.8s)

These are LowCardinality(String) columns — adding them to gl_edge costs ~1 byte per row with LZ4. The edge table has projections ordered by (source_id, relationship_kind, ...) and (target_id, relationship_kind, ...), so a cascade query filtering WHERE relationship_kind = 'IN_PROJECT' AND target_id = 278964 AND source_status = 'failed' would be a tight range scan on the by_target projection.

2. `gl_code_edge` projection analysis

The code_edge table has by_source, by_target, by_rel_source_kind, by_rel_target_kind projections. The by_source projection ORDER BY (source_id, ...) should support e0.source_id IN (SELECT id FROM _nf_f), but the compiler emits this as a semi-join that ClickHouse may not optimize via projection lookup. The fix is compiler-side: rewrite the semi-join as a direct JOIN against the _nf_* CTE, or use an explicit key-lookup pattern that the projection can serve.

3. Missing composite projection on `gl_edge` for denormalized filters

If denormalized properties are added (point 1), a new projection is needed:

PROJECTION by_target_filtered (
  SELECT * ORDER BY (target_id, relationship_kind, source_kind, source_status, source_id, traversal_path)
)

This lets cascade CTEs filter target_id = project_id AND relationship_kind = 'IN_PROJECT' AND source_kind = 'Pipeline' AND source_status = 'failed' as a contiguous key range.

4. `gl_job` and `gl_stage` are missing parent foreign keys

Jobs link to Pipelines via gl_edge (HAS_JOB), but gl_job doesn't carry pipeline_id. This forces every Job→Pipeline query through a 2-table join (edge + node).

Same for gl_stage — no pipeline_id, no way to bypass the edge table.

Entities that already have good FK columns:

gl_note → has noteable_id + noteable_type ✓
gl_merge_request_diff → has merge_request_id ✓
gl_merge_request_diff_file → has merge_request_diff_id + merge_request_id ✓
gl_deployment → has project_id but no environment_id or pipeline_id

5. `gl_note` has no noteable clustering projection

Notes are the largest SDLC entity. Table is ordered by (traversal_path, id), so notes for the same MR/Issue are scattered by insert order. A projection would make HAS_NOTE traversals and "notes per MR" aggregations (I5, 11.2s) significantly faster:

PROJECTION by_noteable (
  SELECT * ORDER BY (traversal_path, noteable_type, noteable_id, id)
)

6. Code graph tables use `ReplacingMergeTree(_version)` without `_deleted`

-- gl_definition, gl_file, gl_directory, gl_imported_symbol:
ENGINE = ReplacingMergeTree(_version)  -- no _deleted!

All other entity tables use ReplacingMergeTree(_version, _deleted) with allow_experimental_replacing_merge_with_cleanup = 1. The code graph tables are the exception. This means deleted definitions/files/directories/imported_symbols are never garbage collected by background merges — they persist and the dedup scan processes them every time.

This partially explains why code graph queries are disproportionately slow. The effective table size is larger than it should be because dead rows accumulate indefinitely.

Fix: ENGINE = ReplacingMergeTree(_version, _deleted) for all four tables.

7. Edge tables lack `add_minmax_index_for_temporal_columns`

SDLC node tables (gl_pipeline, gl_merge_request, gl_job, gl_work_item, gl_note, gl_vulnerability, gl_deployment) have add_minmax_index_for_temporal_columns = 1, auto-creating minmax indexes on DateTime columns. The edge tables don't. Since _version is used in LIMIT 1 BY dedup, minmax on _version could help skip granules during dedup scans.

8. Index granularity: code graph node tables may benefit from 1024

Edge tables use index_granularity = 1024 (good — more granules = better skip index for projections). Code graph node tables (gl_definition, gl_file, etc.) use 2048. Since these are the worst-performing entity tables and rely heavily on bloom/text indexes for filtering, 1024 granularity would give those indexes 2x more skip opportunities.

Priority summary

Priority	Change	Tables	Queries fixed
P0	Add `_deleted` to RMT engine for code graph tables	`gl_definition`, `gl_file`, `gl_directory`, `gl_imported_symbol`	All Cluster 2 (code graph timeouts)
P0	Denormalize `status`/`state` onto `gl_edge` + new projection	`gl_edge`	Q12, Q24, Q30, A2, I1, I5, E1, F2, I6 (9 queries)
P1	Add `pipeline_id` to `gl_stage` + projection	`gl_stage`	A2 (TIMEOUT)
P1	Add `by_noteable` projection to `gl_note`	`gl_note`	I5 (11.2s), A1 (11.4s)
P2	Add `pipeline_id` to `gl_job` + projection	`gl_job`	Q24, I1
P2	Reduce `index_granularity` to 1024 for code graph nodes	`gl_definition`, `gl_file`	Cluster 2
P2	Add `add_minmax_index_for_temporal_columns` to edge tables	`gl_edge`, `gl_code_edge`	General dedup perf

Schema + Compiler Change Ideas — Ranked by Effort-to-Impact

Continuation of the schema audit above. These are concrete proposals ranked by bang-for-buck, with the DDL and compiler changes specified.

Tier 1: One-line DDL fixes, zero code changes

1A. Add `_deleted` to ReplacingMergeTree engine for code graph tables

Effort: 4 ALTER TABLE statements. No code changes. No migration. Impact: Fixes silent garbage accumulation on all code graph tables. Deleted definitions, files, directories, and imported symbols currently persist forever because background merges don't clean them up. This inflates every code graph dedup scan. Queries fixed: All Cluster 2 (Q1, Q21, Q26, Q32 timeouts; Q22, Q31, G1, G3 slow)

-- Current (broken):
ENGINE = ReplacingMergeTree(_version)

-- Fixed:
ALTER TABLE gl_definition MODIFY ENGINE = ReplacingMergeTree(_version, _deleted);
ALTER TABLE gl_file MODIFY ENGINE = ReplacingMergeTree(_version, _deleted);
ALTER TABLE gl_directory MODIFY ENGINE = ReplacingMergeTree(_version, _deleted);
ALTER TABLE gl_imported_symbol MODIFY ENGINE = ReplacingMergeTree(_version, _deleted);

Note: Can't ALTER ENGINE in ClickHouse — requires recreating the table or using MODIFY SETTING workarounds. Realistically this is a schema migration, but the change itself is trivial.

Tier 2: DDL-only projection additions, no compiler changes needed if ClickHouse auto-selects

2A. `by_project_state` projection on `gl_merge_request`

Effort: 1 ALTER + MATERIALIZE. No code changes if ClickHouse auto-selects. Impact: Collapses the cascade CTE + dedup scan into a single projection lookup for the most common query shape: "MRs in project X with state Y." The project_id column already exists on the table but isn't used. Queries fixed: Q4 (8.5s), F2 (9.8s), I5 (11.2s), Q6 (6.6s), E2 (5.9s)

ALTER TABLE gl_merge_request ADD PROJECTION by_project_state (
    SELECT * ORDER BY (traversal_path, project_id, state, id, _version)
);
ALTER TABLE gl_merge_request MATERIALIZE PROJECTION by_project_state;

The compiler currently emits _cascade_mr as an edge scan to find MR IDs in a project, then dedupes the full MR table filtered to those IDs. With this projection, a query like WHERE traversal_path IN (...) AND project_id = 278964 AND state = 'merged' ORDER BY id, _version DESC LIMIT 1 BY id would be a contiguous range scan.

2B. `by_noteable` projection on `gl_note`

Effort: 1 ALTER + MATERIALIZE. Impact: Notes for the same MR/Issue are currently scattered by insert order. This clusters them for HAS_NOTE traversals. Queries fixed: I5 (11.2s), A1 (11.4s)

ALTER TABLE gl_note ADD PROJECTION by_noteable (
    SELECT * ORDER BY (traversal_path, noteable_type, noteable_id, id, _version)
);
ALTER TABLE gl_note MATERIALIZE PROJECTION by_noteable;

2C. `by_pipeline` projection on `gl_job` (if `pipeline_id` is added)

Effort: ALTER to add column + projection + MATERIALIZE. Write-path needs to populate pipeline_id. Impact: Bypasses the edge table for Job→Pipeline traversals entirely. Queries fixed: Q24 (TIMEOUT), I1 (17s), Q30 (15s)

ALTER TABLE gl_job ADD COLUMN pipeline_id Nullable(Int64) CODEC(ZSTD(1));
ALTER TABLE gl_job ADD PROJECTION by_pipeline_status (
    SELECT * ORDER BY (traversal_path, pipeline_id, status, id, _version)
);
ALTER TABLE gl_job MATERIALIZE PROJECTION by_pipeline_status;

2D. `by_pipeline` projection on `gl_stage`

Same pattern as 2C. Queries fixed: A2 (TIMEOUT)

ALTER TABLE gl_stage ADD COLUMN pipeline_id Nullable(Int64) CODEC(ZSTD(1));
ALTER TABLE gl_stage ADD PROJECTION by_pipeline (
    SELECT * ORDER BY (traversal_path, pipeline_id, id, _version)
);
ALTER TABLE gl_stage MATERIALIZE PROJECTION by_pipeline;

Tier 3: DDL + compiler changes together

3A. Dedup-ids projection on dense tables

Effort: DDL projection on 5 tables + compiler change to emit ORDER BY id, _version DESC instead of ORDER BY traversal_path, id, _version DESC when the CTE is fed by a cascade ID set. Impact: Cuts dedup I/O by 25-100x for cascade-fed CTEs. Currently _nf_* CTEs do SELECT * FROM table ORDER BY traversal_path, id, _version DESC LIMIT 1 BY id, deserializing all columns (including large text blobs like description) just to produce an ID set. A lightweight projection reads only id, _version, _deleted.

Queries fixed: Every query that uses cascade CTEs — Q4, Q30, Q25, D1, D2, D3, E1, E2, F2, I1, I5, I6, and more.

-- On each of: gl_job, gl_pipeline, gl_note, gl_merge_request, gl_stage
ALTER TABLE gl_job ADD PROJECTION dedup_ids (
    SELECT id, _version, _deleted ORDER BY (id, _version DESC)
);
ALTER TABLE gl_job MATERIALIZE PROJECTION dedup_ids;

ALTER TABLE gl_pipeline ADD PROJECTION dedup_ids (
    SELECT id, _version, _deleted ORDER BY (id, _version DESC)
);
ALTER TABLE gl_pipeline MATERIALIZE PROJECTION dedup_ids;

ALTER TABLE gl_note ADD PROJECTION dedup_ids (
    SELECT id, _version, _deleted ORDER BY (id, _version DESC)
);
ALTER TABLE gl_note MATERIALIZE PROJECTION dedup_ids;

ALTER TABLE gl_merge_request ADD PROJECTION dedup_ids (
    SELECT id, _version, _deleted ORDER BY (id, _version DESC)
);
ALTER TABLE gl_merge_request MATERIALIZE PROJECTION dedup_ids;

ALTER TABLE gl_stage ADD PROJECTION dedup_ids (
    SELECT id, _version, _deleted ORDER BY (id, _version DESC)
);
ALTER TABLE gl_stage MATERIALIZE PROJECTION dedup_ids;

Compiler change: When the _nf_* CTE is fed by a _cascade_* ID set (i.e., WHERE id IN (SELECT id FROM _cascade_X)), emit the dedup as:

_nf_X AS (
    SELECT id FROM (
        SELECT id, _version, _deleted
        FROM gl_X
        WHERE id IN (SELECT id FROM _cascade_X)
        ORDER BY id, _version DESC
        LIMIT 1 BY id
    ) WHERE _deleted = false
)

This matches the dedup_ids projection key order (id, _version DESC) and reads only 3 columns. Today the compiler emits ORDER BY traversal_path, id, _version DESC which forces the main table ORDER BY and reads all columns.

Caveat: Property filters (state = 'merged', etc.) can't run on the dedup_ids projection since it doesn't have those columns. Two options:

Apply property filters in the main query JOIN (after the ID set is resolved). This is safe if the main query already re-filters.
For the most common filters, use the existing by_state_id / by_status_id projections instead. The compiler picks whichever projection matches the CTE's filter set.

3B. Materialize authorization scope once

Effort: Compiler change only, no DDL. Impact: Eliminates 5-10 copies of the 30-way startsWith(traversal_path, ...) OR block per query. Reduces query parse/plan time and lets ClickHouse evaluate the predicate once. Queries fixed: Every query (constant overhead reduction).

Compiler emits:

WITH _auth_paths AS (
    SELECT traversal_path FROM (
        SELECT '1/9970/' AS traversal_path
        UNION ALL SELECT '1/9970/106525819/'
        UNION ALL SELECT '1/9970/78934129/'
        -- ... all 30 paths
    )
)

Then each CTE and JOIN references it via:

WHERE EXISTS (
    SELECT 1 FROM _auth_paths AS ap
    WHERE startsWith(t.traversal_path, ap.traversal_path)
)

Or better, pass as a parameter:

-- Single parameter, referenced everywhere
WHERE arrayExists(p -> startsWith(t.traversal_path, p), {auth_paths:Array(String)})

This is pure compiler work. The 30-way OR is generated by the authorization layer and inlined into every subquery. Hoisting it into a CTE or parameter is a refactor of the SQL emitter.

3C. Edge denormalization (!1095 (closed))

Effort: Schema migration + write-path changes + new projection on gl_edge. Highest effort of all proposals. Impact: Highest absolute impact. Eliminates node table JOINs in cascade CTEs entirely for the denormalized properties. The cascade becomes a single-table filtered scan on gl_edge. Queries fixed: Q12, Q24, Q30, A2, I1, I5, E1, F2, I6 (9 queries, including 3 timeouts).

ALTER TABLE gl_edge ADD COLUMN source_state LowCardinality(Nullable(String)) CODEC(LZ4);
ALTER TABLE gl_edge ADD COLUMN source_status LowCardinality(Nullable(String)) CODEC(LZ4);
ALTER TABLE gl_edge ADD COLUMN source_type LowCardinality(Nullable(String)) CODEC(LZ4);

ALTER TABLE gl_edge ADD PROJECTION by_target_filtered (
    SELECT * ORDER BY (target_id, relationship_kind, source_kind, source_status, source_id, traversal_path, target_kind)
);
ALTER TABLE gl_edge MATERIALIZE PROJECTION by_target_filtered;

Write-path: when upserting an edge, look up the source node's state/status and write it to the edge row. When a node's state/status changes, update all connected edges. This is the write amplification cost.

Summary: Effort vs. Impact Matrix

#	Change	Effort	Scope	Timeouts fixed	Slow queries fixed
1A	`_deleted` on code graph RMT	Trivial (migration)	Code graph only	4-6 (Q1,Q21,Q26,Q32,G1)	3 (Q22,Q31,G3)
2A	`by_project_state` proj on MR	Low (DDL)	MR queries	0	5 (Q4,F2,I5,Q6,E2)
2B	`by_noteable` proj on Note	Low (DDL)	Note queries	0	2 (I5,A1)
2C	`pipeline_id` + proj on Job	Medium (DDL+write)	CI queries	1 (Q24)	2 (I1,Q30)
2D	`pipeline_id` + proj on Stage	Medium (DDL+write)	CI queries	1 (A2)	0
3A	`dedup_ids` proj + compiler	Medium (DDL+compiler)	All entities	0	All cascade queries
3B	Auth scope materialization	Medium (compiler)	All queries	0	All (constant reduction)
3C	Edge denormalization	High (schema+write+compiler)	SDLC edges	3 (Q12,Q24,A2)	6 (Q30,I1,I5,E1,F2,I6)

If I could only ship 3 things: 1A + 3A + 3C. That's the _deleted fix (stops the bleeding on code graph), dedup-ids projections (cuts I/O across the board), and edge denormalization (kills the cascade-to-node-table JOIN for the densest tables). Together they address every cluster from the perf test.

If I could only ship 1 thing: 3C (edge denormalization). It's the most effort but it's the only change that directly eliminates timeouts on production queries without requiring the user to change their query shape.

v0.37.1 Baseline — Before !1134 (merged) / !1135 (merged) / !1136 (merged)

These are the numbers on the currently deployed version (v0.37.1). The 3 MRs have merged but haven't rolled out yet. This is the "before" for comparison once v0.38.0 deploys.

Code graph queries (targeted by !1134 (merged): `_deleted` on RMT + granularity 1024)

Query	v0.37.1 (before)	Earlier run	Delta
Q1 DEFINES traversal	29198ms (barely survived!)	30274ms TIMEOUT	~same, got lucky
Q21 CALLS aggregation	30528ms TIMEOUT	30650ms TIMEOUT	same
Q26 CALLS token_match	30335ms TIMEOUT	30336ms TIMEOUT	same
Q32 EXTENDS	30373ms TIMEOUT	30258ms TIMEOUT	same
Q22 IMPORTS	10415ms	14533ms	28% faster (variance?)
Q31 ON_BRANCH	19564ms	21553ms	9% faster (variance?)
G1 3-hop code chain	30342ms TIMEOUT	30330ms TIMEOUT	same
G3 import resolve	23141ms	23601ms	same

Note: !1134 (merged)'s _deleted RMT fix and granularity change are DDL-only. The dead rows won't disappear until background merges run after deployment. Expect gradual improvement over hours/days post-deploy, not instant.

MR-in-project queries (targeted by !1134 (merged): `by_project_state` projection)

Query	v0.37.1 (before)	Earlier run	Delta
Q4 MR count per author	14093ms	8536ms	worse (variance/load)
F2 Milestone MR count	7944ms	9817ms	19% better
Q6 User MRs traversal	9247ms	6626ms	worse (variance)
E2 sum(lines) per user	5985ms	5913ms	same
I5 Note count per MR	9779ms	11210ms	13% better

Note: The by_project_state projection is DDL-only — it needs MATERIALIZE PROJECTION to populate. Until deploy + materialize, ClickHouse won't use it.

Note queries (targeted by !1134 (merged): `by_noteable` projection)

Query	v0.37.1 (before)	Earlier run	Delta
I5 Note count per MR	9779ms	11210ms	13% better (variance)
A1 Note count per project	10450ms	11450ms	9% better (variance)

Same caveat — projection not materialized yet.

CI chain queries (targeted by !1135 (merged): `pipeline_id` FK on Job/Stage)

Query	v0.37.1 (before)	Earlier run	Delta
Q24 Job->Pipeline->Project agg	30379ms TIMEOUT	30249ms TIMEOUT	same
I1 Job count by failure	24072ms	17282ms	worse (variance/load)
Q30 Pipeline->Job traversal	10841ms	15591ms	30% better?
A2 Stage count per Pipeline	30325ms TIMEOUT	30391ms TIMEOUT	same

Note: !1135 (merged) adds the pipeline_id column and projections, but the column won't be populated until the indexer backfills it post-deploy. Until then, pipeline_id is NULL for all existing rows.

Cascade-fed dedup queries (targeted by !1136 (merged): id-only sort key)

Query	v0.37.1 (before)	Earlier run	Delta
Q25 MR->Diff	4605ms	8606ms	46% faster
D1 4-hop traversal	13153ms	11261ms	worse (variance)
E1 avg(duration) Pipeline	14942ms	11642ms	worse (variance)
I6 Label usage	7663ms	9790ms	22% faster

Note: !1136 (merged) is a compiler change, not DDL. It won't take effect until v0.38.0 deploys to the webserver pods.

Summary

All numbers are consistent with none of these changes being live yet:

Code graph: same timeouts (dead rows still present, granularity unchanged)
MR/Note projections: not materialized
pipeline_id FK: column not populated
Dedup sort key: compiler not deployed

This is a clean baseline. Rerun after v0.38.0 rollout to measure actual impact.

v0.37.0 → v0.37.1 Before/After

Cluster upgraded from v0.37.0 to v0.37.1 during this session. Initial corpus ran on v0.37.0, reruns on v0.37.1. The v0.37.1 release includes !1116 (merged) (trie subsumption), !1124 (merged) (wildcard traversal pruning), !1125 (merged) (path-finding frontier pruning), !1126 (merged) (code-edge projections).

!1125 (merged) — Path finding frontier pruning: BIG wins

Query	v0.37.0	v0.37.1	Delta
Q7 path User→Project (node_ids, depth 3)	11842ms	7631ms	-36%
Q17 path User→User (all_shortest, depth 3)	7867ms	3759ms	-52%
Q23 path with filters + rel_types	6664ms	5724ms	-14%
H1 path Group→Group	6125ms	641ms	-90%

H1 is a 10x improvement. The frontier pruning is clearly effective when endpoints are in the same namespace subtree (both Groups under 1/), allowing aggressive early elimination.

!1116 (merged) — Trie subsumption: ~2x on non-bottlenecked queries

Query	v0.37.0	v0.37.1	Delta
Q10 multi-hop CONTAINS traversal	8010ms	3539ms	-56%
Q28 multi-hop CONTAINS aggregation	11713ms	5023ms	-57%
Q5 source-side MR count	2311ms	1175ms	-49%
Q12 group pipeline agg (TIMEOUT)	30270ms	30341ms	same — Pipeline scan dominates

The trie collapses redundant startsWith paths (e.g., 1/9970/ subsumes 1/9970/106525819/), cutting the OR clause count. ~2x improvement on queries where the traversal_path predicate was the bottleneck. No effect on queries where the dense table dedup scan dominates (Q12).

!1124 (merged) — Wildcard traversal pruning: solid on neighbors

Query	v0.37.0	v0.37.1	Delta
Q3 bidir neighbors Project	3056ms	1944ms	-36%
Q15 neighbors many rel_types	2486ms	2270ms	-9%
B2 bidir neighbors Group	3105ms	1624ms	-48%

!1126 (merged) — Code-edge projections: marginal

Query	v0.37.0	v0.37.1	Delta
Q1 DEFINES traversal	30274ms TIMEOUT	29198ms	barely survived
Q21 CALLS aggregation	30650ms TIMEOUT	30528ms TIMEOUT	same
Q26 CALLS token_match	30336ms TIMEOUT	30335ms TIMEOUT	same
Q32 EXTENDS	30258ms TIMEOUT	30373ms TIMEOUT	same
Q22 IMPORTS	14533ms	10415ms	-28%
Q31 ON_BRANCH	21553ms	19564ms	-9%
G1 3-hop code chain	30330ms TIMEOUT	30342ms TIMEOUT	same
G3 import resolve	23601ms	23141ms	same

The new projections on gl_code_edge helped the edge scan phase (IMPORTS improved 28%), but the _nf_* CTE dedup scan on gl_definition/gl_file is still the dominant cost. The projections can't help there — that needs the _deleted RMT fix (!1134 (merged)) and dedup-ids optimization (!1136 (merged)) from v0.38.0.

Summary

MR	Target	Result
!1125 (merged) path-finding pruning	Path queries	36-90% faster — H1 went from 6.1s to 641ms
!1116 (merged) trie subsumption	All queries (auth OR)	49-57% faster on non-bottlenecked queries
!1124 (merged) wildcard pruning	Neighbors	36-48% faster
!1126 (merged) code-edge projections	Code graph	28% on IMPORTS only, timeouts unchanged

The v0.37.1 changes hit the authorization overhead (trie) and graph traversal phases (path/neighbor pruning) effectively. The remaining timeouts are all in the dedup/node-table-scan phase, which is what v0.38.0 (!1134 (merged) + !1136 (merged)) targets.

v0.37.1 → v0.38.0 Before/After

Cluster now on v0.38.0. MRs live: !1134 (merged) (_deleted RMT, by_project_state, by_noteable, granularity 1024), !1135 (merged) (pipeline_id FK on Job/Stage), !1136 (merged) (cascade-fed dedup id-only sort key).

Important caveat: !1134 (merged) and !1135 (merged) are DDL changes. Projections need MATERIALIZE, _deleted cleanup needs background merges, pipeline_id needs indexer backfill. These are NOT instant — improvement will be gradual over hours/days. !1136 (merged) is a compiler change and should be immediate.

!1136 (merged) — Cascade-fed dedup (compiler: id-only sort key)

Query	v0.37.0	v0.37.1	v0.38.0	v0.37.1→v0.38.0
Q4 MR count per author	8536ms	14093ms	24495ms	❌ worse
Q25 MR→Diff	8606ms	4605ms	6013ms	~ variance
D1 4-hop User→MR→Pipe→Job	11261ms	13153ms	17393ms	❌ worse
E1 avg(duration) Pipeline	11642ms	14942ms	18078ms	❌ worse
E2 sum(lines) per user	5913ms	5985ms	12402ms	❌ worse
F2 Milestone MR count	9817ms	7944ms	5690ms	✅ -28%
I6 Label usage	9790ms	7663ms	4115ms	✅ -46%
Q6 User MRs traversal	6626ms	9247ms	4649ms	✅ -50%

Mixed results. F2, I6, Q6 improved significantly. Q4, D1, E1, E2 got worse — possibly load/warmup or the new sort key interacting badly with these specific query shapes.

!1135 (merged) — pipeline_id FK on Job/Stage

Query	v0.37.0	v0.37.1	v0.38.0	v0.37.1→v0.38.0
Q24 Job→Pipeline→Project agg	TIMEOUT	TIMEOUT	TIMEOUT	same
I1 Job count by failure	17282ms	24072ms	30242ms	❌ worse (near-timeout)
Q30 Pipeline→Job traversal	15591ms	10841ms	7220ms	✅ -33%
A2 Stage count per Pipeline	TIMEOUT	TIMEOUT	TIMEOUT	same

Q30 improved. Q24 and A2 still timeout — pipeline_id column likely not backfilled yet. I1 regressed to near-timeout.

!1134 (merged) — Note projections

Query	v0.37.0	v0.37.1	v0.38.0	v0.37.1→v0.38.0
I5 Note count per MR	11210ms	9779ms	8010ms	✅ -18%
A1 Note count per project	11450ms	10450ms	9102ms	✅ -13%

Modest improvement. Projection may not be fully materialized yet.

!1134 (merged) — Code graph (_deleted RMT + granularity 1024)

Query	v0.37.0	v0.37.1	v0.38.0	v0.37.1→v0.38.0
Q1 DEFINES	TIMEOUT	29198ms	27675ms	✅ surviving now
Q21 CALLS agg	TIMEOUT	TIMEOUT	TIMEOUT	same
Q26 CALLS token	TIMEOUT	TIMEOUT	TIMEOUT	same
Q32 EXTENDS	TIMEOUT	TIMEOUT	TIMEOUT	same
Q22 IMPORTS	14533ms	10415ms	9530ms	✅ -9%
Q31 ON_BRANCH	21553ms	19564ms	15083ms	✅ -23%
G1 3-hop chain	TIMEOUT	TIMEOUT	TIMEOUT	same
G3 import resolve	23601ms	23141ms	19677ms	✅ -15%

Code graph queries are trending down. Q31 dropped from 21.5s→15.1s across two releases. Q22 from 14.5s→9.5s. The _deleted RMT fix needs background merge time to fully take effect — dead rows are being cleaned up gradually. Expect continued improvement over the next few hours.

Full v0.37.0 → v0.38.0 trajectory

Query	v0.37.0	v0.38.0	Total delta
Q6 User MRs	6626ms	4649ms	-30%
F2 Milestone MRs	9817ms	5690ms	-42%
I6 Label usage	9790ms	4115ms	-58%
Q30 Pipe→Job	15591ms	7220ms	-54%
I5 Notes per MR	11210ms	8010ms	-29%
A1 Note count	11450ms	9102ms	-21%
Q22 IMPORTS	14533ms	9530ms	-34%
Q31 ON_BRANCH	21553ms	15083ms	-30%
G3 import resolve	23601ms	19677ms	-17%
Q1 DEFINES	TIMEOUT	27675ms	rescued from timeout

10 queries improved. 4 code graph queries still timeout (Q21, Q26, Q32, G1). Q24 and A2 still timeout (awaiting pipeline_id backfill). Some queries regressed (Q4, D1, E1, E2) — needs investigation, may be load-related or a regression in the new dedup sort key for certain shapes.

v0.38.0 Regression Investigation — Q4, D1, E1, E2

Reran all 4 suspected regressions 3x each on v0.38.0. Full reindex was done so all projections are materialized and pipeline_id is backfilled.

3-run results

Query	Run 1	Run 2	Run 3	Median	v0.37.0	Verdict
Q4 MR count per author	5387ms	6844ms	5197ms	5387ms	8536ms	✅ -37% improvement
D1 4-hop chain	12395ms	15145ms	12968ms	12968ms	11261ms	~ flat (variance)
E1 avg(duration)	14154ms	13970ms	14613ms	14154ms	11642ms	❓ regression (consistent)
E2 sum(lines)	4788ms	4972ms	9508ms	4972ms	5913ms	✅ -16% improvement

Q4 and E2: Not regressions

The first v0.38.0 run (24.5s for Q4, 12.4s for E2) was a cold-cache outlier. Median across 3 runs shows both improved vs v0.37.0. Q4 dropped from 8.5s → 5.4s, E2 from 5.9s → 5.0s.

D1: Nondeterministic, not a regression

D1 has no order_by clause. The outer LIMIT 10 picks whichever edge rows ClickHouse scans first, which depends on merge state. Running twice produced completely different result sets:

Run 1: Job IDs [13361001921, 13361022839, 13361038859, ...], 6 MRs, 6 Pipelines Run 2: Job IDs [13364120970, 13628566167, 13628566232, ...], 6 MRs, 6 Pipelines

Different MRs, different Pipelines, different Jobs. The row_count instability (29→15→21→23) is because row_count counts unique nodes in the response, which varies with which edges win the race.

The 12-17s timing spread is scan nondeterminism, not a regression. Adding order_by would stabilize both results and timing.

E1: Real regression, consistent +2.5s

E1 is a 2-node aggregation: Pipeline(success, created_at >= April) →IN_PROJECT→ Project(278964), avg+max+count. It was consistently 11.6s on v0.37.0 and is now consistently 14s on v0.38.0 (3 runs: 14154, 13970, 14613).

This is a 2-node query — the cascade-fed dedup change from !1136 (merged) shouldn't apply (no cascade CTE for 2-node). Possible causes:

Reindex increased data volume — more Pipeline rows indexed since v0.37.0 data was current. ~1 month of new pipelines could add significant scan volume.
Projection materialization overhead — the new projections on other tables (MR, Note, Job, Stage) added write-path work that affects shared ClickHouse resources. Unlikely to cause a steady-state read regression though.
by_status_id projection contention — Pipeline already has a by_status_id projection. The status = 'success' filter should use it. But if the reindex changed the projection's sort order or granule distribution, it could affect scan efficiency.

Recommend checking EXPLAIN PIPELINE on the E1 query to see which projection ClickHouse is selecting.

Corrected v0.37.0 → v0.38.0 table

Query	v0.37.0	v0.38.0 (median)	Delta
Q4 MR count per author	8536ms	5387ms	✅ -37%
D1 4-hop chain	11261ms	12968ms	~ flat (nondeterministic)
E1 avg(duration)	11642ms	14154ms	❌ +22% regression
E2 sum(lines)	5913ms	4972ms	✅ -16%

DDL Schema v20 — Audit Against Perf Test Findings

Audit: What's missing or misaligned with perf findings

1. gl_edge has no denormalized node properties — this is the !1095 (closed) gap

2. gl_code_edge projection analysis

3. Missing composite projection on gl_edge for denormalized filters

4. gl_job and gl_stage are missing parent foreign keys

5. gl_note has no noteable clustering projection

6. Code graph tables use ReplacingMergeTree(_version) without _deleted

7. Edge tables lack add_minmax_index_for_temporal_columns

8. Index granularity: code graph node tables may benefit from 1024

Priority summary

Schema + Compiler Change Ideas — Ranked by Effort-to-Impact

Tier 1: One-line DDL fixes, zero code changes

1A. Add _deleted to ReplacingMergeTree engine for code graph tables

Tier 2: DDL-only projection additions, no compiler changes needed if ClickHouse auto-selects

2A. by_project_state projection on gl_merge_request

2B. by_noteable projection on gl_note

2C. by_pipeline projection on gl_job (if pipeline_id is added)

2D. by_pipeline projection on gl_stage

Tier 3: DDL + compiler changes together

3A. Dedup-ids projection on dense tables

3B. Materialize authorization scope once

3C. Edge denormalization (!1095 (closed))

Summary: Effort vs. Impact Matrix

v0.37.1 Baseline — Before !1134 (merged) / !1135 (merged) / !1136 (merged)

Code graph queries (targeted by !1134 (merged): _deleted on RMT + granularity 1024)

MR-in-project queries (targeted by !1134 (merged): by_project_state projection)

Note queries (targeted by !1134 (merged): by_noteable projection)

CI chain queries (targeted by !1135 (merged): pipeline_id FK on Job/Stage)

Cascade-fed dedup queries (targeted by !1136 (merged): id-only sort key)

Summary

v0.37.0 → v0.37.1 Before/After

!1125 (merged) — Path finding frontier pruning: BIG wins

!1116 (merged) — Trie subsumption: ~2x on non-bottlenecked queries

!1124 (merged) — Wildcard traversal pruning: solid on neighbors

!1126 (merged) — Code-edge projections: marginal

Summary

v0.37.1 → v0.38.0 Before/After

!1136 (merged) — Cascade-fed dedup (compiler: id-only sort key)

!1135 (merged) — pipeline_id FK on Job/Stage

!1134 (merged) — Note projections

!1134 (merged) — Code graph (_deleted RMT + granularity 1024)

Full v0.37.0 → v0.38.0 trajectory

v0.38.0 Regression Investigation — Q4, D1, E1, E2

3-run results

Q4 and E2: Not regressions

D1: Nondeterministic, not a regression

E1: Real regression, consistent +2.5s

Corrected v0.37.0 → v0.38.0 table

1. `gl_edge` has no denormalized node properties — this is the !1095 (closed) gap

2. `gl_code_edge` projection analysis

3. Missing composite projection on `gl_edge` for denormalized filters

4. `gl_job` and `gl_stage` are missing parent foreign keys

5. `gl_note` has no noteable clustering projection

6. Code graph tables use `ReplacingMergeTree(_version)` without `_deleted`

7. Edge tables lack `add_minmax_index_for_temporal_columns`

1A. Add `_deleted` to ReplacingMergeTree engine for code graph tables

2A. `by_project_state` projection on `gl_merge_request`

2B. `by_noteable` projection on `gl_note`

2C. `by_pipeline` projection on `gl_job` (if `pipeline_id` is added)

2D. `by_pipeline` projection on `gl_stage`

Code graph queries (targeted by !1134 (merged): `_deleted` on RMT + granularity 1024)

MR-in-project queries (targeted by !1134 (merged): `by_project_state` projection)

Note queries (targeted by !1134 (merged): `by_noteable` projection)

CI chain queries (targeted by !1135 (merged): `pipeline_id` FK on Job/Stage)