-
DDL Schema v20 — Audit Against Perf Test Findings
Audit: What's missing or misaligned with perf findings
1.
gl_edgehas no denormalized node properties — this is the !1095 (closed) gapThe edge table carries only structural columns (
source_id,source_kind,relationship_kind,target_id,target_kind,traversal_path). Every cascade CTE that narrows by edge existence still requires a JOIN back to the node table to apply property filters likestatus = 'failed'orstate = 'merged'.What to add (high-value candidates based on our test corpus):
Edge relationship Source kind Denormalize from source Why IN_PROJECTPipeline status,sourceQ12 (TIMEOUT), Q24 (TIMEOUT), Q30 (15s), E1 (11.6s) IN_PROJECTJob status,failure_reasonQ24 (TIMEOUT), I1 (17s) IN_PROJECTMergeRequest stateQ4 (8.5s), F2 (9.8s), I5 (11.2s) IN_PROJECTNote noteable_typeA1 (11.4s), I5 (11.2s) HAS_STAGEPipeline (target: Stage) statuson StageA2 (TIMEOUT) HAS_LABELWorkItem state,work_item_typeI6 (9.8s) These are
LowCardinality(String)columns — adding them togl_edgecosts ~1 byte per row with LZ4. The edge table has projections ordered by(source_id, relationship_kind, ...)and(target_id, relationship_kind, ...), so a cascade query filteringWHERE relationship_kind = 'IN_PROJECT' AND target_id = 278964 AND source_status = 'failed'would be a tight range scan on theby_targetprojection.2.
gl_code_edgeprojection analysisThe code_edge table has
by_source,by_target,by_rel_source_kind,by_rel_target_kindprojections. Theby_sourceprojectionORDER BY (source_id, ...)should supporte0.source_id IN (SELECT id FROM _nf_f), but the compiler emits this as a semi-join that ClickHouse may not optimize via projection lookup. The fix is compiler-side: rewrite the semi-join as a direct JOIN against the_nf_*CTE, or use an explicit key-lookup pattern that the projection can serve.3. Missing composite projection on
gl_edgefor denormalized filtersIf denormalized properties are added (point 1), a new projection is needed:
PROJECTION by_target_filtered ( SELECT * ORDER BY (target_id, relationship_kind, source_kind, source_status, source_id, traversal_path) )This lets cascade CTEs filter
target_id = project_id AND relationship_kind = 'IN_PROJECT' AND source_kind = 'Pipeline' AND source_status = 'failed'as a contiguous key range.4.
gl_jobandgl_stageare missing parent foreign keysJobs link to Pipelines via
gl_edge(HAS_JOB), butgl_jobdoesn't carrypipeline_id. This forces every Job→Pipeline query through a 2-table join (edge + node).Same for
gl_stage— nopipeline_id, no way to bypass the edge table.Entities that already have good FK columns:
gl_note→ hasnoteable_id+noteable_type✓gl_merge_request_diff→ hasmerge_request_id✓gl_merge_request_diff_file→ hasmerge_request_diff_id+merge_request_id✓gl_deployment→ hasproject_idbut noenvironment_idorpipeline_id
5.
gl_notehas no noteable clustering projectionNotes are the largest SDLC entity. Table is ordered by
(traversal_path, id), so notes for the same MR/Issue are scattered by insert order. A projection would makeHAS_NOTEtraversals and "notes per MR" aggregations (I5, 11.2s) significantly faster:PROJECTION by_noteable ( SELECT * ORDER BY (traversal_path, noteable_type, noteable_id, id) )6. Code graph tables use
ReplacingMergeTree(_version)without_deleted-- gl_definition, gl_file, gl_directory, gl_imported_symbol: ENGINE = ReplacingMergeTree(_version) -- no _deleted!All other entity tables use
ReplacingMergeTree(_version, _deleted)withallow_experimental_replacing_merge_with_cleanup = 1. The code graph tables are the exception. This means deleted definitions/files/directories/imported_symbols are never garbage collected by background merges — they persist and the dedup scan processes them every time.This partially explains why code graph queries are disproportionately slow. The effective table size is larger than it should be because dead rows accumulate indefinitely.
Fix:
ENGINE = ReplacingMergeTree(_version, _deleted)for all four tables.7. Edge tables lack
add_minmax_index_for_temporal_columnsSDLC node tables (
gl_pipeline,gl_merge_request,gl_job,gl_work_item,gl_note,gl_vulnerability,gl_deployment) haveadd_minmax_index_for_temporal_columns = 1, auto-creating minmax indexes on DateTime columns. The edge tables don't. Since_versionis used inLIMIT 1 BYdedup, minmax on_versioncould help skip granules during dedup scans.8. Index granularity: code graph node tables may benefit from 1024
Edge tables use
index_granularity = 1024(good — more granules = better skip index for projections). Code graph node tables (gl_definition,gl_file, etc.) use 2048. Since these are the worst-performing entity tables and rely heavily on bloom/text indexes for filtering, 1024 granularity would give those indexes 2x more skip opportunities.Priority summary
Priority Change Tables Queries fixed P0 Add _deletedto RMT engine for code graph tablesgl_definition,gl_file,gl_directory,gl_imported_symbolAll Cluster 2 (code graph timeouts) P0 Denormalize status/stateontogl_edge+ new projectiongl_edgeQ12, Q24, Q30, A2, I1, I5, E1, F2, I6 (9 queries) P1 Add pipeline_idtogl_stage+ projectiongl_stageA2 (TIMEOUT) P1 Add by_noteableprojection togl_notegl_noteI5 (11.2s), A1 (11.4s) P2 Add pipeline_idtogl_job+ projectiongl_jobQ24, I1 P2 Reduce index_granularityto 1024 for code graph nodesgl_definition,gl_fileCluster 2 P2 Add add_minmax_index_for_temporal_columnsto edge tablesgl_edge,gl_code_edgeGeneral dedup perf -
Schema + Compiler Change Ideas — Ranked by Effort-to-Impact
Continuation of the schema audit above. These are concrete proposals ranked by bang-for-buck, with the DDL and compiler changes specified.
Tier 1: One-line DDL fixes, zero code changes
1A. Add
_deletedto ReplacingMergeTree engine for code graph tablesEffort: 4 ALTER TABLE statements. No code changes. No migration. Impact: Fixes silent garbage accumulation on all code graph tables. Deleted definitions, files, directories, and imported symbols currently persist forever because background merges don't clean them up. This inflates every code graph dedup scan. Queries fixed: All Cluster 2 (Q1, Q21, Q26, Q32 timeouts; Q22, Q31, G1, G3 slow)
-- Current (broken): ENGINE = ReplacingMergeTree(_version) -- Fixed: ALTER TABLE gl_definition MODIFY ENGINE = ReplacingMergeTree(_version, _deleted); ALTER TABLE gl_file MODIFY ENGINE = ReplacingMergeTree(_version, _deleted); ALTER TABLE gl_directory MODIFY ENGINE = ReplacingMergeTree(_version, _deleted); ALTER TABLE gl_imported_symbol MODIFY ENGINE = ReplacingMergeTree(_version, _deleted);Note: Can't ALTER ENGINE in ClickHouse — requires recreating the table or using
MODIFY SETTINGworkarounds. Realistically this is a schema migration, but the change itself is trivial.
Tier 2: DDL-only projection additions, no compiler changes needed if ClickHouse auto-selects
2A.
by_project_stateprojection ongl_merge_requestEffort: 1 ALTER + MATERIALIZE. No code changes if ClickHouse auto-selects. Impact: Collapses the cascade CTE + dedup scan into a single projection lookup for the most common query shape: "MRs in project X with state Y." The
project_idcolumn already exists on the table but isn't used. Queries fixed: Q4 (8.5s), F2 (9.8s), I5 (11.2s), Q6 (6.6s), E2 (5.9s)ALTER TABLE gl_merge_request ADD PROJECTION by_project_state ( SELECT * ORDER BY (traversal_path, project_id, state, id, _version) ); ALTER TABLE gl_merge_request MATERIALIZE PROJECTION by_project_state;The compiler currently emits
_cascade_mras an edge scan to find MR IDs in a project, then dedupes the full MR table filtered to those IDs. With this projection, a query likeWHERE traversal_path IN (...) AND project_id = 278964 AND state = 'merged' ORDER BY id, _version DESC LIMIT 1 BY idwould be a contiguous range scan.2B.
by_noteableprojection ongl_noteEffort: 1 ALTER + MATERIALIZE. Impact: Notes for the same MR/Issue are currently scattered by insert order. This clusters them for HAS_NOTE traversals. Queries fixed: I5 (11.2s), A1 (11.4s)
ALTER TABLE gl_note ADD PROJECTION by_noteable ( SELECT * ORDER BY (traversal_path, noteable_type, noteable_id, id, _version) ); ALTER TABLE gl_note MATERIALIZE PROJECTION by_noteable;2C.
by_pipelineprojection ongl_job(ifpipeline_idis added)Effort: ALTER to add column + projection + MATERIALIZE. Write-path needs to populate
pipeline_id. Impact: Bypasses the edge table for Job→Pipeline traversals entirely. Queries fixed: Q24 (TIMEOUT), I1 (17s), Q30 (15s)ALTER TABLE gl_job ADD COLUMN pipeline_id Nullable(Int64) CODEC(ZSTD(1)); ALTER TABLE gl_job ADD PROJECTION by_pipeline_status ( SELECT * ORDER BY (traversal_path, pipeline_id, status, id, _version) ); ALTER TABLE gl_job MATERIALIZE PROJECTION by_pipeline_status;2D.
by_pipelineprojection ongl_stageSame pattern as 2C. Queries fixed: A2 (TIMEOUT)
ALTER TABLE gl_stage ADD COLUMN pipeline_id Nullable(Int64) CODEC(ZSTD(1)); ALTER TABLE gl_stage ADD PROJECTION by_pipeline ( SELECT * ORDER BY (traversal_path, pipeline_id, id, _version) ); ALTER TABLE gl_stage MATERIALIZE PROJECTION by_pipeline;
Tier 3: DDL + compiler changes together
3A. Dedup-ids projection on dense tables
Effort: DDL projection on 5 tables + compiler change to emit
ORDER BY id, _version DESCinstead ofORDER BY traversal_path, id, _version DESCwhen the CTE is fed by a cascade ID set. Impact: Cuts dedup I/O by 25-100x for cascade-fed CTEs. Currently_nf_*CTEs doSELECT * FROM table ORDER BY traversal_path, id, _version DESC LIMIT 1 BY id, deserializing all columns (including large text blobs likedescription) just to produce an ID set. A lightweight projection reads onlyid,_version,_deleted.Queries fixed: Every query that uses cascade CTEs — Q4, Q30, Q25, D1, D2, D3, E1, E2, F2, I1, I5, I6, and more.
-- On each of: gl_job, gl_pipeline, gl_note, gl_merge_request, gl_stage ALTER TABLE gl_job ADD PROJECTION dedup_ids ( SELECT id, _version, _deleted ORDER BY (id, _version DESC) ); ALTER TABLE gl_job MATERIALIZE PROJECTION dedup_ids; ALTER TABLE gl_pipeline ADD PROJECTION dedup_ids ( SELECT id, _version, _deleted ORDER BY (id, _version DESC) ); ALTER TABLE gl_pipeline MATERIALIZE PROJECTION dedup_ids; ALTER TABLE gl_note ADD PROJECTION dedup_ids ( SELECT id, _version, _deleted ORDER BY (id, _version DESC) ); ALTER TABLE gl_note MATERIALIZE PROJECTION dedup_ids; ALTER TABLE gl_merge_request ADD PROJECTION dedup_ids ( SELECT id, _version, _deleted ORDER BY (id, _version DESC) ); ALTER TABLE gl_merge_request MATERIALIZE PROJECTION dedup_ids; ALTER TABLE gl_stage ADD PROJECTION dedup_ids ( SELECT id, _version, _deleted ORDER BY (id, _version DESC) ); ALTER TABLE gl_stage MATERIALIZE PROJECTION dedup_ids;Compiler change: When the
_nf_*CTE is fed by a_cascade_*ID set (i.e.,WHERE id IN (SELECT id FROM _cascade_X)), emit the dedup as:_nf_X AS ( SELECT id FROM ( SELECT id, _version, _deleted FROM gl_X WHERE id IN (SELECT id FROM _cascade_X) ORDER BY id, _version DESC LIMIT 1 BY id ) WHERE _deleted = false )This matches the
dedup_idsprojection key order(id, _version DESC)and reads only 3 columns. Today the compiler emitsORDER BY traversal_path, id, _version DESCwhich forces the main table ORDER BY and reads all columns.Caveat: Property filters (
state = 'merged', etc.) can't run on the dedup_ids projection since it doesn't have those columns. Two options:- Apply property filters in the main query JOIN (after the ID set is resolved). This is safe if the main query already re-filters.
- For the most common filters, use the existing
by_state_id/by_status_idprojections instead. The compiler picks whichever projection matches the CTE's filter set.
3B. Materialize authorization scope once
Effort: Compiler change only, no DDL. Impact: Eliminates 5-10 copies of the 30-way
startsWith(traversal_path, ...)OR block per query. Reduces query parse/plan time and lets ClickHouse evaluate the predicate once. Queries fixed: Every query (constant overhead reduction).Compiler emits:
WITH _auth_paths AS ( SELECT traversal_path FROM ( SELECT '1/9970/' AS traversal_path UNION ALL SELECT '1/9970/106525819/' UNION ALL SELECT '1/9970/78934129/' -- ... all 30 paths ) )Then each CTE and JOIN references it via:
WHERE EXISTS ( SELECT 1 FROM _auth_paths AS ap WHERE startsWith(t.traversal_path, ap.traversal_path) )Or better, pass as a parameter:
-- Single parameter, referenced everywhere WHERE arrayExists(p -> startsWith(t.traversal_path, p), {auth_paths:Array(String)})This is pure compiler work. The 30-way OR is generated by the authorization layer and inlined into every subquery. Hoisting it into a CTE or parameter is a refactor of the SQL emitter.
3C. Edge denormalization (!1095 (closed))
Effort: Schema migration + write-path changes + new projection on
gl_edge. Highest effort of all proposals. Impact: Highest absolute impact. Eliminates node table JOINs in cascade CTEs entirely for the denormalized properties. The cascade becomes a single-table filtered scan ongl_edge. Queries fixed: Q12, Q24, Q30, A2, I1, I5, E1, F2, I6 (9 queries, including 3 timeouts).ALTER TABLE gl_edge ADD COLUMN source_state LowCardinality(Nullable(String)) CODEC(LZ4); ALTER TABLE gl_edge ADD COLUMN source_status LowCardinality(Nullable(String)) CODEC(LZ4); ALTER TABLE gl_edge ADD COLUMN source_type LowCardinality(Nullable(String)) CODEC(LZ4); ALTER TABLE gl_edge ADD PROJECTION by_target_filtered ( SELECT * ORDER BY (target_id, relationship_kind, source_kind, source_status, source_id, traversal_path, target_kind) ); ALTER TABLE gl_edge MATERIALIZE PROJECTION by_target_filtered;Write-path: when upserting an edge, look up the source node's
state/statusand write it to the edge row. When a node'sstate/statuschanges, update all connected edges. This is the write amplification cost.
Summary: Effort vs. Impact Matrix
# Change Effort Scope Timeouts fixed Slow queries fixed 1A _deletedon code graph RMTTrivial (migration) Code graph only 4-6 (Q1,Q21,Q26,Q32,G1) 3 (Q22,Q31,G3) 2A by_project_stateproj on MRLow (DDL) MR queries 0 5 (Q4,F2,I5,Q6,E2) 2B by_noteableproj on NoteLow (DDL) Note queries 0 2 (I5,A1) 2C pipeline_id+ proj on JobMedium (DDL+write) CI queries 1 (Q24) 2 (I1,Q30) 2D pipeline_id+ proj on StageMedium (DDL+write) CI queries 1 (A2) 0 3A dedup_idsproj + compilerMedium (DDL+compiler) All entities 0 All cascade queries 3B Auth scope materialization Medium (compiler) All queries 0 All (constant reduction) 3C Edge denormalization High (schema+write+compiler) SDLC edges 3 (Q12,Q24,A2) 6 (Q30,I1,I5,E1,F2,I6) If I could only ship 3 things: 1A + 3A + 3C. That's the
_deletedfix (stops the bleeding on code graph), dedup-ids projections (cuts I/O across the board), and edge denormalization (kills the cascade-to-node-table JOIN for the densest tables). Together they address every cluster from the perf test.If I could only ship 1 thing: 3C (edge denormalization). It's the most effort but it's the only change that directly eliminates timeouts on production queries without requiring the user to change their query shape.
-
v0.37.1 Baseline — Before !1134 (merged) / !1135 (merged) / !1136 (merged)
These are the numbers on the currently deployed version (v0.37.1). The 3 MRs have merged but haven't rolled out yet. This is the "before" for comparison once v0.38.0 deploys.
Code graph queries (targeted by !1134 (merged):
_deletedon RMT + granularity 1024)Query v0.37.1 (before) Earlier run Delta Q1 DEFINES traversal 29198ms (barely survived!) 30274ms TIMEOUT ~same, got lucky Q21 CALLS aggregation 30528ms TIMEOUT 30650ms TIMEOUT same Q26 CALLS token_match 30335ms TIMEOUT 30336ms TIMEOUT same Q32 EXTENDS 30373ms TIMEOUT 30258ms TIMEOUT same Q22 IMPORTS 10415ms 14533ms 28% faster (variance?) Q31 ON_BRANCH 19564ms 21553ms 9% faster (variance?) G1 3-hop code chain 30342ms TIMEOUT 30330ms TIMEOUT same G3 import resolve 23141ms 23601ms same Note: !1134 (merged)'s
_deletedRMT fix and granularity change are DDL-only. The dead rows won't disappear until background merges run after deployment. Expect gradual improvement over hours/days post-deploy, not instant.MR-in-project queries (targeted by !1134 (merged):
by_project_stateprojection)Query v0.37.1 (before) Earlier run Delta Q4 MR count per author 14093ms 8536ms worse (variance/load) F2 Milestone MR count 7944ms 9817ms 19% better Q6 User MRs traversal 9247ms 6626ms worse (variance) E2 sum(lines) per user 5985ms 5913ms same I5 Note count per MR 9779ms 11210ms 13% better Note: The
by_project_stateprojection is DDL-only — it needsMATERIALIZE PROJECTIONto populate. Until deploy + materialize, ClickHouse won't use it.Note queries (targeted by !1134 (merged):
by_noteableprojection)Query v0.37.1 (before) Earlier run Delta I5 Note count per MR 9779ms 11210ms 13% better (variance) A1 Note count per project 10450ms 11450ms 9% better (variance) Same caveat — projection not materialized yet.
CI chain queries (targeted by !1135 (merged):
pipeline_idFK on Job/Stage)Query v0.37.1 (before) Earlier run Delta Q24 Job->Pipeline->Project agg 30379ms TIMEOUT 30249ms TIMEOUT same I1 Job count by failure 24072ms 17282ms worse (variance/load) Q30 Pipeline->Job traversal 10841ms 15591ms 30% better? A2 Stage count per Pipeline 30325ms TIMEOUT 30391ms TIMEOUT same Note: !1135 (merged) adds the
pipeline_idcolumn and projections, but the column won't be populated until the indexer backfills it post-deploy. Until then,pipeline_idis NULL for all existing rows.Cascade-fed dedup queries (targeted by !1136 (merged): id-only sort key)
Query v0.37.1 (before) Earlier run Delta Q25 MR->Diff 4605ms 8606ms 46% faster D1 4-hop traversal 13153ms 11261ms worse (variance) E1 avg(duration) Pipeline 14942ms 11642ms worse (variance) I6 Label usage 7663ms 9790ms 22% faster Note: !1136 (merged) is a compiler change, not DDL. It won't take effect until v0.38.0 deploys to the webserver pods.
Summary
All numbers are consistent with none of these changes being live yet:
- Code graph: same timeouts (dead rows still present, granularity unchanged)
- MR/Note projections: not materialized
- pipeline_id FK: column not populated
- Dedup sort key: compiler not deployed
This is a clean baseline. Rerun after v0.38.0 rollout to measure actual impact.
-
v0.37.0 → v0.37.1 Before/After
Cluster upgraded from v0.37.0 to v0.37.1 during this session. Initial corpus ran on v0.37.0, reruns on v0.37.1. The v0.37.1 release includes !1116 (merged) (trie subsumption), !1124 (merged) (wildcard traversal pruning), !1125 (merged) (path-finding frontier pruning), !1126 (merged) (code-edge projections).
!1125 (merged) — Path finding frontier pruning: BIG wins
Query v0.37.0 v0.37.1 Delta Q7 path User→Project (node_ids, depth 3) 11842ms 7631ms -36% Q17 path User→User (all_shortest, depth 3) 7867ms 3759ms -52% Q23 path with filters + rel_types 6664ms 5724ms -14% H1 path Group→Group 6125ms 641ms -90% H1 is a 10x improvement. The frontier pruning is clearly effective when endpoints are in the same namespace subtree (both Groups under
1/), allowing aggressive early elimination.!1116 (merged) — Trie subsumption: ~2x on non-bottlenecked queries
Query v0.37.0 v0.37.1 Delta Q10 multi-hop CONTAINS traversal 8010ms 3539ms -56% Q28 multi-hop CONTAINS aggregation 11713ms 5023ms -57% Q5 source-side MR count 2311ms 1175ms -49% Q12 group pipeline agg (TIMEOUT) 30270ms 30341ms same — Pipeline scan dominates The trie collapses redundant
startsWithpaths (e.g.,1/9970/subsumes1/9970/106525819/), cutting the OR clause count. ~2x improvement on queries where the traversal_path predicate was the bottleneck. No effect on queries where the dense table dedup scan dominates (Q12).!1124 (merged) — Wildcard traversal pruning: solid on neighbors
Query v0.37.0 v0.37.1 Delta Q3 bidir neighbors Project 3056ms 1944ms -36% Q15 neighbors many rel_types 2486ms 2270ms -9% B2 bidir neighbors Group 3105ms 1624ms -48% !1126 (merged) — Code-edge projections: marginal
Query v0.37.0 v0.37.1 Delta Q1 DEFINES traversal 30274ms TIMEOUT 29198ms barely survived Q21 CALLS aggregation 30650ms TIMEOUT 30528ms TIMEOUT same Q26 CALLS token_match 30336ms TIMEOUT 30335ms TIMEOUT same Q32 EXTENDS 30258ms TIMEOUT 30373ms TIMEOUT same Q22 IMPORTS 14533ms 10415ms -28% Q31 ON_BRANCH 21553ms 19564ms -9% G1 3-hop code chain 30330ms TIMEOUT 30342ms TIMEOUT same G3 import resolve 23601ms 23141ms same The new projections on
gl_code_edgehelped the edge scan phase (IMPORTS improved 28%), but the_nf_*CTE dedup scan ongl_definition/gl_fileis still the dominant cost. The projections can't help there — that needs the_deletedRMT fix (!1134 (merged)) and dedup-ids optimization (!1136 (merged)) from v0.38.0.Summary
MR Target Result !1125 (merged) path-finding pruning Path queries 36-90% faster — H1 went from 6.1s to 641ms !1116 (merged) trie subsumption All queries (auth OR) 49-57% faster on non-bottlenecked queries !1124 (merged) wildcard pruning Neighbors 36-48% faster !1126 (merged) code-edge projections Code graph 28% on IMPORTS only, timeouts unchanged The v0.37.1 changes hit the authorization overhead (trie) and graph traversal phases (path/neighbor pruning) effectively. The remaining timeouts are all in the dedup/node-table-scan phase, which is what v0.38.0 (!1134 (merged) + !1136 (merged)) targets.
-
v0.37.1 → v0.38.0 Before/After
Cluster now on v0.38.0. MRs live: !1134 (merged) (_deleted RMT, by_project_state, by_noteable, granularity 1024), !1135 (merged) (pipeline_id FK on Job/Stage), !1136 (merged) (cascade-fed dedup id-only sort key).
Important caveat: !1134 (merged) and !1135 (merged) are DDL changes. Projections need
MATERIALIZE, _deleted cleanup needs background merges, pipeline_id needs indexer backfill. These are NOT instant — improvement will be gradual over hours/days. !1136 (merged) is a compiler change and should be immediate.!1136 (merged) — Cascade-fed dedup (compiler: id-only sort key)
Query v0.37.0 v0.37.1 v0.38.0 v0.37.1→v0.38.0 Q4 MR count per author 8536ms 14093ms 24495ms ❌ worseQ25 MR→Diff 8606ms 4605ms 6013ms ~ variance D1 4-hop User→MR→Pipe→Job 11261ms 13153ms 17393ms ❌ worseE1 avg(duration) Pipeline 11642ms 14942ms 18078ms ❌ worseE2 sum(lines) per user 5913ms 5985ms 12402ms ❌ worseF2 Milestone MR count 9817ms 7944ms 5690ms ✅ -28%I6 Label usage 9790ms 7663ms 4115ms ✅ -46%Q6 User MRs traversal 6626ms 9247ms 4649ms ✅ -50%Mixed results. F2, I6, Q6 improved significantly. Q4, D1, E1, E2 got worse — possibly load/warmup or the new sort key interacting badly with these specific query shapes.
!1135 (merged) — pipeline_id FK on Job/Stage
Query v0.37.0 v0.37.1 v0.38.0 v0.37.1→v0.38.0 Q24 Job→Pipeline→Project agg TIMEOUT TIMEOUT TIMEOUT same I1 Job count by failure 17282ms 24072ms 30242ms ❌ worse (near-timeout)Q30 Pipeline→Job traversal 15591ms 10841ms 7220ms ✅ -33%A2 Stage count per Pipeline TIMEOUT TIMEOUT TIMEOUT same Q30 improved. Q24 and A2 still timeout — pipeline_id column likely not backfilled yet. I1 regressed to near-timeout.
!1134 (merged) — Note projections
Query v0.37.0 v0.37.1 v0.38.0 v0.37.1→v0.38.0 I5 Note count per MR 11210ms 9779ms 8010ms ✅ -18%A1 Note count per project 11450ms 10450ms 9102ms ✅ -13%Modest improvement. Projection may not be fully materialized yet.
!1134 (merged) — Code graph (_deleted RMT + granularity 1024)
Query v0.37.0 v0.37.1 v0.38.0 v0.37.1→v0.38.0 Q1 DEFINES TIMEOUT 29198ms 27675ms ✅ surviving nowQ21 CALLS agg TIMEOUT TIMEOUT TIMEOUT same Q26 CALLS token TIMEOUT TIMEOUT TIMEOUT same Q32 EXTENDS TIMEOUT TIMEOUT TIMEOUT same Q22 IMPORTS 14533ms 10415ms 9530ms ✅ -9%Q31 ON_BRANCH 21553ms 19564ms 15083ms ✅ -23%G1 3-hop chain TIMEOUT TIMEOUT TIMEOUT same G3 import resolve 23601ms 23141ms 19677ms ✅ -15%Code graph queries are trending down. Q31 dropped from 21.5s→15.1s across two releases. Q22 from 14.5s→9.5s. The _deleted RMT fix needs background merge time to fully take effect — dead rows are being cleaned up gradually. Expect continued improvement over the next few hours.
Full v0.37.0 → v0.38.0 trajectory
Query v0.37.0 v0.38.0 Total delta Q6 User MRs 6626ms 4649ms -30% F2 Milestone MRs 9817ms 5690ms -42% I6 Label usage 9790ms 4115ms -58% Q30 Pipe→Job 15591ms 7220ms -54% I5 Notes per MR 11210ms 8010ms -29% A1 Note count 11450ms 9102ms -21% Q22 IMPORTS 14533ms 9530ms -34% Q31 ON_BRANCH 21553ms 15083ms -30% G3 import resolve 23601ms 19677ms -17% Q1 DEFINES TIMEOUT 27675ms rescued from timeout 10 queries improved. 4 code graph queries still timeout (Q21, Q26, Q32, G1). Q24 and A2 still timeout (awaiting pipeline_id backfill). Some queries regressed (Q4, D1, E1, E2) — needs investigation, may be load-related or a regression in the new dedup sort key for certain shapes.
-
v0.38.0 Regression Investigation — Q4, D1, E1, E2
Reran all 4 suspected regressions 3x each on v0.38.0. Full reindex was done so all projections are materialized and pipeline_id is backfilled.
3-run results
Query Run 1 Run 2 Run 3 Median v0.37.0 Verdict Q4 MR count per author 5387ms 6844ms 5197ms 5387ms 8536ms ✅ -37% improvementD1 4-hop chain 12395ms 15145ms 12968ms 12968ms 11261ms ~ flat (variance) E1 avg(duration) 14154ms 13970ms 14613ms 14154ms 11642ms ❓ regression (consistent)E2 sum(lines) 4788ms 4972ms 9508ms 4972ms 5913ms ✅ -16% improvementQ4 and E2: Not regressions
The first v0.38.0 run (24.5s for Q4, 12.4s for E2) was a cold-cache outlier. Median across 3 runs shows both improved vs v0.37.0. Q4 dropped from 8.5s → 5.4s, E2 from 5.9s → 5.0s.
D1: Nondeterministic, not a regression
D1 has no
order_byclause. The outerLIMIT 10picks whichever edge rows ClickHouse scans first, which depends on merge state. Running twice produced completely different result sets:Run 1: Job IDs
[13361001921, 13361022839, 13361038859, ...], 6 MRs, 6 Pipelines Run 2: Job IDs[13364120970, 13628566167, 13628566232, ...], 6 MRs, 6 PipelinesDifferent MRs, different Pipelines, different Jobs. The
row_countinstability (29→15→21→23) is becauserow_countcounts unique nodes in the response, which varies with which edges win the race.The 12-17s timing spread is scan nondeterminism, not a regression. Adding
order_bywould stabilize both results and timing.E1: Real regression, consistent +2.5s
E1 is a 2-node aggregation:
Pipeline(success, created_at >= April) →IN_PROJECT→ Project(278964), avg+max+count. It was consistently 11.6s on v0.37.0 and is now consistently 14s on v0.38.0 (3 runs: 14154, 13970, 14613).This is a 2-node query — the cascade-fed dedup change from !1136 (merged) shouldn't apply (no cascade CTE for 2-node). Possible causes:
- Reindex increased data volume — more Pipeline rows indexed since v0.37.0 data was current. ~1 month of new pipelines could add significant scan volume.
- Projection materialization overhead — the new projections on other tables (MR, Note, Job, Stage) added write-path work that affects shared ClickHouse resources. Unlikely to cause a steady-state read regression though.
by_status_idprojection contention — Pipeline already has aby_status_idprojection. Thestatus = 'success'filter should use it. But if the reindex changed the projection's sort order or granule distribution, it could affect scan efficiency.
Recommend checking
EXPLAIN PIPELINEon the E1 query to see which projection ClickHouse is selecting.Corrected v0.37.0 → v0.38.0 table
Query v0.37.0 v0.38.0 (median) Delta Q4 MR count per author 8536ms 5387ms ✅ -37%D1 4-hop chain 11261ms 12968ms ~ flat (nondeterministic) E1 avg(duration) 11642ms 14154ms ❌ +22% regressionE2 sum(lines) 5913ms 4972ms ✅ -16%
Please register or sign in to comment