CI partition pruning: cache record_id → partition_id to fix unpruned single id queries (#593701) · Issues · GitLab.org / GitLab

CI partition pruning: cache record_id → partition_id to fix unpruned single id queries

## Problem Several high-frequency Sidekiq workers query CI partitioned tables (`p_ci_pipelines`, `p_ci_builds`) without a `partition_id`, causing full cross-partition scans. This was identified as a primary contributor to the LockManager LWLock saturation on CI replicas in [incident INC-8367](https://gitlab.com/gitlab-com/gl-infra/production/-/work_items/21528) and its [incident review](https://gitlab.com/gitlab-com/gl-infra/production/-/work_items/21534#note_3158996366). The three highest call-rate offending queries are: | Query fingerprint | Query | Issue | |---|---|---| | `-6756117007090651256` | `SELECT p_ci_pipelines.* WHERE id = $1` | No `partition_id` — Sidekiq workers only enqueue with `pipeline_id` | | `4087446483174191214` | `SELECT p_ci_builds.* WHERE type IN (...) AND id = $4` | Partition pruning capable but [not wired up in `Ci::Deployable`](https://gitlab.com/gitlab-org/gitlab/-/blob/duo-edit-20260311-163221/app/models/concerns/ci/deployable.rb?plain=1#L18) | | `7090543644733999668` | `SELECT p_ci_builds.* WHERE token_encrypted IN (...)` | Possibly a fallback path when partition is missing | As the number of CI partitions grows, each of these non-pruned queries becomes progressively more expensive — this is a compounding problem that will worsen over time. Changing all worker signatures to accept a `partition_id` argument is not feasible in the short term: it requires coordinated multi-release changes to every affected worker. ## Proposal Use a two-layer database lookup strategy to resolve `partition_id` from a `record_id` without cross-partition scans, with no worker signature changes required. The key insight is that while the vast majority of new records land in the currently active partition, a small but meaningful fraction are inserted into older partitions — because retried jobs inherit the `partition_id` of their parent pipeline, which may have been created months earlier. The [spillover data](https://gitlab.com/gitlab-org/gitlab/-/work_items/593701#note_3177610978) confirms this: spillover rates are typically 0.01–0.6% across recent partitions. ### Two-layer lookup strategy **Layer 1 — `ci_builds_partition_overrides` overflow table** A dedicated table stores `(build_id, partition_id, project_id)` for every record inserted into a partition other than the one active at insertion time. It is hash-partitioned on `build_id` for fast single-row lookups, and is always checked first. For spillover records this is the authoritative source of the correct `partition_id`. ```sql CREATE TABLE IF NOT EXISTS ci_builds_partition_overrides ( build_id BIGINT NOT NULL, project_id BIGINT NOT NULL, partition_id INT NOT NULL, PRIMARY KEY (build_id), CONSTRAINT fk_ci_builds_partition_overrides_p_ci_builds FOREIGN KEY (build_id, partition_id) REFERENCES p_ci_builds (id, partition_id) ON DELETE CASCADE ON UPDATE CASCADE ) PARTITION BY HASH (build_id); CREATE TABLE IF NOT EXISTS ci_builds_partition_overrides_0 PARTITION OF ci_builds_partition_overrides FOR VALUES WITH (MODULUS 3, REMAINDER 0); CREATE TABLE IF NOT EXISTS ci_builds_partition_overrides_1 PARTITION OF ci_builds_partition_overrides FOR VALUES WITH (MODULUS 3, REMAINDER 1); CREATE TABLE IF NOT EXISTS ci_builds_partition_overrides_2 PARTITION OF ci_builds_partition_overrides FOR VALUES WITH (MODULUS 3, REMAINDER 2); ``` The `ON DELETE CASCADE` and `ON UPDATE CASCADE` foreign key constraints keep the override table consistent automatically: when a pipeline's rows are moved to a new physical partition, the corresponding override rows are updated without any additional application logic. Based on the [spillover estimates](https://gitlab.com/gitlab-org/gitlab/-/work_items/593701#note_3177610978), this table will hold approximately 50 million rows. With 2 indexes and 3 hash partitions, lookups remain in the fast path. **Layer 2 — ID range index on `ci_partitions`** Add an `int8range` column per partitioned table to `ci_partitions`, storing the range of IDs that were generated while that partition was the active write target. This is the fallback used when no override row exists — which is the case for the vast majority of records that landed naturally in their expected partition. ```sql ALTER TABLE ci_partitions ADD COLUMN ci_pipelines_id_range int8range, ADD COLUMN ci_builds_id_range int8range, ADD COLUMN ci_job_artifacts_id_range int8range, ADD COLUMN ci_stages_id_range int8range; ``` This table has ~12 rows and fits in a single buffer. The range scan completes in under a millisecond in practice. ### Query-time resolution At query time, the two layers are combined using a `COALESCE`: the override table is checked first (single index scan against one hash partition), and only if no row is found does the query fall back to the range scan on `ci_partitions` (sequential scan of a ~12-row table). Both paths complete in under a millisecond. ```ruby class BuildsPartitionOverride < Ci::ApplicationRecord self.table_name = :ci_builds_partition_overrides self.primary_key = :build_id def self.partition_id_for(build_id) connection.select_value(<<~SQL.squish) SELECT COALESCE( (SELECT partition_id FROM ci_builds_partition_overrides WHERE build_id = #{build_id}), (SELECT id FROM ci_partitions WHERE ci_builds_id_range @> #{build_id}::int8 ORDER BY id ASC LIMIT 1) ) AS partition_id; SQL end end def Ci::Build.find_by_id(id) partition_id = BuildsPartitionOverride.partition_id_for(id) find_by(id: id, partition_id: partition_id) end ``` This pattern needs to be implemented for all partitioned tables accessible by external ID only: pipelines, stages, builds, and artifacts. ### Backfill ```ruby # Backfill range columns on ci_partitions Ci::Partition.all.order(:id).to_a.each_cons(2) do |partition, next_p| min = Ci::Build.in_partition(partition.id).minimum(:id) max = Ci::Build.in_partition(next_p.id).minimum(:id) partition.update(ci_builds_id_range: min..max) end # Backfill override rows for spillover records Ci::Partition.order(:id).all.each do |partition| CommitStatus.in_partition(partition.id).where("id > ?", partition.ci_builds_id_range.max).each_batch(of: 10000) do |batch| BuildsPartitionOverride.connection.execute(<<~SQL) INSERT INTO ci_builds_partition_overrides(build_id, partition_id, project_id) #{batch.select('id as build_id, partition_id, project_id').to_sql} ON CONFLICT DO NOTHING SQL end end ``` ### Why this approach over Redis - **No Redis dependency on the hot path** — a Redis outage or excessive cache misses cannot cascade into database overload. - **Durable consistency** — the override table is the source of truth; there is no TTL expiry or silent eviction to reason about. - **Handles bulk lookups** — preloads and `id_in`-style queries can be resolved in a single `COALESCE` query rather than requiring per-record Redis round-trips. - **Self-maintaining** — `ON DELETE CASCADE` / `ON UPDATE CASCADE` keep the override table consistent when rows are moved between partitions. <details> <summary>Previous proposal: Redis-based <code>record_id → partition_id</code> cache</summary> ### Redis design Store `record_id → partition_id` mappings in Redis at record creation time for **all partitioned CI resources** (pipelines, builds, and any future partitioned CI tables), then patch the CI finder methods to transparently hydrate `partition_id` before querying — with no worker signature changes required. The key insight is that all CI partitioned resources share the same `partition_id` as their parent pipeline (it cascades down via `Ci::Partitionable`). This means a single pipeline-keyed cache entry covers lookups for all child resources of that pipeline. Use `Gitlab::Redis::SharedState` (not `Cache`, to survive memory pressure and worker retries). **For pipelines** — written at pipeline creation time, keyed by `pipeline_id`: ```ruby # Key pattern — hash-tag on pipeline_id for Redis Cluster slot affinity key = "ci:partition:pipeline:{#{pipeline_id}}" Gitlab::Redis::SharedState.with { |r| r.set(key, partition_id, ex: 14.days.to_i) } ``` **For builds and other child resources** — since `partition_id` cascades from the pipeline, child resources can resolve via the pipeline cache. However, for workers that only receive a `build_id` (with no `pipeline_id`), we also store a direct mapping at creation time: ```ruby # Written at build creation time key = "ci:partition:build:{#{build_id}}" Gitlab::Redis::SharedState.with { |r| r.set(key, partition_id, ex: 14.days.to_i) } ``` This pattern generalises to any new partitioned CI resource: at creation time, write `ci:partition:<resource_type>:{<record_id>}` → `partition_id`. **Key design decisions:** - **TTL of 14 days** — covers the practical window where Sidekiq jobs referencing a record are still being enqueued. Cold records beyond this window have lower query volume and the cross-partition scan cost is more acceptable. - **Hash-tag `{record_id}`** — ensures Redis Cluster slot affinity, consistent with existing patterns (e.g. Namespace config cache). - **`SharedState` not `Cache`** — `Cache` can be evicted under memory pressure; a silent miss would degrade to a full scan rather than fail loudly. - **Value is just the integer `partition_id`** — minimal memory footprint. At ~50 bytes/key, even storing entries for both pipelines and builds is well within bounds. - **Write on record creation** — single write point per resource type, e.g. `after_create` callback in `Ci::Partitionable` so all current and future partitioned models get it automatically. ### Finder patch design Introduce a generic `Ci::PartitionIdResolver` that resolves `partition_id` for any partitioned CI resource type: ```ruby module Ci module PartitionIdResolver REDIS_TTL = 14.days.to_i KEY_PREFIX = "ci:partition" # resource_type: :pipeline, :build, etc. def self.resolve(resource_type, record_id) key = "#{KEY_PREFIX}:#{resource_type}:{#{record_id}}" cached = Gitlab::Redis::SharedState.with { |r| r.get(key) } return cached.to_i if cached # Cold path: DB fallback — cross-partition scan but only on cache miss model = model_for(resource_type) record = model.find_by(id: record_id) return nil unless record # Backfill cache on miss Gitlab::Redis::SharedState.with { |r| r.set(key, record.partition_id, ex: REDIS_TTL) } record.partition_id end def self.model_for(resource_type) { pipeline: Ci::Pipeline, build: Ci::Build }.fetch(resource_type) end end end ``` Patch the base finder via a concern included in `Ci::ApplicationRecord` (or `Ci::Partitionable`) so all partitioned models get it automatically: ```ruby module Ci module PartitionAwareFinder extend ActiveSupport::Concern class_methods do # Resolves partition_id from Redis and injects it into the query. # resource_type defaults to the model's own type (e.g. :build for Ci::Build), # but can be overridden when the caller has a related resource id # (e.g. looking up a build by pipeline_id). def find_by_id_with_partition(id, resource_type: nil, resource_id: nil) resource_type ||= inferred_resource_type resource_id ||= id partition_id = Ci::PartitionIdResolver.resolve(resource_type, resource_id) if partition_id find_by(id: id, partition_id: partition_id) else find_by(id: id) # graceful degradation end end private def inferred_resource_type name.demodulize.underscore.to_sym # Ci::Build => :build, Ci::Pipeline => :pipeline end end end end ``` This means: - `Ci::Pipeline.find_by_id_with_partition(pipeline_id)` — resolves via `ci:partition:pipeline:{pipeline_id}` - `Ci::Build.find_by_id_with_partition(build_id)` — resolves via `ci:partition:build:{build_id}` - `Ci::Build.find_by_id_with_partition(build_id, resource_type: :pipeline, resource_id: pipeline_id)` — resolves via the pipeline cache when the worker has a `pipeline_id` available (avoids a separate build cache entry) - Any future partitioned CI model gets the same behaviour automatically by including `Ci::Partitionable` **Known concerns with this approach** (discussed in the thread): - Memory footprint: ~15 million new Redis keys/day at ~50 bytes each = ~750 MB/day, ~10 GB at 14-day TTL. - `SharedState` does not have an eviction policy; memory exhaustion can cause stalled writes or downtime. Using `Cache` is safer but risks silent eviction degrading to full scans. - Redis outage or excessive cache misses can cascade into database overload. - Does not handle bulk lookups (preloads, `id_in` queries) without per-record round-trips. </details> ## References - Incident review: https://gitlab.com/gitlab-com/gl-infra/production/-/work_items/21534#note_3158996366 - CI partitioning design doc: https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_data_decay/pipeline_partitioning/ - `Ci::Deployable` unconnected pruning: https://gitlab.com/gitlab-org/gitlab/-/blob/duo-edit-20260311-163221/app/models/concerns/ci/deployable.rb?plain=1#L18 - `Ci::Partitionable` concern: https://gitlab.com/gitlab-org/gitlab/-/blob/920147293ae304639915f66b260dc14e4f629850/app/models/concerns/ci/partitionable.rb - Partition spillover data: https://gitlab.com/gitlab-org/gitlab/-/work_items/593701#note_3177610978

issue