CI partition pruning: cache record_id → partition_id to fix unpruned single id queries
## Problem
Several high-frequency Sidekiq workers query CI partitioned tables (`p_ci_pipelines`, `p_ci_builds`) without a `partition_id`, causing full cross-partition scans. This was identified as a primary contributor to the LockManager LWLock saturation on CI replicas in [incident INC-8367](https://gitlab.com/gitlab-com/gl-infra/production/-/work_items/21528) and its [incident review](https://gitlab.com/gitlab-com/gl-infra/production/-/work_items/21534#note_3158996366).
The three highest call-rate offending queries are:
| Query fingerprint | Query | Issue |
|---|---|---|
| `-6756117007090651256` | `SELECT p_ci_pipelines.* WHERE id = $1` | No `partition_id` — Sidekiq workers only enqueue with `pipeline_id` |
| `4087446483174191214` | `SELECT p_ci_builds.* WHERE type IN (...) AND id = $4` | Partition pruning capable but [not wired up in `Ci::Deployable`](https://gitlab.com/gitlab-org/gitlab/-/blob/duo-edit-20260311-163221/app/models/concerns/ci/deployable.rb?plain=1#L18) |
| `7090543644733999668` | `SELECT p_ci_builds.* WHERE token_encrypted IN (...)` | Possibly a fallback path when partition is missing |
As the number of CI partitions grows, each of these non-pruned queries becomes progressively more expensive — this is a compounding problem that will worsen over time.
Changing all worker signatures to accept a `partition_id` argument is not feasible in the short term: it requires coordinated multi-release changes to every affected worker.
## Proposal
Use a two-layer database lookup strategy to resolve `partition_id` from a `record_id` without cross-partition scans, with no worker signature changes required.
The key insight is that while the vast majority of new records land in the currently active partition, a small but meaningful fraction are inserted into older partitions — because retried jobs inherit the `partition_id` of their parent pipeline, which may have been created months earlier. The [spillover data](https://gitlab.com/gitlab-org/gitlab/-/work_items/593701#note_3177610978) confirms this: spillover rates are typically 0.01–0.6% across recent partitions.
### Two-layer lookup strategy
**Layer 1 — `ci_builds_partition_overrides` overflow table**
A dedicated table stores `(build_id, partition_id, project_id)` for every record inserted into a partition other than the one active at insertion time. It is hash-partitioned on `build_id` for fast single-row lookups, and is always checked first. For spillover records this is the authoritative source of the correct `partition_id`.
```sql
CREATE TABLE IF NOT EXISTS ci_builds_partition_overrides (
build_id BIGINT NOT NULL,
project_id BIGINT NOT NULL,
partition_id INT NOT NULL,
PRIMARY KEY (build_id),
CONSTRAINT fk_ci_builds_partition_overrides_p_ci_builds
FOREIGN KEY (build_id, partition_id)
REFERENCES p_ci_builds (id, partition_id)
ON DELETE CASCADE
ON UPDATE CASCADE
) PARTITION BY HASH (build_id);
CREATE TABLE IF NOT EXISTS ci_builds_partition_overrides_0 PARTITION OF ci_builds_partition_overrides FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE IF NOT EXISTS ci_builds_partition_overrides_1 PARTITION OF ci_builds_partition_overrides FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE IF NOT EXISTS ci_builds_partition_overrides_2 PARTITION OF ci_builds_partition_overrides FOR VALUES WITH (MODULUS 3, REMAINDER 2);
```
The `ON DELETE CASCADE` and `ON UPDATE CASCADE` foreign key constraints keep the override table consistent automatically: when a pipeline's rows are moved to a new physical partition, the corresponding override rows are updated without any additional application logic.
Based on the [spillover estimates](https://gitlab.com/gitlab-org/gitlab/-/work_items/593701#note_3177610978), this table will hold approximately 50 million rows. With 2 indexes and 3 hash partitions, lookups remain in the fast path.
**Layer 2 — ID range index on `ci_partitions`**
Add an `int8range` column per partitioned table to `ci_partitions`, storing the range of IDs that were generated while that partition was the active write target. This is the fallback used when no override row exists — which is the case for the vast majority of records that landed naturally in their expected partition.
```sql
ALTER TABLE ci_partitions
ADD COLUMN ci_pipelines_id_range int8range,
ADD COLUMN ci_builds_id_range int8range,
ADD COLUMN ci_job_artifacts_id_range int8range,
ADD COLUMN ci_stages_id_range int8range;
```
This table has ~12 rows and fits in a single buffer. The range scan completes in under a millisecond in practice.
### Query-time resolution
At query time, the two layers are combined using a `COALESCE`: the override table is checked first (single index scan against one hash partition), and only if no row is found does the query fall back to the range scan on `ci_partitions` (sequential scan of a ~12-row table). Both paths complete in under a millisecond.
```ruby
class BuildsPartitionOverride < Ci::ApplicationRecord
self.table_name = :ci_builds_partition_overrides
self.primary_key = :build_id
def self.partition_id_for(build_id)
connection.select_value(<<~SQL.squish)
SELECT COALESCE(
(SELECT partition_id
FROM ci_builds_partition_overrides
WHERE build_id = #{build_id}),
(SELECT id
FROM ci_partitions
WHERE ci_builds_id_range @> #{build_id}::int8
ORDER BY id ASC LIMIT 1)
) AS partition_id;
SQL
end
end
def Ci::Build.find_by_id(id)
partition_id = BuildsPartitionOverride.partition_id_for(id)
find_by(id: id, partition_id: partition_id)
end
```
This pattern needs to be implemented for all partitioned tables accessible by external ID only: pipelines, stages, builds, and artifacts.
### Backfill
```ruby
# Backfill range columns on ci_partitions
Ci::Partition.all.order(:id).to_a.each_cons(2) do |partition, next_p|
min = Ci::Build.in_partition(partition.id).minimum(:id)
max = Ci::Build.in_partition(next_p.id).minimum(:id)
partition.update(ci_builds_id_range: min..max)
end
# Backfill override rows for spillover records
Ci::Partition.order(:id).all.each do |partition|
CommitStatus.in_partition(partition.id).where("id > ?", partition.ci_builds_id_range.max).each_batch(of: 10000) do |batch|
BuildsPartitionOverride.connection.execute(<<~SQL)
INSERT INTO ci_builds_partition_overrides(build_id, partition_id, project_id)
#{batch.select('id as build_id, partition_id, project_id').to_sql}
ON CONFLICT DO NOTHING
SQL
end
end
```
### Why this approach over Redis
- **No Redis dependency on the hot path** — a Redis outage or excessive cache misses cannot cascade into database overload.
- **Durable consistency** — the override table is the source of truth; there is no TTL expiry or silent eviction to reason about.
- **Handles bulk lookups** — preloads and `id_in`-style queries can be resolved in a single `COALESCE` query rather than requiring per-record Redis round-trips.
- **Self-maintaining** — `ON DELETE CASCADE` / `ON UPDATE CASCADE` keep the override table consistent when rows are moved between partitions.
<details>
<summary>Previous proposal: Redis-based <code>record_id → partition_id</code> cache</summary>
### Redis design
Store `record_id → partition_id` mappings in Redis at record creation time for **all partitioned CI resources** (pipelines, builds, and any future partitioned CI tables), then patch the CI finder methods to transparently hydrate `partition_id` before querying — with no worker signature changes required.
The key insight is that all CI partitioned resources share the same `partition_id` as their parent pipeline (it cascades down via `Ci::Partitionable`). This means a single pipeline-keyed cache entry covers lookups for all child resources of that pipeline.
Use `Gitlab::Redis::SharedState` (not `Cache`, to survive memory pressure and worker retries).
**For pipelines** — written at pipeline creation time, keyed by `pipeline_id`:
```ruby
# Key pattern — hash-tag on pipeline_id for Redis Cluster slot affinity
key = "ci:partition:pipeline:{#{pipeline_id}}"
Gitlab::Redis::SharedState.with { |r| r.set(key, partition_id, ex: 14.days.to_i) }
```
**For builds and other child resources** — since `partition_id` cascades from the pipeline, child resources can resolve via the pipeline cache. However, for workers that only receive a `build_id` (with no `pipeline_id`), we also store a direct mapping at creation time:
```ruby
# Written at build creation time
key = "ci:partition:build:{#{build_id}}"
Gitlab::Redis::SharedState.with { |r| r.set(key, partition_id, ex: 14.days.to_i) }
```
This pattern generalises to any new partitioned CI resource: at creation time, write `ci:partition:<resource_type>:{<record_id>}` → `partition_id`.
**Key design decisions:**
- **TTL of 14 days** — covers the practical window where Sidekiq jobs referencing a record are still being enqueued. Cold records beyond this window have lower query volume and the cross-partition scan cost is more acceptable.
- **Hash-tag `{record_id}`** — ensures Redis Cluster slot affinity, consistent with existing patterns (e.g. Namespace config cache).
- **`SharedState` not `Cache`** — `Cache` can be evicted under memory pressure; a silent miss would degrade to a full scan rather than fail loudly.
- **Value is just the integer `partition_id`** — minimal memory footprint. At ~50 bytes/key, even storing entries for both pipelines and builds is well within bounds.
- **Write on record creation** — single write point per resource type, e.g. `after_create` callback in `Ci::Partitionable` so all current and future partitioned models get it automatically.
### Finder patch design
Introduce a generic `Ci::PartitionIdResolver` that resolves `partition_id` for any partitioned CI resource type:
```ruby
module Ci
module PartitionIdResolver
REDIS_TTL = 14.days.to_i
KEY_PREFIX = "ci:partition"
# resource_type: :pipeline, :build, etc.
def self.resolve(resource_type, record_id)
key = "#{KEY_PREFIX}:#{resource_type}:{#{record_id}}"
cached = Gitlab::Redis::SharedState.with { |r| r.get(key) }
return cached.to_i if cached
# Cold path: DB fallback — cross-partition scan but only on cache miss
model = model_for(resource_type)
record = model.find_by(id: record_id)
return nil unless record
# Backfill cache on miss
Gitlab::Redis::SharedState.with { |r| r.set(key, record.partition_id, ex: REDIS_TTL) }
record.partition_id
end
def self.model_for(resource_type)
{ pipeline: Ci::Pipeline, build: Ci::Build }.fetch(resource_type)
end
end
end
```
Patch the base finder via a concern included in `Ci::ApplicationRecord` (or `Ci::Partitionable`) so all partitioned models get it automatically:
```ruby
module Ci
module PartitionAwareFinder
extend ActiveSupport::Concern
class_methods do
# Resolves partition_id from Redis and injects it into the query.
# resource_type defaults to the model's own type (e.g. :build for Ci::Build),
# but can be overridden when the caller has a related resource id
# (e.g. looking up a build by pipeline_id).
def find_by_id_with_partition(id, resource_type: nil, resource_id: nil)
resource_type ||= inferred_resource_type
resource_id ||= id
partition_id = Ci::PartitionIdResolver.resolve(resource_type, resource_id)
if partition_id
find_by(id: id, partition_id: partition_id)
else
find_by(id: id) # graceful degradation
end
end
private
def inferred_resource_type
name.demodulize.underscore.to_sym # Ci::Build => :build, Ci::Pipeline => :pipeline
end
end
end
end
```
This means:
- `Ci::Pipeline.find_by_id_with_partition(pipeline_id)` — resolves via `ci:partition:pipeline:{pipeline_id}`
- `Ci::Build.find_by_id_with_partition(build_id)` — resolves via `ci:partition:build:{build_id}`
- `Ci::Build.find_by_id_with_partition(build_id, resource_type: :pipeline, resource_id: pipeline_id)` — resolves via the pipeline cache when the worker has a `pipeline_id` available (avoids a separate build cache entry)
- Any future partitioned CI model gets the same behaviour automatically by including `Ci::Partitionable`
**Known concerns with this approach** (discussed in the thread):
- Memory footprint: ~15 million new Redis keys/day at ~50 bytes each = ~750 MB/day, ~10 GB at 14-day TTL.
- `SharedState` does not have an eviction policy; memory exhaustion can cause stalled writes or downtime. Using `Cache` is safer but risks silent eviction degrading to full scans.
- Redis outage or excessive cache misses can cascade into database overload.
- Does not handle bulk lookups (preloads, `id_in` queries) without per-record round-trips.
</details>
## References
- Incident review: https://gitlab.com/gitlab-com/gl-infra/production/-/work_items/21534#note_3158996366
- CI partitioning design doc: https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_data_decay/pipeline_partitioning/
- `Ci::Deployable` unconnected pruning: https://gitlab.com/gitlab-org/gitlab/-/blob/duo-edit-20260311-163221/app/models/concerns/ci/deployable.rb?plain=1#L18
- `Ci::Partitionable` concern: https://gitlab.com/gitlab-org/gitlab/-/blob/920147293ae304639915f66b260dc14e4f629850/app/models/concerns/ci/partitionable.rb
- Partition spillover data: https://gitlab.com/gitlab-org/gitlab/-/work_items/593701#note_3177610978
issue