Geo: Create geo_ci_job_artifact_verification_summaries table
What does this MR do and why?
Adds the `geo_ci_job_artifact_verification_summaries` table for improved metrics counting.
Problem
Geo currently collects job artifact verification metrics by scanning the entire ci_job_artifact_states table every minute. This approach doesn't scale for tables with 100M+ rows, causing timeouts and excessive database load.
Solution
This MR introduces a bucketed verification summaries table as the foundation for a more scalable metrics collection approach. Instead of counting the full table, we'll pre-calculate counts per bucket and only recalculate buckets that have changed.
How it works:
- Job artifacts are assigned to buckets using `job_artifact_id % 100_000`
- Each bucket stores pre-calculated `total_count`, `verified_count`, and `failed_count`
- Buckets have a state machine: `clean` → `dirty` → `calculating` → `clean`
- A future DB trigger (MR 2/3) will mark buckets as `dirty` when artifacts change
- A calculator worker will recalculate only dirty buckets
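The bucketing and state values above can be sketched in plain Ruby (a minimal sketch with hypothetical names; the real model code lives elsewhere in the codebase):

```ruby
# Number of buckets is fixed; artifacts map to buckets by modulo.
BUCKET_COUNT = 100_000

# Deterministic bucket assignment for a given artifact ID.
def bucket_for(job_artifact_id)
  job_artifact_id % BUCKET_COUNT
end

# Integer values stored in the `state` column (assumed mapping).
STATES = { clean: 0, dirty: 1, calculating: 2 }.freeze

puts bucket_for(123_456_789) # => 56789
```

A DB trigger only ever needs the artifact ID to find the bucket to mark dirty, which keeps the trigger cheap.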
Why 100k buckets:
- With 100M artifacts, each bucket averages ~1000 rows - fast to recount
- Modulo distribution spreads new records evenly (unlike range-based which creates hotspots)
- Bucket count is fixed; no need to add buckets as data grows
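The even-spread property of modulo bucketing can be checked with a quick sketch (IDs hypothetical):

```ruby
BUCKET_COUNT = 100_000

# Consecutive new artifact IDs land in consecutive buckets, so write
# load spreads across all buckets instead of piling onto the newest
# range, as range-based bucketing would.
new_ids = (200_000_001..200_000_005).to_a
buckets = new_ids.map { |id| id % BUCKET_COUNT }
puts buckets.inspect # => [1, 2, 3, 4, 5]
```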
Why CI database (`gitlab_ci_cell_local`):
- The source table `ci_job_artifact_states` is on the CI database
- PostgreSQL triggers cannot write across databases
- Model inherits from `Ci::ApplicationRecord` to route queries correctly
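The routing idea can be illustrated with a minimal stand-in (the `connection_db` method here is purely hypothetical; in the real codebase `Ci::ApplicationRecord` carries the CI database connection):

```ruby
# Stand-in for the Rails base class to show the inheritance-based
# routing; not the actual GitLab implementation.
module Ci
  class ApplicationRecord
    def self.connection_db
      :ci # subclasses inherit the CI database connection
    end
  end
end

module Geo
  class CiJobArtifactVerificationSummary < Ci::ApplicationRecord
  end
end

puts Geo::CiJobArtifactVerificationSummary.connection_db # => ci
```

Because the trigger and the summary table must live on the same database, the model has to inherit from the CI base class rather than the main one.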
This is MR 1 of 3. Follow-up MRs will:
- MR 2 - add the backfill service & calculator worker (behind FF) - #590853 (closed)
- MR 3 - add the DB trigger and the feature-flagged switch in the replicator - #590854
Query Plans
Seeded data in postgres.ai (CI database):
Created the table, indexes, and sequence matching the migration, then seeded 100k rows across 3 states:
- 90k clean (state = 0)
- 5k dirty (state = 1)
- 5k calculating (state = 2)
Raw SQL and query plans:
Dirty scope:
SELECT "geo_ci_job_artifact_verification_summaries".*
FROM "geo_ci_job_artifact_verification_summaries"
WHERE "geo_ci_job_artifact_verification_summaries"."state" = 1
ORDER BY "geo_ci_job_artifact_verification_summaries"."state_changed_at" ASC;
Plan (1.6ms execution): https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/49285/commands/147088
Calculating scope:
SELECT "geo_ci_job_artifact_verification_summaries".*
FROM "geo_ci_job_artifact_verification_summaries"
WHERE "geo_ci_job_artifact_verification_summaries"."state" = 2;
Plan (0.8ms execution): https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/49285/commands/147089
Table size/growth:
- Fixed maximum of 100k rows (one per bucket); data will be backfilled in a follow-up issue - #590853 (closed)
- Reads: periodic queries for dirty/calculating buckets + SUM aggregations for metrics collection - follow-up issue - #590853 (closed)
- Writes: triggered by verification state changes on `ci_job_artifact_states`, plus calculator worker updates - follow-up issue #590854
- Availability risk: low - small fixed-size table
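The metrics read path amounts to summing the pre-calculated per-bucket counters instead of counting the full source table; in SQL terms, a `SUM` over at most 100k summary rows. A Ruby sketch with hypothetical data:

```ruby
# Hypothetical bucket rows; the real data lives in
# geo_ci_job_artifact_verification_summaries.
buckets = [
  { total_count: 1_000, verified_count: 990,   failed_count: 10 },
  { total_count: 1_200, verified_count: 1_150, failed_count: 50 }
]

# Metrics collection sums the pre-calculated counters rather than
# scanning the 100M-row ci_job_artifact_states table.
totals = buckets.each_with_object(Hash.new(0)) do |bucket, acc|
  bucket.each { |key, value| acc[key] += value }
end

puts totals[:total_count]    # => 2200
puts totals[:verified_count] # => 2140
```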
How to set up and validate locally
- Ensure your GDK is set up and running
- Run the migration: `cd gitlab && bin/rails db:migrate`
- Verify the table exists: `bin/rails runner "puts Geo::CiJobArtifactVerificationSummary.table_exists?"`
  Expected output: `true`
- Run the specs: `bin/rspec ee/spec/models/geo/ci_job_artifact_verification_summary_spec.rb`
  Expected output: 6 examples, 0 failures
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #590852 (closed)