Performance improvements for gitlab:doctor:secrets (!190206) · Merge requests · GitLab.org / GitLab

What does this MR do and why?

Use each_batch instead of find_each for more efficient batching per https://docs.gitlab.com/development/database/iterating_tables_in_batches/.
Only check rows whose encrypted column is not NULL. This lets the task skip most Ci::Build entries.
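
As a rough illustration of the second point, the plain-Ruby sketch below walks an in-memory list in id-ordered batches and only ever visits rows whose encrypted column is set. It is not the code from this MR and does not use GitLab's `each_batch`; `Row`, `each_candidate_batch`, and the sample data are hypothetical stand-ins chosen so the snippet runs without Rails.

```ruby
# Hypothetical stand-in for a database row; not GitLab's Ci::Build model.
Row = Struct.new(:id, :token_encrypted)

# Yield id-ordered batches containing only rows with a non-nil encrypted
# value. The cursor (last_id) advances past each batch, loosely mirroring
# how a batched query resumes after the last primary key it has seen.
def each_candidate_batch(rows, batch_size:)
  last_id = 0
  loop do
    batch = rows
      .select { |row| row.id > last_id && !row.token_encrypted.nil? }
      .sort_by(&:id)
      .first(batch_size)
    break if batch.empty?

    yield batch
    last_id = batch.last.id
  end
end

# Ten sample rows; only ids 3, 6 and 9 still carry an encrypted token.
rows = (1..10).map { |i| Row.new(i, (i % 3).zero? ? "enc-#{i}" : nil) }

batches = []
each_candidate_batch(rows, batch_size: 2) { |batch| batches << batch.map(&:id) }

checked = batches.flatten
puts "checked #{checked.size} of #{rows.size} rows in #{batches.size} batches"
```

In this toy data set most rows are never touched, which is the effect the filter is meant to have on finished CI jobs.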

Credits to https://gitlab.com/mbobin for suggesting this change.

Changelog: performance

Background

The gitlab:doctor:secrets rake task becomes much slower as GitLab usage/data grows. This is mainly due to the number of Ci::Build records analysed by the task.

This MR aims to make the rake task faster by skipping Ci::Build records with a NULL token, which is the vast majority of records (the token is set to NULL once a CI job finishes).

References

Closes Investigate gitlab:doctor:secrets rake task spe... (#518702 - closed)

Performance Evaluation

Setup

Fresh Omnibus install.
Ubuntu 22.04 LTS amd64, 4 vCPU, 16 GB RAM (e2-standard-4 GCP VM).
GitLab 17.11.1

Methodology

Create a project and pipeline via UI.
Check that Ci::Build token_encrypted is nil after a job completes.
Duplicate and save a Ci::Build entry N times via the Rails console.
Measure wall-clock (real) time with `time sudo gitlab-rake gitlab:doctor:secrets`.

Results

Ci::Build.count	Runtime with unpatched `gitlab:doctor:secrets`	Runtime with patched `gitlab:doctor:secrets`	Improvement
10k	2m18s	1m	1m18s (~56%)
50k	9m19s	1m3s	8m16s (~88%)
100k	20m14s	1m17s	18m57s (~93%)
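
The Improvement column can be re-derived from the two runtime columns. The sketch below is only a cross-check of that arithmetic; `to_seconds`, `RESULTS`, and `summary` are names introduced here for illustration, and the durations are copied from the table above.

```ruby
# Convert a duration such as "2m18s" or "1m" into whole seconds.
def to_seconds(text)
  match = text.match(/\A(?:(\d+)m)?(?:(\d+)s)?\z/)
  raise ArgumentError, "unrecognised duration: #{text}" if match.nil? || text.empty?

  match[1].to_i * 60 + match[2].to_i
end

# Rows copied from the results table: [build count, unpatched, patched].
RESULTS = [
  ["10k", "2m18s", "1m"],
  ["50k", "9m19s", "1m3s"],
  ["100k", "20m14s", "1m17s"]
].freeze

summary = RESULTS.map do |count, unpatched, patched|
  before = to_seconds(unpatched)
  after = to_seconds(patched)
  saved = before - after
  # Integer division truncates, matching the rounded-down table figures.
  { count: count, saved_seconds: saved, percent: saved * 100 / before }
end

summary.each do |row|
  puts format("%-5s saved %4ds (~%d%%)", row[:count], row[:saved_seconds], row[:percent])
end
```

The saved seconds correspond to 1m18s, 8m16s, and 18m57s, and the percentages are truncated rather than rounded.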

Edited May 08, 2025 by Clemens Beck