Backfill dependency proxy size in namespace stats
What does this MR do and why?
- Adds a post-deploy migration to backfill NamespaceStatistics#dependency_proxy_size, which will update the related Namespace::RootStorageStatistics when applicable (if NamespaceStatistics#storage_size changes).
- Moves some EE logic to the CE class. NamespaceStatistics now has dependency_proxy_size, which is available in CE, so we need to allow for backfilling stats for non-EE now too.
Refs https://gitlab.com/gitlab-org/gitlab/-/issues/352853
Query
The query being run in the migration is:
SELECT dependency_proxy_manifests.group_id FROM dependency_proxy_manifests
UNION
SELECT dependency_proxy_blobs.group_id FROM dependency_proxy_blobs
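Because the query uses UNION rather than UNION ALL, a group that has both manifests and blobs appears only once in the result. A minimal Ruby sketch of that set semantics follows; the group IDs and the `groups_to_backfill` helper are made up for illustration and are not part of the migration:

```ruby
# Hypothetical group_id columns standing in for the two tables.
manifest_group_ids = [184, 184, 201, 305]
blob_group_ids     = [184, 305, 412]

# Array#| is a set union: duplicates are dropped, like SQL UNION.
def groups_to_backfill(manifest_ids, blob_ids)
  manifest_ids | blob_ids
end

p groups_to_backfill(manifest_group_ids, blob_group_ids)
```

Note that SQL UNION makes no promise about row order, so only the set of IDs is comparable here, not the ordering.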
The cold timings in database-lab are:
Time: 1.238 s
- planning: 1.229 ms
- execution: 1.237 s
- I/O read: 1.039 s
- I/O write: 0.000 ms
Shared buffers:
- hits: 3127 (~24.40 MiB) from the buffer pool
- reads: 1645 (~12.90 MiB) from the OS file cache, including disk I/O
- dirtied: 137 (~1.10 MiB)
- writes: 0
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9179/commands/32576
The warm timings in database-lab are:
Time: 154.688 ms
- planning: 0.149 ms
- execution: 154.539 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 4732 (~37.00 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9179/commands/32577
The queries being run by the underlying migration / service are unchanged from the original implementation (!55487), except they will now also look up/refresh dependency proxy artifacts (added in !79657):
SELECT SUM("dependency_proxy_manifests"."size") FROM "dependency_proxy_manifests" WHERE "dependency_proxy_manifests"."group_id" = 184
SELECT SUM("dependency_proxy_blobs"."size") FROM "dependency_proxy_blobs" WHERE "dependency_proxy_blobs"."group_id" = 184
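As a rough illustration of what these two lookups feed into, the sketch below sums manifest and blob sizes for one group in plain Ruby. The in-memory rows and the `size_sum` and `dependency_proxy_size_for` helpers are hypothetical, and adding the two sums together is an assumption about how the statistic is derived, not something stated in this MR:

```ruby
# Hypothetical rows; in the migration these live in PostgreSQL.
MANIFESTS = [
  { group_id: 184, size: 1_024 },
  { group_id: 184, size: 2_048 },
  { group_id: 201, size: 512 }
].freeze

BLOBS = [
  { group_id: 184, size: 10_000 }
].freeze

# Mirrors SELECT SUM(size) ... WHERE group_id = ? for one table.
# Unlike SQL SUM, which yields NULL for zero rows, this returns 0.
def size_sum(rows, group_id)
  rows.select { |row| row[:group_id] == group_id }.sum { |row| row[:size] }
end

# Assumption: the stat is the manifests total plus the blobs total.
def dependency_proxy_size_for(group_id)
  size_sum(MANIFESTS, group_id) + size_sum(BLOBS, group_id)
end

puts dependency_proxy_size_for(184)
```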
Processing the backfill for 100 groups locally took 0.8s, so with a batch size of 500, each job should take ~4s.
At the moment, the initial query returns approximately 1388 rows/groups in db-lab (if I'm reading the results correctly), so we can expect 3 jobs, with a total time of approx 4.2 minutes.
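The arithmetic behind these estimates can be sketched in Ruby. The MR does not say how the 4.2 minute total was derived; the sketch assumes a 2-minute gap between consecutive jobs, which happens to reproduce that figure. The `estimate_backfill` helper and its parameter names are hypothetical:

```ruby
# Estimate job count and wall-clock time for the backfill.
# delay_between_jobs is an assumption (not stated in the MR).
def estimate_backfill(total_groups:, batch_size:, seconds_per_100_groups:, delay_between_jobs:)
  seconds_per_job = seconds_per_100_groups * batch_size / 100.0
  job_count = (total_groups.to_f / batch_size).ceil
  # Simple additive model: gaps between consecutive jobs plus every job's run time.
  total_seconds = (job_count - 1) * delay_between_jobs + job_count * seconds_per_job
  {
    seconds_per_job: seconds_per_job,
    job_count: job_count,
    total_minutes: (total_seconds / 60.0).round(1)
  }
end

p estimate_backfill(
  total_groups: 1388,
  batch_size: 500,
  seconds_per_100_groups: 0.8,
  delay_between_jobs: 120 # seconds, assumed
)
```

Under these assumptions the run time is dominated by the gaps between jobs rather than by the ~4s each job spends processing.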
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.