Backfill dependency proxy size in namespace stats

Vijay Hawoldar requested to merge vij-backfill-dp-stats into master

What does this MR do and why?

  • Adds a post-deployment migration to backfill NamespaceStatistics#dependency_proxy_size. When the backfill changes NamespaceStatistics#storage_size, the related Namespace::RootStorageStatistics is updated as well.
  • Moves some EE logic to the CE class. NamespaceStatistics now has dependency_proxy_size, which is available in CE, so stats must now be backfillable for non-EE instances too.

Refs https://gitlab.com/gitlab-org/gitlab/-/issues/352853

Query

The query being run in the migration is:

  SELECT dependency_proxy_manifests.group_id FROM dependency_proxy_manifests
  UNION
  SELECT dependency_proxy_blobs.group_id FROM dependency_proxy_blobs
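
Because the query uses UNION rather than UNION ALL, a group that has both manifests and blobs (or several rows in either table) is returned only once. A minimal Ruby sketch of that behavior, using made-up group_id values rather than real data:

```ruby
# Hypothetical sample data: group_id values as they might appear in each table.
manifest_group_ids = [184, 184, 201, 305]
blob_group_ids     = [184, 305, 305, 412]

# Array#| is a set union: duplicates are dropped, like SQL UNION
# (and unlike UNION ALL, which would keep every row).
group_ids = manifest_group_ids | blob_group_ids

puts group_ids.inspect
```

Each group is therefore backfilled at most once, regardless of how many dependency proxy rows it owns.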

The cold timings in database-lab are:

Time: 1.238 s
  - planning: 1.229 ms
  - execution: 1.237 s
    - I/O read: 1.039 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 3127 (~24.40 MiB) from the buffer pool
  - reads: 1645 (~12.90 MiB) from the OS file cache, including disk I/O
  - dirtied: 137 (~1.10 MiB)
  - writes: 0

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9179/commands/32576

The warm timings in database-lab are:

Time: 154.688 ms
  - planning: 0.149 ms
  - execution: 154.539 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 4732 (~37.00 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9179/commands/32577

The queries run by the underlying migration / service are unchanged from the original implementation (!55487 (merged)), except that they now also look up and refresh dependency proxy artifacts (added in !79657 (merged)):

SELECT SUM("dependency_proxy_manifests"."size") FROM "dependency_proxy_manifests" WHERE "dependency_proxy_manifests"."group_id" = 184
SELECT SUM("dependency_proxy_blobs"."size") FROM "dependency_proxy_blobs" WHERE "dependency_proxy_blobs"."group_id" = 184 

Processing the backfill for 100 groups locally took 0.8s, so with a batch size of 500, each job should take ~4s.

At the moment, the initial query returns approximately 1388 rows/groups in db-lab (if I'm reading the results correctly), so we can expect 3 jobs (1388 / 500, rounded up), with a total time of approx 4.2 minutes.
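
The job count and per-job time above can be reproduced with a short calculation. This is only a sketch: it assumes processing time scales linearly with the number of groups, and the constant and helper names are hypothetical, not taken from the migration.

```ruby
# Figures quoted in this description.
LOCAL_SAMPLE_GROUPS  = 100   # groups processed in the local test
LOCAL_SAMPLE_SECONDS = 0.8   # time the local test took
BATCH_SIZE           = 500   # groups per job
GROUPS_TO_BACKFILL   = 1388  # approximate row count from db-lab

# Hypothetical helper: linear extrapolation from the local sample.
def estimated_seconds(groups)
  groups * LOCAL_SAMPLE_SECONDS / LOCAL_SAMPLE_GROUPS
end

# Number of jobs is the group count divided by the batch size, rounded up.
job_count = (GROUPS_TO_BACKFILL.to_f / BATCH_SIZE).ceil
seconds_per_full_job = estimated_seconds(BATCH_SIZE)
total_processing_seconds = estimated_seconds(GROUPS_TO_BACKFILL)

puts "jobs: #{job_count}"
puts "seconds per full job: #{seconds_per_full_job.round(1)}"
puts "total processing seconds: #{total_processing_seconds.round(1)}"
```

This covers processing time only. The interval at which the jobs are scheduled is not stated here, so the sketch does not attempt to reproduce the 4.2-minute total.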

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Vijay Hawoldar
