Skip to content

Backfill project statistics with recent objects size

Vijay Hawoldar requested to merge vij-backfill-recent-size into master

What does this MR do and why?

We need to backfill ProjectStatistics records to recalculate repository storage size using the recent objects size introduced as part of Use repository recent objects size for project ... (#419903 - closed)

In this MR we achieve that by:

  1. looking up ProjectStatistics that have a repository_size > 0

  2. if the migration is running on .com, we also filter by updated_at < 2023-08-08

    We started using recent object size in production code on 2023-08-08, so any statistics updated after that won't need to be refreshed

  3. re-calculating the storage_size based on the component parts that make up the value, but with the new repository_size

  4. trigger the async worker that handles the RootStorageStatistics update (that class sums the related project_statistic#storage_size values, so the total storage size will also be updated accordingly)

We need to have an EE override for this background migration to handle a nuance with ProjectStatistics that means we also need to exclude upload_size when calculating the new total.

We have some previous/recent examples of backfills for project statistics:

Database info

Self-Managed batch query

explain SELECT "project_statistics"."project_id" FROM "project_statistics" WHERE "project_statistics"."project_id" BETWEEN 28026969 AND 28029237 AND (repository_size > 0) ORDER BY "project_statistics"."project_id"
Time: 8.001 ms  
  - planning: 0.499 ms  
  - execution: 7.502 ms  
    - I/O read: 0.000 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 1150 (~9.00 MiB) from the buffer pool  
  - reads: 0 from the OS file cache, including disk I/O  
  - dirtied: 0  
  - writes: 0  

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/21658/commands/70363

No index added as one already exists for this query

SaaS batch query

SELECT "project_statistics"."project_id"
FROM "project_statistics"
WHERE "project_statistics"."project_id" BETWEEN 28026969 AND 28029237 AND (repository_size > 0 and updated_at < '2023-08-08') ORDER BY "project_statistics"."project_id"
Time: 7.761 ms  
  - planning: 0.411 ms  
  - execution: 7.350 ms  
    - I/O read: 0.000 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 1150 (~9.00 MiB) from the buffer pool  
  - reads: 0 from the OS file cache, including disk I/O  
  - dirtied: 0  
  - writes: 0  

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/21658/commands/70364

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before After

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Vijay Hawoldar

Merge request reports