
Fixed backfill code to bust root_namespace cache

Suraj Tripathi requested to merge issue_368112_namespace_usage_cache_sync into master

What does this MR do and why?

The original migration iterated over all "migrated" projects in the container_repositories table whose container_registry_size was 0 in the corresponding project_statistics row. Once project_statistics was updated, it triggered an update of root_storage_statistics for the corresponding project using the Scheduler.

The "migrated" part is relevant because a parallel effort is currently migrating container_repositories. That effort is independent of both this MR and the old backfill mentioned here.

Since the last run of the backfill, more projects have been "migrated", and these still have self.container_registry_size == 0. For them, we fetch the container_registry_size via the project.container_repositories_size API. For the older projects that were already migrated, we still want to bust the cache and trigger a RootStorageStatistics update.
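
The control flow described above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the code in this MR: `FakeStat`, `backfill_sub_batch`, and the `size_lookup` callable (standing in for project.container_repositories_size) are hypothetical names, and the returned lists only record which projects would have their cache busted and which namespaces would have an update scheduled.

```ruby
# Hypothetical stand-in for a project_statistics row (not a GitLab class).
FakeStat = Struct.new(:project_id, :namespace_id, :container_registry_size)

# Sketch of the backfill step for one sub-batch.
# size_lookup is a callable standing in for project.container_repositories_size.
def backfill_sub_batch(stats, size_lookup)
  busted = []
  scheduled = []

  stats.each do |stat|
    if stat.container_registry_size.zero?
      # Newly "migrated" project: the size has not been backfilled yet.
      stat.container_registry_size = size_lookup.call(stat.project_id)
    end

    # Runs for every project, including ones backfilled by the earlier run.
    busted << stat.project_id
    scheduled << stat.namespace_id
  end

  { busted: busted, scheduled: scheduled }
end

stats = [
  FakeStat.new(1, 10, 0),   # newly migrated, size still 0
  FakeStat.new(2, 10, 500), # already backfilled
  FakeStat.new(3, 20, 0)    # newly migrated, size still 0
]
sizes = { 1 => 1_024, 3 => 2_048 }
result = backfill_sub_batch(stats, ->(project_id) { sizes[project_id] })
```

The point of the sketch is that the size lookup is conditional, while the cache bust and the root statistics update are not.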

This MR:

  • Fixes cache busting during the backfill of container_registry_size for ProjectStatistics and NamespaceStatistics
  • Also fixes the logs of the backfill

ref: https://gitlab.com/gitlab-org/gitlab/-/issues/368112

PS: This is a rerun of a previously run background migration, !89865 (merged), with minor changes.

How to set up and validate locally

Numbers

  • Total number of projects which have ContainerRepositories: 789471 (source)
    • Total projects where ContainerRegistrySize is 0 in ProjectStats: 115826 (source)
    • Total projects where ContainerRegistrySize is non-zero in ProjectStats: 673645
  • Total namespaces for the corresponding projects: 340113 (source)
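
As a quick sanity check on the figures above, the two sub-counts should add up to the total number of projects with ContainerRepositories. The snippet below only restates that arithmetic; the variable names are illustrative.

```ruby
# Figures copied from the list above.
total_with_repositories = 789_471
zero_size = 115_826
non_zero_size = 673_645

# The two sub-counts should partition the total.
consistent = (zero_size + non_zero_size == total_with_repositories)

# Share of projects that still report a size of 0, as a percentage.
zero_share = (zero_size * 100.0 / total_with_repositories).round(2)

puts "partition holds: #{consistent}"
puts "zero-size share: #{zero_share}%"
```

The counts are consistent, and roughly 14.7% of the projects with container repositories still had a size of 0.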

Query Plan

  1. Query to get distinct project_ids for a batch, using a sub_batch_size of 100
SELECT DISTINCT "container_repositories"."project_id" FROM "container_repositories" WHERE "container_repositories"."project_id" BETWEEN 3806 AND 628939 AND ("container_repositories"."created_at" >= '2022-01-23' OR "container_repositories"."migration_state" = 'import_done') AND "container_repositories"."project_id" >= 3806 AND "container_repositories"."project_id" < 192676

Query Plan:

 Unique  (cost=0.43..98.79 rows=853 width=4) (actual time=0.280..1.563 rows=100 loops=1)
   Buffers: shared hit=84 read=5 dirtied=18
   I/O Timings: read=0.188 write=0.000
   ->  Index Only Scan using tmp_index_migrated_container_registries on public.container_repositories  (cost=0.43..96.65 rows=855 width=4) (actual time=0.279..1.421 rows=1059 loops=1)
         Index Cond: ((container_repositories.project_id >= 3806) AND (container_repositories.project_id <= 628939) AND (container_repositories.project_id >= 3806) AND (container_repositories.project_id < 192676))
         Heap Fetches: 59
         Buffers: shared hit=84 read=5 dirtied=18
         I/O Timings: read=0.188 write=0.000

Plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10888/commands/39058
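
The query above combines a batch range (the BETWEEN clause) with a narrower half-open sub-batch range (>= lower, < upper) that covers 100 distinct project_ids. One way such bounds could be derived is sketched below. This is an assumption-laden illustration in plain Ruby over an in-memory list, not the batching helper the migration actually uses; `sub_batch_bounds` is a hypothetical name.

```ruby
# Split the distinct project_ids of a batch into sub-batches and derive the
# half-open [lower, upper) bounds that would appear in the WHERE clause.
def sub_batch_bounds(project_ids, batch_max, sub_batch_size)
  slices = project_ids.uniq.sort.each_slice(sub_batch_size).to_a

  slices.each_with_index.map do |slice, index|
    next_slice = slices[index + 1]
    # The upper bound is exclusive: the first id of the next sub-batch,
    # or one past the end of the batch for the last sub-batch.
    upper = next_slice ? next_slice.first : batch_max + 1
    { lower: slice.first, upper: upper, count: slice.size }
  end
end

# Duplicate ids model projects that own several container repositories.
ids = [3806, 3806, 3900, 4100, 4100, 5000, 6200]
bounds = sub_batch_bounds(ids, 628_939, 2)
bounds.each { |b| puts "project_id >= #{b[:lower]} AND project_id < #{b[:upper]} (#{b[:count]} ids)" }
```

Because a project can own several repositories, the index scan in the plan reads more rows (1059) than the 100 distinct project_ids it returns, which is why the sketch deduplicates before slicing.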

  2. Query for ProjectStatistics.where(project_id: sub_batch)
SELECT "project_statistics".* FROM "project_statistics" WHERE "project_statistics"."project_id" IN (SELECT DISTINCT "container_repositories"."project_id" FROM "container_repositories" WHERE "container_repositories"."project_id" BETWEEN 3806 AND 628939 AND ("container_repositories"."created_at" >= '2022-01-23' OR "container_repositories"."migration_state" = 'import_done') AND "container_repositories"."project_id" >= 3806 AND "container_repositories"."project_id" < 192676)

Query Plan:

 Nested Loop  (cost=0.99..3467.81 rows=956 width=116) (actual time=6.280..190.691 rows=100 loops=1)
   Buffers: shared hit=350 read=213 dirtied=26
   I/O Timings: read=183.492 write=0.000
   ->  Unique  (cost=0.43..40.27 rows=956 width=4) (actual time=2.543..27.193 rows=100 loops=1)
         Buffers: shared hit=25 read=35 dirtied=8
         I/O Timings: read=26.165 write=0.000
         ->  Index Only Scan using tmp_index_migrated_container_registries on public.container_repositories  (cost=0.43..37.87 rows=958 width=4) (actual time=2.540..26.860 rows=1101 loops=1)
               Index Cond: ((container_repositories.project_id >= 3806) AND (container_repositories.project_id <= 628939) AND (container_repositories.project_id >= 3806) AND (container_repositories.project_id < 192676))
               Heap Fetches: 30
               Buffers: shared hit=25 read=35 dirtied=8
               I/O Timings: read=26.165 write=0.000
   ->  Index Scan using index_project_statistics_on_project_id on public.project_statistics  (cost=0.56..3.58 rows=1 width=116) (actual time=1.631..1.631 rows=1 loops=100)
         Index Cond: (project_statistics.project_id = container_repositories.project_id)
         Buffers: shared hit=323 read=178 dirtied=16
         I/O Timings: read=157.327 write=0.000

Plan: https://console.postgres.ai/shared/5f6b3815-472d-47f5-bb0e-34db8a5f4b08

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Suraj Tripathi
