Fixed backfill code to bust root_namespace cache
What does this MR do and why?
The original migration iterated over all the "migrated" projects in the container_repositories table whose container_registry_size was 0 in the corresponding project_statistics row. Once project_statistics was updated, it triggered an update of the root_storage_statistics for the corresponding project using the Scheduler.
The "migrated" part is relevant because a parallel activity is migrating container_repositories (independent of both this backfill and the old backfill mentioned in this MR).
Since the last run of the backfill, more projects have been "migrated", and these still have self.container_registry_size == 0. For them, we fetch the container_registry_size via the project.container_repositories_size API. For the older ones, which were already migrated, we still want to bust the cache and trigger a RootStorageStatistics update.
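The intended per-project flow can be sketched in plain Ruby as follows. This is only an illustration of the logic described above, not the migration code itself: `ProjectStats`, `BackfillSketch`, `size_lookup`, `busted_caches` and `scheduled_namespaces` are hypothetical stand-ins for the Rails models, the `project.container_repositories_size` call, the cache and the Scheduler.

```ruby
# Hypothetical stand-in for a project_statistics row (not the real model).
ProjectStats = Struct.new(:project_id, :namespace_id, :container_registry_size)

class BackfillSketch
  attr_reader :busted_caches, :scheduled_namespaces

  # size_lookup stands in for project.container_repositories_size:
  # a hash of { project_id => size in bytes }.
  def initialize(size_lookup)
    @size_lookup = size_lookup
    @busted_caches = []
    @scheduled_namespaces = []
  end

  def perform(stats_batch)
    stats_batch.each do |stats|
      # Only fetch the size for projects whose statistic is still zero.
      if stats.container_registry_size.zero?
        stats.container_registry_size = @size_lookup.fetch(stats.project_id, 0)
      end

      # Always bust the cache and schedule the root statistics refresh,
      # including for projects that the earlier run already backfilled.
      @busted_caches << stats.project_id
      @scheduled_namespaces << stats.namespace_id
    end

    # One refresh per namespace is enough in this sketch.
    @scheduled_namespaces.uniq!
    self
  end
end

batch = [
  ProjectStats.new(1, 10, 0),    # newly migrated, not yet backfilled
  ProjectStats.new(2, 10, 2048)  # backfilled by the previous run
]
sketch = BackfillSketch.new({ 1 => 512 }).perform(batch)
```

The point of the sketch is that the cache bust and the statistics refresh sit outside the zero-size check, so they run for every project in the batch.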
This MR:
- Fixes cache busting during the backfill of container_registry_size for ProjectStatistics and NamespaceStatistics
- Also fixes the backfill logs
ref: https://gitlab.com/gitlab-org/gitlab/-/issues/368112
PS: This is a rerun of the earlier background migration !89865 (merged), with minor changes.
How to set up and validate locally
Numbers
- Total Number of projects which have ContainerRepositories : 789471 (source)
- Total Projects where ContainerRegistrySize is 0 in ProjectStats: 115826 (source)
- Total Projects where ContainerRegistrySize is non-zero in ProjectStats: 673645
- Total Namespaces for the corresponding Projects: 340113 (source)
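As a quick sanity check on the figures above, the non-zero count should equal the total minus the zero-size count. The variable names below are illustrative only.

```ruby
# Figures copied from the list above.
total_with_repositories = 789_471
zero_size               = 115_826

# Projects whose container registry size is already populated.
non_zero_size = total_with_repositories - zero_size

# Share of projects that still have a zero size, as a percentage.
zero_share = (zero_size.to_f / total_with_repositories * 100).round(2)

puts "non-zero: #{non_zero_size}, zero share: #{zero_share}%"
```

The difference matches the 673645 reported above, and roughly 14.7% of the projects with container repositories still have a zero size.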
Query Plan
- Query to get distinct project_ids for a batch, using a sub_batch_size of 100:
SELECT DISTINCT "container_repositories"."project_id" FROM "container_repositories" WHERE "container_repositories"."project_id" BETWEEN 3806 AND 628939 AND ("container_repositories"."created_at" >= '2022-01-23' OR "container_repositories"."migration_state" = 'import_done') AND "container_repositories"."project_id" >= 3806 AND "container_repositories"."project_id" < 192676
Query Plan:
Unique (cost=0.43..98.79 rows=853 width=4) (actual time=0.280..1.563 rows=100 loops=1)
Buffers: shared hit=84 read=5 dirtied=18
I/O Timings: read=0.188 write=0.000
-> Index Only Scan using tmp_index_migrated_container_registries on public.container_repositories (cost=0.43..96.65 rows=855 width=4) (actual time=0.279..1.421 rows=1059 loops=1)
Index Cond: ((container_repositories.project_id >= 3806) AND (container_repositories.project_id <= 628939) AND (container_repositories.project_id >= 3806) AND (container_repositories.project_id < 192676))
Heap Fetches: 59
Buffers: shared hit=84 read=5 dirtied=18
I/O Timings: read=0.188 write=0.000
Plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10888/commands/39058
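To illustrate what the query above selects, here is a minimal in-memory sketch: rows inside the batch range that count as "migrated" (created on or after 2022-01-23, or with migration_state 'import_done') are reduced to distinct project_ids and handed out in sub-batches. `Row`, `migrated?` and `each_distinct_sub_batch` are hypothetical names for this example; the real migration relies on GitLab's batching helpers and the database, not on Ruby arrays.

```ruby
require 'date'

# Hypothetical in-memory stand-in for a container_repositories row.
Row = Struct.new(:project_id, :created_at, :migration_state)

CUTOFF = Date.new(2022, 1, 23)

# Mirrors the WHERE clause: created on/after the cutoff OR already imported.
def migrated?(row)
  row.created_at >= CUTOFF || row.migration_state == 'import_done'
end

# Yields distinct, sorted project_ids within batch_range, sub_batch_size at a time.
def each_distinct_sub_batch(rows, batch_range, sub_batch_size)
  ids = rows.select { |r| batch_range.cover?(r.project_id) && migrated?(r) }
            .map(&:project_id)
            .uniq
            .sort
  ids.each_slice(sub_batch_size) { |slice| yield slice }
end

rows = [
  Row.new(5,  Date.new(2022, 2, 1), 'default'),
  Row.new(5,  Date.new(2022, 3, 1), 'default'),     # second repository, same project
  Row.new(7,  Date.new(2021, 1, 1), 'import_done'), # old but imported
  Row.new(8,  Date.new(2021, 1, 1), 'default'),     # not migrated, skipped
  Row.new(9,  Date.new(2022, 5, 1), 'default'),
  Row.new(99, Date.new(2022, 5, 1), 'default')      # outside the batch range
]

sub_batches = []
each_distinct_sub_batch(rows, 1..10, 2) { |ids| sub_batches << ids }
```

Deduplicating before slicing is why the plan above scans more rows than the 100 distinct project_ids it returns: a project can own several repositories.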
- Query for ProjectStatistics.where(project_id: sub_batch):
SELECT "project_statistics".* FROM "project_statistics" WHERE "project_statistics"."project_id" IN (SELECT DISTINCT "container_repositories"."project_id" FROM "container_repositories" WHERE "container_repositories"."project_id" BETWEEN 3806 AND 628939 AND ("container_repositories"."created_at" >= '2022-01-23' OR "container_repositories"."migration_state" = 'import_done') AND "container_repositories"."project_id" >= 3806 AND "container_repositories"."project_id" < 192676)
Query Plan:
Nested Loop (cost=0.99..3467.81 rows=956 width=116) (actual time=6.280..190.691 rows=100 loops=1)
Buffers: shared hit=350 read=213 dirtied=26
I/O Timings: read=183.492 write=0.000
-> Unique (cost=0.43..40.27 rows=956 width=4) (actual time=2.543..27.193 rows=100 loops=1)
Buffers: shared hit=25 read=35 dirtied=8
I/O Timings: read=26.165 write=0.000
-> Index Only Scan using tmp_index_migrated_container_registries on public.container_repositories (cost=0.43..37.87 rows=958 width=4) (actual time=2.540..26.860 rows=1101 loops=1)
Index Cond: ((container_repositories.project_id >= 3806) AND (container_repositories.project_id <= 628939) AND (container_repositories.project_id >= 3806) AND (container_repositories.project_id < 192676))
Heap Fetches: 30
Buffers: shared hit=25 read=35 dirtied=8
I/O Timings: read=26.165 write=0.000
-> Index Scan using index_project_statistics_on_project_id on public.project_statistics (cost=0.56..3.58 rows=1 width=116) (actual time=1.631..1.631 rows=1 loops=100)
Index Cond: (project_statistics.project_id = container_repositories.project_id)
Buffers: shared hit=323 read=178 dirtied=16
I/O Timings: read=157.327 write=0.000
Plan: https://console.postgres.ai/shared/5f6b3815-472d-47f5-bb0e-34db8a5f4b08
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.