Skip to content

Add background migration to fix zero packages_sizes project statistics

David Fernandez requested to merge 378433-fix-zero-packages-size into master

🚀 Context

In the project statistics object, we have a metric called packages_size that is the aggregation (sum) of all the package files size.

In #363010 (closed), we discovered that some packages_size seemed to be "out of sync". For example, a package_size of 10MB but the sum of all package files would be 5KB. The presumed root cause of this desync is concurrent updates: the packages_size metric could be updated in a burst. During that burst race conditions could appear and make some updates getting lost.

To solve this, we switched the packages_size metric updates to a "buffered" counter. In very short words, instead of updating the database column directly we use a redis entry where we incrby. Then, once in a while (each 10 minutes max), a job is responsible to move (or "flush") the updates from redis to the database. This way, redis acts as a "protection layer" for the database.

That's all nice but what about the existing out of sync data? Well, we don't have a lot of choices here, we need background migrations to detect incoherent packages_size and fix them. That's Background migration to fix incorrect statistic... (#378433 - closed).

Given the amount/scale in play here, we are going to split the background migrations in two:

  1. Project statistics packages_size different than 0. That's !107799 (closed).
  2. Project statistics packages_size set to 0 and some package files exist. 👈 This is this MR

Given that we have 2 background migrations that work on different sets, we should merge this MR close to the !107799 (closed) 's merge.

🔬 What does this MR do and why?

  • Introduce a background migration to deal with incoherent and zero packages_size.
    • Only for project that have packages (obviously).
  • Introduce a temporary indexes to support that migration.
  • Add the related specs.

📺 Screenshots or screen recordings

None

How to set up and validate locally

Let's create some packages for the first project. In a rails console:

def fixture_file_upload(*args, **kwargs)
  Rack::Test::UploadedFile.new(*args, **kwargs)
end

10.times { FactoryBot.create(:npm_package, project: Project.first) }

Wait at least 10 minutes, so that updates to packages_size are flushed.

Now, check the original packages_size and update it to make it incoherent.

Project.first.statistics.packages_size # 4096000 note this.

Project.first.statistics.update!(packages_size: 0)

Let's run the migration:

rails db:migrate

or

$ rake db:migrate:down VERSION=20230105132223
$ rake db:migrate:up VERSION=20230105132223

Wait for at least 10 minutes and check the packages_size again. It should be fixed.

Project.first.statistics.packages_size # 4096000 as it was originally

The background migration is working as expected 🎉

🏎 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

💾 Database review

Migration up

main: == 20230110125926 AddTemporaryZeroPackagesSizeIndexToProjectStatistics: migrating 
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0070s
main: -- index_exists?(:project_statistics, [:id, :project_id], {:where=>"packages_size = 0", :name=>"tmp_idx_project_statistics_on_zero_packages_size", :algorithm=>:concurrently})
main:    -> 0.0075s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0007s
main: -- add_index(:project_statistics, [:id, :project_id], {:where=>"packages_size = 0", :name=>"tmp_idx_project_statistics_on_zero_packages_size", :algorithm=>:concurrently})
main:    -> 0.0048s
main: -- execute("RESET statement_timeout")
main:    -> 0.0007s
main: == 20230110125926 AddTemporaryZeroPackagesSizeIndexToProjectStatistics: migrated (0.0397s) 

main: == 20230111125926 QueueFixIncorrectZeroPackagesSizeOnProjectStatistics: migrating 
main: == 20230111125926 QueueFixIncorrectZeroPackagesSizeOnProjectStatistics: migrated (0.0593s) 

Migration down

main: == 20230111125926 QueueFixIncorrectZeroPackagesSizeOnProjectStatistics: reverting 
main: == 20230111125926 QueueFixIncorrectZeroPackagesSizeOnProjectStatistics: reverted (0.0350s) 

main: == 20230110125926 AddTemporaryZeroPackagesSizeIndexToProjectStatistics: reverting 
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.1257s
main: -- indexes(:project_statistics)
main:    -> 0.0095s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0006s
main: -- remove_index(:project_statistics, {:algorithm=>:concurrently, :name=>"tmp_idx_project_statistics_on_zero_packages_size"})
main:    -> 0.0041s
main: -- execute("RESET statement_timeout")
main:    -> 0.0007s
main: == 20230110125926 AddTemporaryZeroPackagesSizeIndexToProjectStatistics: reverted (0.2118s) 

📊 Queries analysis

🚀 Background migration execution time analysis for gitlab.com

We have 28252538 packages_size = 0 statistics. We are using a batch size of 200. We will need 141263 loops.

From the query analysis, we have around 150ms of execution time in queries.

That gives us:

141263 * 150ms + 141262 * 2000ms = 303713450ms = 303713.45s = 5061.89m = 84.36h = ~ 3.5 days

Edited by David Fernandez

Merge request reports