Fix registry race condition
🐬 Context
There are times when users face a race condition when pushing a new image to the registry. When an image is pushed, multiple requests are made to the registry, and with each request, a ContainerRepository
record is created if one does not exist. We previously tried to solve this by introducing safe_find_or_create_by!
when a new repository record is created, however, that did not fix the problem.
The problem is we have a uniqueness model constraint on [:project_id, :name]
. safe_find_or_create_by!
runs an additional record.validate!
, which will throw a ActiveRecord::RecordInvalid
error. If we switch to safe_find_or_create_by
, we skip the validation, and an attempt to create a duplicate record will hit the UNIQUE
constraint at the database level. This will throw a ActiveRecord::RecordNotUnique
which safe_find_or_create_by
does rescue.
🎱 What does this MR do and why?
To fix the problem, we replace .safe_find_or_create_by!
with .safe_find_or_create_by
which then avoids the model validation errors created by the race condition.
I've also renamed the method being updated since it's behavior also includes finding.
To test that the race condition is gone, I've added a test based on the script in the below section How to set up and validate locally
. We don't have an expectation that tests an exception does not occur, however, if an exception is raised does occur, the test will fail.
📽 Screenshots or screen recordings
See section below for the script being used to test this update.
Before the change:
→ be rake gitlab:packages:container_repository_race_condition
Done
Done
Done
Done
Done
Done
Done
Kaboom! #<ActiveRecord::RecordInvalid: Validation failed: Name has already been taken>
Kaboom! #<ActiveRecord::RecordInvalid: Validation failed: Name has already been taken>
Kaboom! #<ActiveRecord::RecordInvalid: Validation failed: Name has already been taken>
After the change:
→ be rake gitlab:packages:container_repository_race_condition
Done
Done
Done
Done
Done
Done
Done
Done
Done
Done
🖍 How to set up and validate locally
Unfortunately, while the race condition is somewhat easily reproduced on GitLab.com, it is not easily reproduced locally since the environment is simpler (no load balancing, distributed instances, etc...).
Using the script laid out in https://blog.arkency.com/2015/09/testing-race-conditions/ (thanks to @10io for finding this and coming up with the script below!), we are able to recreate the race condition calling the method in question directly.
-
Set up a script in
lib/tasks/gitlab/packages/race_condition.rake
(or wherever you'd like to run the script from):# frozen_string_literal: true desc "GitLab | Packages | Container Repository race condition" namespace :gitlab do namespace :packages do task container_repository_race_condition: :environment do project = Project.find(1) path = project.full_path + '/some/image' path = ContainerRegistry::Path.new(path) wait_for_it = true threads = Array.new(10) do |i| Thread.new do true while wait_for_it begin ::ContainerRepository.create_from_path!(path) puts 'Done' rescue => e puts "Kaboom! #{e.inspect}" end end end wait_for_it = false threads.each(&:join) end end end
-
Ensure you have a Project with an ID of 1.
-
Run the script:
bundle exec rake gitlab:packages:container_repository_race_condition
-
You should only see
Done
in the output,Kaboom!
should not appear. -
If you want to re-run the script, you will need to either delete the container repository that was created (
ContainerRepository.last.destroy
), or change the path in the script before re-running.
☑ MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #345598 (closed)