Add migration to mark duplicate npm packages for destruction
Context
In #390097 (closed) we eliminated the root cause of duplicate npm packages.
The last duplicated npm package was created on Apr 24, 2023
.
We have 460
duplicate npm packages that we want to mark for destruction.
The follow up issue #390099 (closed) is to add UNIQUE
constraint to database
What does this MR do and why?
Adds a background migration that will seek duplicated npm packages and mark them for destruction.
Database migrations
Migration up
$ rails db:migrate:up:main VERSION=20230524120241
main: == [advisory_lock_connection] object_id: 278280, pg_backend_pid: 34334
main: == 20230524120241 AddTempIndexToPackagesOnProjectIdWhenNpmAndNotPendingDestruction: migrating
main: -- transaction_open?()
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.0581s
main: -- index_exists?(:packages_packages, :project_id, {:name=>"tmp_idx_packages_on_project_id_when_npm_not_pending_destruction", :where=>"package_type = 2 AND status <> 4", :algorithm=>:concurrently})
main: -> 0.0097s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0004s
main: -- add_index(:packages_packages, :project_id, {:name=>"tmp_idx_packages_on_project_id_when_npm_not_pending_destruction", :where=>"package_type = 2 AND status <> 4", :algorithm=>:concurrently})
main: -> 0.0023s
main: -- execute("RESET statement_timeout")
main: -> 0.0003s
main: == 20230524120241 AddTempIndexToPackagesOnProjectIdWhenNpmAndNotPendingDestruction: migrated (0.1205s)
main: == [advisory_lock_connection] object_id: 278280, pg_backend_pid: 34334
$ rails db:migrate:up:main VERSION=20230524201454
main: == [advisory_lock_connection] object_id: 278280, pg_backend_pid: 34757
main: == 20230524201454 QueueMarkDuplicateNpmPackagesForDestruction: migrating ======
main: == 20230524201454 QueueMarkDuplicateNpmPackagesForDestruction: migrated (0.0935s)
main: == [advisory_lock_connection] object_id: 278280, pg_backend_pid: 34757
Migration down
$ rails db:migrate:down:main VERSION=20230524201454
main: == [advisory_lock_connection] object_id: 278360, pg_backend_pid: 33424
main: == 20230524201454 QueueMarkDuplicateNpmPackagesForDestruction: reverting ======
main: == 20230524201454 QueueMarkDuplicateNpmPackagesForDestruction: reverted (0.0618s)
main: == [advisory_lock_connection] object_id: 278360, pg_backend_pid: 33424
$ rails db:migrate:down:main VERSION=20230524120241
main: == [advisory_lock_connection] object_id: 278300, pg_backend_pid: 33906
main: == 20230524120241 AddTempIndexToPackagesOnProjectIdWhenNpmAndNotPendingDestruction: reverting
main: -- transaction_open?()
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.0641s
main: -- indexes(:packages_packages)
main: -> 0.0084s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0002s
main: -- remove_index(:packages_packages, {:algorithm=>:concurrently, :name=>"tmp_idx_packages_on_project_id_when_npm_not_pending_destruction"})
main: -> 0.0016s
main: -- execute("RESET statement_timeout")
main: -> 0.0003s
main: == 20230524120241 AddTempIndexToPackagesOnProjectIdWhenNpmAndNotPendingDestruction: reverted (0.0866s)
main: == [advisory_lock_connection] object_id: 278300, pg_backend_pid: 33906
Screenshots or screen recordings
No.
How to set up and validate locally
-
In rails console create five packages in two projects and make them duplicates.
def fixture_file_upload(*args, **kwargs) Rack::Test::UploadedFile.new(*args, **kwargs) end p1 = FactoryBot.create(:npm_package, name: 'test', version: '1.0.0', project: Project.first) p2 = FactoryBot.create(:npm_package, name: 'test', version: '1.0.1', project: Project.first) p3 = FactoryBot.create(:npm_package, name: 'test', version: '1.0.2', project: Project.first) p2.update_column(:version, '1.0.0') p3.update_column(:version, '1.0.0') p4 = FactoryBot.create(:npm_package, name: 'test', version: '1.0.0', project: Project.last) p5 = FactoryBot.create(:npm_package, name: 'test', version: '1.0.1', project: Project.last) p5.update_column(:version, '1.0.0')
Note down
ID
ofp3
andp5
, these packages should persist after we run the migration. -
Check that
sidekiq
is running:gdk status rails-background-jobs
-
Execute the migration:
rails db:migrate:main
-
Now we have to wait until the background migration is executed. It might take about
10-15
minutes.The status of background migrations might be checked here: http://gdk.test:3000/admin/background_migrations
-
Open rails console and check packages:
Packages::Package.where(name: 'test', version: '1.0.0', project_id: [Project.first.id, Project.last.id], status: 0) Packages::Package.where(name: 'test', version: '1.0.0', project_id: [Project.first.id, Project.last.id], status: 4)
Note: the first query should return the packages
p3
andp5
. The second query should return the packagesp1
,p2
,p4
that are marked for destruction.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #390098 (closed)