Skip to content

Add migration to mark duplicate npm packages for destruction

Context

In #390097 (closed) we eliminated the root cause of duplicate npm packages.

The last duplicated npm package was created on Apr 24, 2023.

We have 460 duplicate npm packages that we want to mark for destruction.

The follow up issue #390099 (closed) is to add UNIQUE constraint to database

What does this MR do and why?

Adds a background migration that will seek duplicated npm packages and mark them for destruction.

Database migrations

Migration up

$ rails db:migrate:up:main VERSION=20230524120241
main: == [advisory_lock_connection] object_id: 278280, pg_backend_pid: 34334
main: == 20230524120241 AddTempIndexToPackagesOnProjectIdWhenNpmAndNotPendingDestruction: migrating 
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0581s
main: -- index_exists?(:packages_packages, :project_id, {:name=>"tmp_idx_packages_on_project_id_when_npm_not_pending_destruction", :where=>"package_type = 2 AND status <> 4", :algorithm=>:concurrently})
main:    -> 0.0097s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0004s
main: -- add_index(:packages_packages, :project_id, {:name=>"tmp_idx_packages_on_project_id_when_npm_not_pending_destruction", :where=>"package_type = 2 AND status <> 4", :algorithm=>:concurrently})
main:    -> 0.0023s
main: -- execute("RESET statement_timeout")
main:    -> 0.0003s
main: == 20230524120241 AddTempIndexToPackagesOnProjectIdWhenNpmAndNotPendingDestruction: migrated (0.1205s) 

main: == [advisory_lock_connection] object_id: 278280, pg_backend_pid: 34334
$ rails db:migrate:up:main VERSION=20230524201454
main: == [advisory_lock_connection] object_id: 278280, pg_backend_pid: 34757
main: == 20230524201454 QueueMarkDuplicateNpmPackagesForDestruction: migrating ======
main: == 20230524201454 QueueMarkDuplicateNpmPackagesForDestruction: migrated (0.0935s) 

main: == [advisory_lock_connection] object_id: 278280, pg_backend_pid: 34757

Migration down

$ rails db:migrate:down:main VERSION=20230524201454
main: == [advisory_lock_connection] object_id: 278360, pg_backend_pid: 33424
main: == 20230524201454 QueueMarkDuplicateNpmPackagesForDestruction: reverting ======
main: == 20230524201454 QueueMarkDuplicateNpmPackagesForDestruction: reverted (0.0618s) 

main: == [advisory_lock_connection] object_id: 278360, pg_backend_pid: 33424
$ rails db:migrate:down:main VERSION=20230524120241
main: == [advisory_lock_connection] object_id: 278300, pg_backend_pid: 33906
main: == 20230524120241 AddTempIndexToPackagesOnProjectIdWhenNpmAndNotPendingDestruction: reverting 
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0641s
main: -- indexes(:packages_packages)
main:    -> 0.0084s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0002s
main: -- remove_index(:packages_packages, {:algorithm=>:concurrently, :name=>"tmp_idx_packages_on_project_id_when_npm_not_pending_destruction"})
main:    -> 0.0016s
main: -- execute("RESET statement_timeout")
main:    -> 0.0003s
main: == 20230524120241 AddTempIndexToPackagesOnProjectIdWhenNpmAndNotPendingDestruction: reverted (0.0866s) 

main: == [advisory_lock_connection] object_id: 278300, pg_backend_pid: 33906

Screenshots or screen recordings

No.

How to set up and validate locally

  1. In rails console create five packages in two projects and make them duplicates.

    def fixture_file_upload(*args, **kwargs)
      Rack::Test::UploadedFile.new(*args, **kwargs)
    end
    
    p1 = FactoryBot.create(:npm_package, name: 'test', version: '1.0.0', project: Project.first)
    p2 = FactoryBot.create(:npm_package, name: 'test', version: '1.0.1', project: Project.first)
    p3 = FactoryBot.create(:npm_package, name: 'test', version: '1.0.2', project: Project.first)
    
    p2.update_column(:version, '1.0.0')
    p3.update_column(:version, '1.0.0')
    
    p4 = FactoryBot.create(:npm_package, name: 'test', version: '1.0.0', project: Project.last)
    p5 = FactoryBot.create(:npm_package, name: 'test', version: '1.0.1', project: Project.last)
    
    p5.update_column(:version, '1.0.0')

    Note down ID of p3 and p5, these packages should persist after we run the migration.

  2. Check that sidekiq is running:

    gdk status rails-background-jobs
  3. Execute the migration:

    rails db:migrate:main
  4. Now we have to wait until the background migration is executed. It might take about 10-15 minutes.

    The status of background migrations might be checked here: http://gdk.test:3000/admin/background_migrations

  5. Open rails console and check packages:

    Packages::Package.where(name: 'test', version: '1.0.0', project_id: [Project.first.id, Project.last.id], status: 0)
    Packages::Package.where(name: 'test', version: '1.0.0', project_id: [Project.first.id, Project.last.id], status: 4)

    Note: the first query should return the packages p3 and p5. The second query should return the packages p1, p2, p4 that are marked for destruction.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #390098 (closed)

Edited by Dzmitry (Dima) Meshcharakou

Merge request reports