Cleanup stale npm metadata cache using background worker

Context

In "Introduce a model for npm metadata" (#393633 - closed), we introduced a new entity, Packages::Npm::MetadataCache, that is used to store and operate on the npm Registry metadata cache. Every metadata cache entry has a project_id that is set to NULL when the linked project is removed from the database. In this MR, we clean up those stale entries using a background worker.

What does this MR do and why?

This MR introduces a new background worker Packages::Npm::CleanupStaleMetadataCacheWorker to delete stale npm metadata cache entries.

The existing Packages::CleanupPackageRegistryWorker will enqueue Packages::Npm::CleanupStaleMetadataCacheWorker if there are any stale npm metadata cache entries.
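
The interaction between the two workers can be illustrated with a minimal, in-memory sketch in plain Ruby. This is not the code from this MR: CacheEntry, stale?, enqueue_cleanup_if_needed and cleanup_stale are hypothetical names, and the definition of "stale" (project_id is NULL and status is 0) is an assumption taken from the partial index condition shown in the migrations output.

```ruby
# Hypothetical in-memory stand-in for Packages::Npm::MetadataCache rows.
CacheEntry = Struct.new(:id, :project_id, :status, keyword_init: true)

# Assumption: status 0 is the default state, as in the partial index condition.
STATUS_DEFAULT = 0

# An entry is treated as stale when its project was removed (project_id is nil)
# and it is still in the default status.
def stale?(entry)
  entry.project_id.nil? && entry.status == STATUS_DEFAULT
end

# Mimics the parent worker: enqueue the cleanup job only if stale entries exist.
def enqueue_cleanup_if_needed(entries, queue)
  queue << :cleanup_stale_metadata_cache if entries.any? { |e| stale?(e) }
  queue
end

# Mimics the cleanup worker: drop stale entries, keep the rest.
def cleanup_stale(entries)
  entries.reject { |e| stale?(e) }
end

entries = [
  CacheEntry.new(id: 1, project_id: 42, status: 0),  # normal entry
  CacheEntry.new(id: 2, project_id: nil, status: 0)  # stale entry
]

queue = enqueue_cleanup_if_needed(entries, [])
remaining = queue.empty? ? entries : cleanup_stale(entries)
puts remaining.map(&:id).inspect # prints [1]
```

Checking for stale entries before enqueuing keeps the cleanup job off the queue when there is nothing to delete.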

Screenshots or screen recordings

No visible changes.

How to set up and validate locally

Run the following steps in the Rails console.

  1. Prepare two metadata caches and make one of them stale (project_id set to NULL):

    def fixture_file_upload(*args, **kwargs)
      Rack::Test::UploadedFile.new(*args, **kwargs)
    end
    
    cache1 = FactoryBot.create(:npm_metadata_cache, project: Project.first)
    cache2 = FactoryBot.create(:npm_metadata_cache, project: Project.first)
    cache2.update_attribute(:project_id, nil)
  2. Make sure Sidekiq is up and running (otherwise run gdk restart rails-background-jobs), then execute the cleanup worker:

    Packages::CleanupPackageRegistryWorker.new.perform
  3. After the worker has run, the stale cache should be deleted and the normal cache should still exist:

    Packages::Npm::MetadataCache.find(cache1.id)
    => #<Packages::Npm::MetadataCache ...
    
    Packages::Npm::MetadataCache.find(cache2.id)
    => ActiveRecord::RecordNotFound: Couldn't find Packages::Npm::MetadataCache

Migrations output

$ rails db:migrate:up:main VERSION=20231019145202

main: == [advisory_lock_connection] object_id: 226760, pg_backend_pid: 72298
main: == 20231019145202 AddStatusToPackagesNpmMetadataCaches: migrating =============
main: -- add_column(:packages_npm_metadata_caches, :status, :integer, {:default=>0, :null=>false, :limit=>2})
main:    -> 0.0019s
main: == 20231019145202 AddStatusToPackagesNpmMetadataCaches: migrated (0.0054s) ====

main: == [advisory_lock_connection] object_id: 226760, pg_backend_pid: 72298

$ rails db:migrate:down:main VERSION=20231019145202

main: == [advisory_lock_connection] object_id: 226820, pg_backend_pid: 71895
main: == 20231019145202 AddStatusToPackagesNpmMetadataCaches: reverting =============
main: -- remove_column(:packages_npm_metadata_caches, :status, :integer, {:default=>0, :null=>false, :limit=>2})
main:    -> 0.0014s
main: == 20231019145202 AddStatusToPackagesNpmMetadataCaches: reverted (0.0057s) ====

main: == [advisory_lock_connection] object_id: 226820, pg_backend_pid: 71895

$ rails db:migrate:up:main VERSION=20231020181652

main: == [advisory_lock_connection] object_id: 226760, pg_backend_pid: 8416
main: == 20231020181652 AddIndexPackagesNpmMetadataCachesOnIdAndProjectIdAndStatus: migrating 
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0780s
main: -- index_exists?(:packages_npm_metadata_caches, :id, {:name=>"idx_pkgs_npm_metadata_caches_on_id_and_project_id_and_status", :where=>"project_id IS NULL AND status = 0", :algorithm=>:concurrently})
main:    -> 0.0027s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0001s
main: -- add_index(:packages_npm_metadata_caches, :id, {:name=>"idx_pkgs_npm_metadata_caches_on_id_and_project_id_and_status", :where=>"project_id IS NULL AND status = 0", :algorithm=>:concurrently})
main:    -> 0.0014s
main: -- execute("RESET statement_timeout")
main:    -> 0.0001s
main: == 20231020181652 AddIndexPackagesNpmMetadataCachesOnIdAndProjectIdAndStatus: migrated (0.1646s) 

main: == [advisory_lock_connection] object_id: 226760, pg_backend_pid: 8416

$ rails db:migrate:down:main VERSION=20231020181652

main: == [advisory_lock_connection] object_id: 226860, pg_backend_pid: 7115
main: == 20231020181652 AddIndexPackagesNpmMetadataCachesOnIdAndProjectIdAndStatus: reverting 
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0575s
main: -- indexes(:packages_npm_metadata_caches)
main:    -> 0.0030s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0001s
main: -- remove_index(:packages_npm_metadata_caches, {:algorithm=>:concurrently, :name=>"idx_pkgs_npm_metadata_caches_on_id_and_project_id_and_status"})
main:    -> 0.0016s
main: -- execute("RESET statement_timeout")
main:    -> 0.0001s
main: == 20231020181652 AddIndexPackagesNpmMetadataCachesOnIdAndProjectIdAndStatus: reverted (0.0741s) 

main: == [advisory_lock_connection] object_id: 226860, pg_backend_pid: 7115

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #396406 (closed)

Edited by Dzmitry (Dima) Meshcharakou
