
Rewrite npm dist_tags retrieval query to load all tags

Moaz Khalifa requested to merge 436099-remove-NPM-distribution-tags-limit into master

Context

If an npm package has more than 200 tags, distribution tags beyond the first 200 stop working from the npm CLI.

For example, consider the package @scope/test_package with a version of 1.0.0. You have published 200 different tags for this package version. You then publish @scope/test_package with a version of 2.0.0 and publish a tag (1234) for this new version. Because the previous version already has 200+ tags, attempts to install the new version by its tag via the CLI will fail.

The command npm install @scope/test_package@1234 will fail stating the tag cannot be found. Using npm dist-tags ls will show only the first 200 tags that were published, and will not show any tags published thereafter. However, the 2.0.0 package and its 1234 tag will still be viewable in the UI.

This happens because the dist-tags API endpoint returns only those first 200 tags.

What does this MR do?

I'm introducing an alternative approach to retrieving the tags from the database. In this thread from a previous MR, there's a discussion of the problem with the current approach, which results in poor performance.

The new approach iterates over the packages in batches and, for each batch, fetches the child tags from the packages_tags table.

So now we query the packages_packages table in batches and, for each batch, preload its tags. We then iterate over the retrieved tags and build the dist-tags hash, mapping each tag name to its package version. We also make sure to handle the duplicate tags issue described here.
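The batching idea can be sketched in plain Ruby, without Rails. This is only an illustration, not the code in this MR: the `Package` and `Tag` structs, the `build_dist_tags` method, and the batch size are hypothetical stand-ins, and the duplicate rule used here (the most recently updated tag wins) is an assumption, since the linked description of the duplicate tags issue is not reproduced in this text.

```ruby
# Plain-Ruby stand-ins for rows of packages_packages and packages_tags
# (hypothetical; the real code works on ActiveRecord models).
Package = Struct.new(:id, :version, :tags)
Tag = Struct.new(:name, :updated_at)

BATCH_SIZE = 2 # hypothetical; kept tiny so the example spans several batches

# Builds a { tag name => package version } hash by walking packages in batches.
def build_dist_tags(packages, batch_size: BATCH_SIZE)
  dist_tags = {}
  seen_at = {} # tag name => updated_at of the tag currently stored

  packages.each_slice(batch_size) do |batch|
    # In a database-backed version, the tags of this batch would be
    # preloaded here in a single query.
    batch.each do |package|
      package.tags.each do |tag|
        # Assumed duplicate rule: the most recently updated tag wins.
        next if seen_at.key?(tag.name) && seen_at[tag.name] >= tag.updated_at

        dist_tags[tag.name] = package.version
        seen_at[tag.name] = tag.updated_at
      end
    end
  end

  dist_tags
end

packages = [
  Package.new(1, '1.0.0', [Tag.new('stable', 1), Tag.new('beta', 1)]),
  Package.new(2, '1.1.0', [Tag.new('beta', 2)]),
  Package.new(3, '2.0.0', [Tag.new('1234', 3)])
]

p build_dist_tags(packages)
```

Because every package is visited, no tag is cut off by a fixed limit on the tags query; the cost is bounded per batch instead.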

The new approach is behind a feature flag so we can gradually roll it out.

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.


How to set up and validate locally

Execute the following commands in the Rails console.

  1. Enable the feature flag:

    Feature.enable(:package_registry_npm_fetch_all_tags)
  2. Create a package with more than 200 tags:

    # stub file upload
    def fixture_file_upload(*args, **kwargs)
      Rack::Test::UploadedFile.new(*args, **kwargs)
    end
    
    group = FactoryBot.create(:group, name: 'test-group')
    project = FactoryBot.create(:project, namespace: group)
    
    package = FactoryBot.create(:npm_package, project: project, name: '@test-group/test-package')
    
    FactoryBot.create_list(:packages_tag, 220, package: package)
  3. Generate package metadata and check the count of the generated dist-tags in the response:

    Packages::Npm::GenerateMetadataService.new('@test-group/test-package', Packages::Package.where(name: '@test-group/test-package')).execute(only_dist_tags: true)

    The dist_tags hash in the service response should have 221 entries: the 220 created tags plus the latest tag, which the service appends if it's not already present. If you disable the feature flag and execute the service again, the dist_tags hash will include only 201 entries.
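The expected counts can be reproduced with a small plain-Ruby sketch. It assumes the old behavior amounts to keeping the first 200 tags and then appending latest when it is missing; `dist_tags_for`, `LIMIT`, and the generated tag names are hypothetical and not taken from the service.

```ruby
LIMIT = 200 # the cap described in the Context section

# Hypothetical helper: tags is an array of [name, version] pairs.
# With a limit, only the first `limit` tags are kept (old behavior);
# without one, all tags are kept (new behavior).
def dist_tags_for(tags, latest_version:, limit: nil)
  selected = limit ? tags.first(limit) : tags
  # Append the latest tag only if no tag with that name exists yet.
  selected.to_h.tap { |hash| hash['latest'] ||= latest_version }
end

tags = (1..220).map { |i| ["tag-#{i}", '1.0.0'] }

puts dist_tags_for(tags, latest_version: '1.0.0').size               # 221
puts dist_tags_for(tags, latest_version: '1.0.0', limit: LIMIT).size # 201
```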

Related to #436099 (closed)

Edited by Moaz Khalifa
