
Add support for NPM package metadata

David Fernandez requested to merge 330929-persist-npm-package-metadata into master

🍉 Context

The NPM package registry implements a bunch of API endpoints expected by $ npm (or $ yarn). Among those endpoints, we have what we call the metadata endpoint.

Basically, clients check in with the Package Registry to answer this question: given a package name, what versions are available? The package registry responds with a JSON structure (similar to the package.json, but covering all versions) containing the required information.

Within each version structure, clients expect to have a few fields. Here are some examples:

  • dist: describes where the *.tgz archive file for this version is located.
  • dependencies (and similar fields): describes the dependencies of this version on other packages.

In #330929 (closed), it was noted that the metadata endpoint didn't return all the necessary fields. Among the missing ones is the important bin field, which is used to insert executables into the current $PATH.

Solution

Fortunately, $ npm is kind enough to send the metadata structure along with the package file when a new version of a package is uploaded.

From there, we can save that metadata in the database together with the given version.
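As a rough illustration of the step above, here is a minimal sketch of pulling the version metadata out of an upload payload. It assumes the payload carries a `versions` object keyed by version string, which is how `npm publish` shapes its request body. The helper name `version_metadata` and the `bin` entry are hypothetical and are not taken from this MR.

```ruby
require 'json'

# Hypothetical helper: returns the metadata hash of the version being
# published, or nil when the payload does not carry exactly one version.
def version_metadata(payload)
  versions = payload['versions']
  return nil unless versions.is_a?(Hash) && versions.size == 1

  # Assumption: one version per publish request, keyed by its version string.
  versions.values.first
end

# Illustrative payload; the `bin` entry is made up for this example.
payload = JSON.parse(<<~JSON)
  {
    "name": "@many/npm_metadata",
    "versions": {
      "1.3.16": {
        "name": "@many/npm_metadata",
        "version": "1.3.16",
        "bin": { "hello": "bin/hello.js" },
        "engines": { "node": ">=14.0.0" }
      }
    }
  }
JSON

metadata = version_metadata(payload)
puts metadata['engines'].inspect
```

The returned hash is the kind of object that would be stored as-is in a jsonb column.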

Then, on the metadata endpoint, we can simply load the metadata object and read the fields.

Because the NPM package registry is one of the most used registries on gitlab.com, we can't simply copy the whole metadata structure into the metadata endpoint response and call it done. The metadata endpoint returns all the data for all the versions of a given package. There are no pagination options, which means that if a package has 1K versions, the endpoint needs to return all the data about those 1K versions.

In addition, while the metadata of an NPM package has some clearly defined fields, users can add an arbitrary number of custom fields to it. An example of this is the ng-update field that Angular packages have in their package.json.

In the spirit of iteration, the metadata endpoint will for now return the abbreviated form, which means that only a strict set of fields is read from each version's metadata and returned.
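To make the idea of a strict field set concrete, here is a small sketch of filtering a stored metadata hash through an allow-list. The list below is an assumption modeled on npm's documented abbreviated metadata format and is not necessarily the exact list used in this MR. The constant and method names are hypothetical.

```ruby
# Assumed allow-list (hypothetical name), modeled on npm's abbreviated format.
ABBREVIATED_FIELDS = %w[
  name version deprecated dependencies devDependencies
  optionalDependencies peerDependencies bundleDependencies
  bin directories dist engines _hasShrinkwrap
].freeze

# Keeps only the allowed keys; custom fields are dropped from the response.
def abbreviate(package_json)
  package_json.slice(*ABBREVIATED_FIELDS)
end

full = {
  'name' => '@many/npm_metadata',
  'version' => '1.3.16',
  'engines' => { 'node' => '>=14.0.0' },
  'ng-update' => { 'packageGroup' => [] },
  'scripts' => { 'postinstall' => 'echo done' }
}

abbreviated = abbreviate(full)
puts abbreviated.keys.inspect
```

Filtering at read time keeps the response size bounded per version, no matter how many custom fields a package author adds.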

We already opened a follow-up issue to support the full metadata form.

🤔 What does this MR do and why?

  • Add a packages_npm_metadata table with:
    • package_id the package id that this metadata belongs to.
    • package_json a jsonb field that will store the metadata structure.
  • Update app/services/packages/npm/create_package_service.rb so that the metadata is persisted in the new table
  • Update app/presenters/packages/npm/package_presenter.rb so that the metadata is loaded and read from the new table.
    • The read here is limited to the allowed fields (those from the abbreviated form)
    • This change is compatible with existing packages (which have nothing in the metadata table): for these, the output of the metadata endpoint is unchanged.
  • Update the related documentation
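The backward-compatibility point above can be sketched in plain Ruby. This is not the presenter code from this MR: `Package`, `EXTRA_FIELDS`, and `build_version` are hypothetical stand-ins, and the field subset is an assumption. The sketch only shows the intended behavior: a package without stored metadata gets the same output as before.

```ruby
# Hypothetical stand-in for a package record and its optional metadata row.
Package = Struct.new(:name, :version, :tarball_url, :metadata_json, keyword_init: true)

# Assumed subset of extra fields read from the stored metadata.
EXTRA_FIELDS = %w[bin engines directories deprecated].freeze

def build_version(package, flag_enabled:)
  base = {
    'name' => package.name,
    'version' => package.version,
    'dist' => { 'tarball' => package.tarball_url }
  }
  # Legacy packages (no metadata) and a disabled flag both yield the old output.
  return base unless flag_enabled && package.metadata_json

  base.merge(package.metadata_json.slice(*EXTRA_FIELDS))
end

legacy = Package.new(name: 'pkg', version: '1.0.0', tarball_url: 'http://example.test/pkg-1.0.0.tgz')
recent = Package.new(name: 'pkg', version: '1.1.0', tarball_url: 'http://example.test/pkg-1.1.0.tgz',
                     metadata_json: { 'bin' => { 'pkg' => 'cli.js' }, 'scripts' => {} })

puts build_version(legacy, flag_enabled: true).keys.inspect
puts build_version(recent, flag_enabled: true).keys.inspect
```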

Because the NPM package registry is one of the most used registries on gitlab.com and the metadata endpoint is central to how $ npm install works, we will use a feature flag as a safety net.

Rollout issue: #344827 (closed)

🖼 Screenshots or screen recordings

Setup:

  • I used a package that defines metadata fields bin and engines.
  • I uploaded the first versions of an NPM package using master. Then, I uploaded a few versions with this MR branch and the feature flag enabled.
    • The first versions simulate existing packages with no metadata
    • The most recent versions simulate package uploads with the metadata support

With the feature flag disabled

Let's see the output of the metadata endpoint:

Screenshot_2021-11-03_at_15.41.43

  • We can see that the metadata of all versions is similar (same fields).
  • For the most recent versions, the bin and engines fields are not returned.

With the feature flag enabled

Screenshot_2021-11-03_at_15.43.54

  • For the most recent versions, the bin and engines fields are properly returned.

How to set up and validate locally

Requirements:

  • A working local GitLab instance
  • A project (any visibility)
  • A personal access token with the api scope
  • npm

Steps:

  1. Use $ npm init to initialize an npm package (any name)
  2. Update the package.json to include some extra fields.
    • Example
      ```json
      {
        "name": "@many/npm_metadata",
        "version": "1.3.16",
        "description": "Package created by gl pru",
        "main": "index.js",
        "scripts": {
          "preinstall": "echo \"PREINSTALL script!\"",
          "install": "echo \"INSTALL script!\"",
          "postinstall": "echo \"POSTINSTALL script!\""
        },
        "keywords": [],
        "author": "GitLab Package Registry Utility",
        "license": "ISC",
        "publishConfig": {
          "registry": "http://gdk.test:8000/api/v4/projects/166/packages/npm/"
        },
        "engines": {
          "node": "^12.14.1 || >=14.0.0",
          "npm": "^6.11.0 || ^7.5.6 || >=8.0.0",
          "yarn": ">= 1.13.0"
        }
      }
      ```
  3. Set up the credentials for the npm package registry following https://docs.gitlab.com/ee/user/packages/npm_registry/#project-level-npm-endpoint
  4. Push a few versions with $ npm publish
    • To bump the version, just update the version field in the package.json file
  5. Enable the feature flag: Feature.enable(:packages_npm_abbreviated_metadata)
  6. Push a few more versions

Now check the metadata endpoint and its output depending on the feature flag state.

  1. Go to <gitlab_base_url>/api/v4/projects/<project_id>/packages/npm/<package_name_including_scope>
  2. Enable or disable the feature flag and compare the output of the metadata endpoint
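For those who prefer a script over the browser, here is a minimal sketch for querying the metadata endpoint. It makes two assumptions: the slash in a scoped package name is escaped as `%2F` (as npm clients do), and the personal access token is sent as a Bearer token. The helper names `metadata_uri` and `fetch_metadata` are hypothetical.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Builds the metadata endpoint URL for a package in a given project.
def metadata_uri(base_url, project_id, package_name)
  # Assumption: the slash of a scoped name is escaped, as npm clients do.
  escaped = package_name.gsub('/', '%2F')
  URI("#{base_url}/api/v4/projects/#{project_id}/packages/npm/#{escaped}")
end

# Fetches and parses the metadata response (performs a network call).
def fetch_metadata(uri, token)
  request = Net::HTTP::Get.new(uri)
  request['Authorization'] = "Bearer #{token}" # assumed auth style
  response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  JSON.parse(response.body)
end

uri = metadata_uri('http://gdk.test:8000', 166, '@many/npm_metadata')
puts uri

# Usage against a running instance (not executed here):
#   metadata = fetch_metadata(uri, ENV.fetch('GITLAB_TOKEN'))
#   metadata['versions'].each { |version, data| puts "#{version}: #{data.keys.join(', ')}" }
```

Running the commented usage once with the flag disabled and once with it enabled makes the difference in returned fields easy to compare.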

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

💾 Database review

Enabling the feature flag will add a preload of all metadata. This will basically load a set of rows from packages_npm_metadata given a set of package ids.
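The access pattern described above can be sketched without a database. The rows below are made-up examples of what a single query on packages_npm_metadata, filtered by a set of package ids, might return. In Rails this kind of loading is typically expressed with `preload` on an association, but the plain Ruby below only illustrates the lookup shape and is not the code from this MR.

```ruby
# Hypothetical rows, as one query for package ids [1, 2, 3] might return them.
rows = [
  { package_id: 2, package_json: { 'bin' => { 'hello' => 'bin/hello.js' } } },
  { package_id: 3, package_json: { 'engines' => { 'node' => '>=14.0.0' } } }
]
package_ids = [1, 2, 3]

# Index the rows once so that each version lookup is a hash access,
# instead of issuing one query per package.
metadata_by_package_id = rows.to_h { |row| [row[:package_id], row[:package_json]] }

package_ids.each do |id|
  metadata = metadata_by_package_id[id]
  puts "package #{id}: #{metadata ? metadata.keys.join(', ') : 'no metadata row'}"
end
```

Package 1 stands for an existing package uploaded before this change: it has no row, so the lookup returns nil.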

Migration up

$ rails db:migrate
== 20211028132247 CreatePackagesNpmMetadata: migrating ========================
-- transaction_open?()
   -> 0.0000s
-- create_table(:packages_npm_metadata, {:id=>false})
   -> 0.0273s
== 20211028132247 CreatePackagesNpmMetadata: migrated (0.0503s) ===============

Migration down

$ rails db:rollback
== 20211028132247 CreatePackagesNpmMetadata: reverting ========================
-- transaction_open?()
   -> 0.0000s
-- drop_table(:packages_npm_metadata)
   -> 0.0103s
== 20211028132247 CreatePackagesNpmMetadata: reverted (0.0441s) ===============
Edited by David Fernandez
