Offline package registries

Problem to solve

As suggested by @xlgmokha (who gets all the credit), the registries that hold license metadata may not be available in air-gapped environments.

Intended users

Further details

There are, in general, two ways to determine a package's license (unfortunately, package managers don't store this information in lock files):

  • Browsing the files of the installed dependency, looking for a specification file (e.g. *.gemspec) or a license file directly (e.g. LICENSE.md).
  • Requesting the information from a registry through its API. This is the case for Python packages, which don't include license information.

While the first option is pretty straightforward, it requires installing the dependencies first. The second requires a registry to be available. When the packages are only served by a package server (proxy), this metadata API isn't available at all.
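As a rough illustration of the first option, here is a minimal Python sketch that looks for license-like files at the top level of an already installed dependency. The function name `find_license_files` and the list of filename prefixes are assumptions made for this example; they are not part of any existing tool, and real packages may name or place their license files differently.

```python
from pathlib import Path

# Assumed filename prefixes for this sketch; real projects vary.
LICENSE_PREFIXES = ("license", "licence", "copying")


def find_license_files(package_dir):
    """Return the names of license-like files at the top level of package_dir."""
    root = Path(package_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        # Case-insensitive match on the start of the filename.
        if entry.is_file() and entry.name.lower().startswith(LICENSE_PREFIXES)
    )
```

This only works once the dependency has been installed on disk, which is exactly the limitation described above.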

Proposal

Some registries provide a way to fetch package information, either as a full history or incrementally. We can download these backups, or request these feeds from scheduled pipelines, to fetch and store the list of packages, their versions, and the corresponding licenses. Each triplet ([package name, version, license]) would be stored in a distributable format (flat files, SQLite database, etc.).
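To make the storage idea more concrete, here is a minimal sketch of the SQLite variant, using only Python's standard `sqlite3` module. The table name `licenses`, the column names, and the helper functions `open_store`, `store_triplets`, and `lookup_license` are all hypothetical; the actual schema and file format are still to be decided.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the hypothetical license database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS licenses ("
        " name TEXT NOT NULL,"
        " version TEXT NOT NULL,"
        " license TEXT NOT NULL,"
        " PRIMARY KEY (name, version))"
    )
    return conn


def store_triplets(conn, triplets):
    """Insert or update (name, version, license) triplets in one transaction."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO licenses (name, version, license)"
            " VALUES (?, ?, ?)",
            triplets,
        )


def lookup_license(conn, name, version):
    """Return the stored license for a package version, or None if unknown."""
    row = conn.execute(
        "SELECT license FROM licenses WHERE name = ? AND version = ?",
        (name, version),
    ).fetchone()
    return row[0] if row else None


# Usage with made-up data (not taken from any real registry feed):
conn = open_store()
store_triplets(conn, [("example-package", "1.0.0", "MIT")])
license_name = lookup_license(conn, "example-package", "1.0.0")
```

A single file like this could be produced by a scheduled pipeline and shipped into the air-gapped environment, where lookups need no network access. Flat files would work similarly, at the cost of slower lookups on large registries.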

Permissions and Security

N/A

Documentation

TBD

Availability & Testing

TBD

What does success look like, and how can we measure that?

  • Users in completely air-gapped environments can determine which licenses their dependencies use.

What is the type of buyer?

GitLab Ultimate

Is this a cross-stage feature?

TBD

Links / references

&1359 (closed)