Investigate limits for the Package Registry

backend-weight2

Topic to Evaluate

The GitLab Package Registry allows you to publish and install packages in a variety of formats to your GitLab instance. This issue proposes investigating the possibility and scope of adding limits to the GitLab Package Registry.

This investigation is part of the broader epic &1422 (closed), which aims to add limits for the Package stage.

Questions

What limits are currently in place for the GitLab Package Registry? (This includes all package manager formats.)

File Size

| Package Type | Max Size |
| --- | --- |
| NPM | None |
| Maven | None |
| Conan | 50 MB |
| NuGet | 50 MB |
| PyPI | 50 MB |

For Conan/NuGet/PyPI, these limits are hardcoded and can be modified per package type in the authorize_workhorse! call.

Maven could easily be given a limit by adjusting it to use authorize_workhorse!, or by passing the limit in as a (hardcoded) param in its file upload endpoint.

NPM is the only package manager not using workhorse uploads (it uses carrierwave and accepts the file directly), so there is currently no limit.
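As a rough illustration of the per-type file size limits described above, here is a minimal sketch in Python. It is not GitLab code: the names `MAX_FILE_SIZE_BYTES` and `upload_allowed` are hypothetical, and it assumes 50 MB means 50 * 1024 * 1024 bytes.

```python
from typing import Dict, Optional

MB = 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes

# Current limits from the table above; None means no limit is enforced.
# (Hypothetical structure, for illustration only.)
MAX_FILE_SIZE_BYTES: Dict[str, Optional[int]] = {
    "npm": None,
    "maven": None,
    "conan": 50 * MB,
    "nuget": 50 * MB,
    "pypi": 50 * MB,
}


def upload_allowed(package_type: str, file_size_bytes: int) -> bool:
    """Return True if the file is within the limit for its package type."""
    limit = MAX_FILE_SIZE_BYTES[package_type]
    return limit is None or file_size_bytes <= limit
```

Adding a limit for Maven or NPM would then amount to replacing `None` with a value, which mirrors the "Limit Maven/NPM package files to 50MB" proposals.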

Current File Name and Type exclusion

| Package Type | File exclusions |
| --- | --- |
| NPM | None. I believe we can restrict to only .tgz files. File names are validated with the standard package name regex. |
| Maven | None. We should be able to restrict to .xml, .pom, and .jar files. File names are validated with a Maven-specific regex. |
| Conan | Yes. Only specific file names and types are allowed, as defined in ConanFileMetadatum. This has proven to be an issue because some files that should be allowed were not known during implementation, which blocks some use cases. The Conan source code contains a list of files, but it is unclear which are allowed upload files and which are other files, so some investigation is needed before updating the list. |
| NuGet | Yes. Only .nupkg files are allowed. File names are validated with the standard package name regex. |
| PyPI | None. We should be able to restrict to .tar.gz and .whl files. File names are validated with the standard package name regex. |

Current registry storage limits

There is currently no limit (that I can tell) to the amount of storage that can be used for files in the package registry at the project, group, or instance levels.

Are there sensible, non-controversial limits that can be easily added?

Are there specific file types that we should exclude?

Yes. For each package manager, we should only allow the specific file types that the package manager uses.
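A minimal sketch of such an allowlist, using the file types proposed in this issue. The names `ALLOWED_EXTENSIONS` and `file_type_allowed` are hypothetical, and the case-insensitive comparison is an assumption rather than anything the package managers require. Conan is left out because it is validated by specific file names instead of extensions.

```python
# Proposed extensions per package type (illustrative only, not GitLab code).
ALLOWED_EXTENSIONS = {
    "npm": (".tgz",),
    "maven": (".xml", ".pom", ".jar"),
    "nuget": (".nupkg",),
    "pypi": (".tar.gz", ".whl"),
}


def file_type_allowed(package_type: str, file_name: str) -> bool:
    """Return True if file_name ends with an extension allowed for the type."""
    allowed = ALLOWED_EXTENSIONS.get(package_type)
    if allowed is None:
        # No extension allowlist in this sketch (for example, Conan).
        return True
    # Assumption: extensions are compared case-insensitively.
    return file_name.lower().endswith(allowed)
```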

Can we consider adding storage limits in the future? Can they be enforced?

We can likely add limits at the project, group, or instance level. We can limit either the total size or the total number of packages.

Total number

This would be the easiest route. We could add settings at any of the above levels that admin users can configure. Enforcement would be as simple as checking how many packages exist within the given project, group, or instance when someone tries to push a new package, and rejecting the push if the limit has been met. Each level would be its own issue, each likely having a weight of 3 and needing both frontend and backend work.
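The check described above could look roughly like the following sketch. `PackageLimitReached` and `check_package_count` are hypothetical names, and a limit of `None` is assumed to mean "no limit configured".

```python
from typing import Optional


class PackageLimitReached(Exception):
    """Raised when a push would exceed the configured package count."""


def check_package_count(existing_count: int, max_packages: Optional[int]) -> None:
    """Reject a new package if the limit has already been met.

    existing_count: packages currently in the project, group, or instance.
    max_packages: admin-configured limit; None is assumed to mean unlimited.
    """
    if max_packages is not None and existing_count >= max_packages:
        raise PackageLimitReached(
            f"limit of {max_packages} packages has been met"
        )
```

The same function would serve all three levels; only the scope used to compute `existing_count` changes.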

Total size

Similar to the total number, we could add settings at the above levels for admin users to configure. Enforcing this is a bit more difficult. I'm not sure how we could quickly calculate the amount of space taken up by packages. Perhaps the total could be saved in the database (in a project or group supplemental statistics table) and updated any time someone pushes or deletes a package. Doable, but likely a larger issue. Each level would be its own issue; they would all be somewhat large (weight >5) and would need to be broken down into a few steps to be implemented.
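To make the "saved in the database and updated on push or delete" idea concrete, here is a small in-memory sketch. The `PackageStatistics` class and its method names are hypothetical stand-ins for a row in a statistics table; clamping at zero is an added safety assumption.

```python
from dataclasses import dataclass


@dataclass
class PackageStatistics:
    """In-memory stand-in for a per-project (or per-group) statistics row."""

    packages_size_bytes: int = 0

    def on_file_pushed(self, file_size_bytes: int) -> None:
        # Keep the running total current when a package file is uploaded.
        self.packages_size_bytes += file_size_bytes

    def on_file_deleted(self, file_size_bytes: int) -> None:
        # Assumption: clamp at zero so a stale delete cannot go negative.
        self.packages_size_bytes = max(
            0, self.packages_size_bytes - file_size_bytes
        )
```

Reading the stored total is then a constant-time lookup, which avoids summing every file on each upload.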

How could we implement the above?

See the proposed issues and the storage limits comments.

Proposed Issues

I've added a rough weight estimate to each (the leading number):

  • 1 Restrict NPM packages to .tgz file type
  • 1 Restrict Maven packages to .xml, .pom, and .jar files.
  • 1 Restrict PyPI packages to .tar.gz and .whl files.
  • 2 Consider tighter naming restrictions for Maven, NPM, NuGet, and PyPI based on any restriction within their own package manager rules
  • 1 Limit Maven package files to 50MB
  • 2 Limit NPM package files to 50MB

Tasks

  • Create issue for implementation or update existing implementation issue description with implementation proposal
  • Set weight on implementation issue
  • If weight is greater than 5, break issue into smaller issues

Risks and Implementation Considerations

Setting a limit, specifically on size, is a large undertaking. With the proposed idea of storing a constantly updating value in the database, the work might look like the following (the number leading each step is a rough weight):

For Project level:

  1. 1 Database migration to add storage columns to a project_statistics table
  2. 2-3 Background migration to determine storage for existing projects and update the table.
  3. 1-2(x5) Update package manager code to update the stats every time a package is uploaded/deleted.
  4. 1 Database migration to add limit settings to project table.
  5. 1 Frontend migration for limit settings
  6. 1 Frontend migration for displaying current storage usage or amount remaining
  7. 2 Add logic to reject package uploads once the limit has been met.

The breakdown would be similar for group or instance level.
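Steps 2 and 7 of the project-level breakdown can be sketched as below. Both function names are hypothetical. The sketch also assumes an upload is rejected when it would push usage past the limit, which is a slightly stricter reading than rejecting only once the limit has already been met.

```python
from typing import Iterable, Optional


def backfill_packages_size(file_sizes_bytes: Iterable[int]) -> int:
    """Step 2: compute initial storage for an existing project.

    file_sizes_bytes: sizes of all package files already in the project.
    """
    return sum(file_sizes_bytes)


def upload_rejected(
    current_bytes: int, incoming_bytes: int, limit_bytes: Optional[int]
) -> bool:
    """Step 7: decide whether an incoming upload should be rejected.

    limit_bytes: configured storage limit; None is assumed to mean unlimited.
    """
    if limit_bytes is None:
        return False
    # Assumption: reject if the upload would take usage over the limit.
    return current_bytes + incoming_bytes > limit_bytes
```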

Resulting Issues:

Size limits

Note: I've set these up so that Maven is implemented first and includes the database migration, so it has a weight of 2 and the others all have a weight of 1. Any of them can be worked on first, with the weight and database MR swapped with Maven.

File type restriction

File name restriction

Storage limits

Note: the storage limits issue is large and should be broken down into pieces that fit within a given milestone.

Edited by Steve Abrams