Can't publish large NuGet packages

Summary

NuGet packages larger than 500 MB silently fail to upload to the GitLab.com package registry: the push reports success, but the package never appears in the registry.

🦀 Context

NuGet packages are uploaded as plain zip files named package.nupkg. The GitLab package registry needs at least the package name and version before it can make the package available for pulling.

To get them, once a NuGet package is uploaded we enqueue a background job (Packages::Nuget::ExtractionWorker) that pulls the package file, opens the zip archive, and reads the *.nuspec file to extract, among other things, the package name and version.
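
As an illustration of the extraction step, here is a minimal Python sketch that opens a .nupkg archive and reads the package id and version from its *.nuspec file. It is not the ExtractionWorker code: the function names and the sample package are made up for this example, and it assumes the manifest has a `metadata` element with `id` and `version` children.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_nuspec_metadata(nupkg_bytes: bytes) -> dict:
    """Return the package name and version found in the archive's .nuspec file."""
    with zipfile.ZipFile(io.BytesIO(nupkg_bytes)) as archive:
        # A .nupkg is a zip archive; the manifest is the entry ending in .nuspec.
        nuspec_name = next(
            name for name in archive.namelist() if name.lower().endswith(".nuspec")
        )
        root = ET.fromstring(archive.read(nuspec_name))

    def local(tag: str) -> str:
        # Drop the XML namespace prefix, e.g. "{ns}id" -> "id".
        return tag.rsplit("}", 1)[-1]

    metadata = next(child for child in root if local(child.tag) == "metadata")
    fields = {local(el.tag): (el.text or "").strip() for el in metadata}
    return {"name": fields["id"], "version": fields["version"]}


def build_sample_nupkg() -> bytes:
    """Build a tiny in-memory archive for demonstration (made-up package)."""
    nuspec = (
        '<?xml version="1.0"?>'
        '<package xmlns="http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd">'
        "<metadata><id>Example.Package</id><version>1.2.3</version></metadata>"
        "</package>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("Example.Package.nuspec", nuspec)
    return buffer.getvalue()


if __name__ == "__main__":
    print(read_nuspec_metadata(build_sample_nupkg()))
```

Note that the whole archive has to be available locally before the manifest can be read, which is why the download in this step cannot be avoided.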

The background job will then either:

  • find an existing package with that name and version and append the package file
  • update the package with the final name and version

Both of these operations move the package file within object storage. This is due to how the path of a package file is computed: it is built from the package id, the package file id, and the filename of the archive. Changing any one of those requires a move within object storage.
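
To make the consequence concrete, here is a small Python sketch of a key built from those three components. The path layout and the ids are hypothetical; the issue only states which components go into the path, not the actual format.

```python
def package_file_key(package_id: int, package_file_id: int, filename: str) -> str:
    # Hypothetical layout: only the three inputs come from the issue text.
    return f"packages/{package_id}/files/{package_file_id}/{filename}"


# The upload first lands under a temporary package (made-up id 101).
temporary_key = package_file_key(101, 7, "package.nupkg")

# Appending the file to an existing package (made-up id 42) changes the
# package id, so the object has to live under a different key.
final_key = package_file_key(42, 7, "package.nupkg")

if __name__ == "__main__":
    print(temporary_key)
    print(final_key)
```

Because the key is derived from its inputs, there is no way to change the package id or the filename without the stored object ending up under a new key.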

Proposal

  • Download the file for the metadata extraction. That's the current #use_open_file usage. We will still need this.
  • Update the metadata, but don't download the file a second time. Instead, copy it to its new key.
  • Destroy the old key.
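
The three steps above can be sketched as follows. This is a minimal Python illustration with a hypothetical in-memory store standing in for object storage; the class, the function names, and the callbacks are assumptions made for the example and do not reflect the GitLab implementation or any provider's API.

```python
class InMemoryObjectStore:
    """Hypothetical stand-in for an object storage bucket."""

    def __init__(self):
        self.objects = {}
        self.downloads = 0  # counts how often file content leaves the store

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        self.downloads += 1
        return self.objects[key]

    def copy(self, source_key, target_key):
        # Store-side copy: the content is not downloaded by the caller.
        self.objects[target_key] = self.objects[source_key]

    def delete(self, key):
        del self.objects[key]


def process_upload(store, temporary_key, extract_metadata, final_key_for):
    """Download once, then copy to the new key and destroy the old one."""
    data = store.get(temporary_key)        # step 1: single download for extraction
    metadata = extract_metadata(data)
    final_key = final_key_for(metadata)
    if final_key != temporary_key:
        store.copy(temporary_key, final_key)  # step 2: copy to the new key
        store.delete(temporary_key)           # step 3: destroy the old key
    return final_key


if __name__ == "__main__":
    store = InMemoryObjectStore()
    store.put("tmp/package.nupkg", b"archive bytes")
    key = process_upload(
        store,
        "tmp/package.nupkg",
        extract_metadata=lambda data: {"name": "Example.Package", "version": "1.2.3"},
        final_key_for=lambda m: f"final/{m['name']}.{m['version']}.nupkg",
    )
    print(key, store.downloads)
```

The point of the sketch is the ordering: the copy happens before the delete, so a failure between the two leaves a duplicate rather than a missing file.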

Further details

📊 MR plan

Those changes are quite deep and change how the background job processes NuGet packages. Given the depth of this change, we will need a feature flag as an additional safety net.

The changes can be in a single MR.

🔭 Things to consider

During the MR implementation, we will need to test different conditions to make sure that this change works as expected:

Object storage    Existing NuGet package
disabled          yes
disabled          no
GCP               yes
GCP               no
AWS               yes
AWS               no

In particular, we saw that if the NuGet package already exists, the DELETE request fails and the original file is left behind. We will need to fix that.

Lastly, moving a file within object storage is a common need in package background processing. To make it reusable, we should create a dedicated service for it.

Relevant logs and/or screenshots

https://gitlab.com/immersaview/public/packages/chromium-embedded-framework/-/jobs/1025547697

  • Snippet of log
pushd NuGet
/builds/immersaview/public/packages/chromium-embedded-framework/NuGet /builds/immersaview/public/packages/chromium-embedded-framework
$ dotnet nuget add source "$CI_SERVER_URL/api/v4/projects/$CI_PROJECT_ID/packages/nuget/index.json" --name gitlab --username gitlab-ci-token --password $CI_JOB_TOKEN --store-password-in-clear-text
Package source with Name: gitlab added successfully.
$ dotnet nuget push *.nupkg --source gitlab
warn : No API Key was provided and no API Key could be found for 'https://gitlab.com/api/v4/projects/24280473/packages/nuget'. To save an API Key for a source use the 'setApiKey' command.
Pushing Imv.External.chromium-embedded-framework.3.3359.1774.20191217.nupkg to 'https://gitlab.com/api/v4/projects/24280473/packages/nuget'...
  PUT https://gitlab.com/api/v4/projects/24280473/packages/nuget/
  Created https://gitlab.com/api/v4/projects/24280473/packages/nuget/ 26834ms
Your package was pushed.
$ popd
  • Screenshot: Package Registry is empty

Output of checks

This bug happens on GitLab.com

Edited by Tim Rizzi