
Add a setting for allowing/disallowing duplicate NuGet package upload

What does this MR do and why?

Context

When using the GitLab Package Registry to publish NuGet packages, a duplicate package name/version can be uploaded. This may be great for snapshots, but you may want your releases to be immutable.

This MR introduces a new setting that enables the user to define, at the group level, whether duplicate NuGet packages are allowed or not.

How NuGet upload works:

A NuGet package is a compressed file with the extension .nupkg or .snupkg (for symbols). When this file is pushed to GitLab, it contains metadata stored within a .nuspec file that is embedded in the compressed package. To retrieve the package name and version, the .nupkg file needs to be unzipped, and the relevant data extracted from the .nuspec file.
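As a rough illustration of the extraction described above, the following Python sketch reads the package name and version from the .nuspec file embedded in a .nupkg archive. The function names and the sample package builder are hypothetical and only serve the example; this is not the code used in the MR.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Tuple

# Namespace used by the sample below; the parser itself ignores namespaces.
SAMPLE_NS = "http://schemas.microsoft.com/packaging/2010/07/nuspec.xsd"


def local_name(tag: str) -> str:
    """Drop an XML namespace prefix such as '{http://...}id'."""
    return tag.rsplit("}", 1)[-1]


def read_nuspec_metadata(nupkg_bytes: bytes) -> Tuple[str, str]:
    """Return (package id, version) from the .nuspec embedded in a .nupkg."""
    with zipfile.ZipFile(io.BytesIO(nupkg_bytes)) as archive:
        # The .nuspec file sits at the top level of the archive.
        nuspec_name = next(
            name for name in archive.namelist()
            if "/" not in name and name.endswith(".nuspec")
        )
        root = ET.fromstring(archive.read(nuspec_name))

    metadata = next(el for el in root if local_name(el.tag) == "metadata")
    fields = {local_name(child.tag): (child.text or "").strip() for child in metadata}
    return fields["id"], fields["version"]


def build_sample_nupkg(package_id: str, version: str) -> bytes:
    """Build a tiny in-memory .nupkg for the example (hypothetical helper)."""
    nuspec = (
        f'<package xmlns="{SAMPLE_NS}"><metadata>'
        f"<id>{package_id}</id><version>{version}</version>"
        "</metadata></package>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"{package_id}.nuspec", nuspec)
        archive.writestr("lib/placeholder.dll", b"\x00" * 16)
    return buffer.getvalue()


if __name__ == "__main__":
    print(read_nuspec_metadata(build_sample_nupkg("Package", "1.0.0")))
```

This version needs the whole archive in memory, which is why it fits a background worker better than a synchronous request.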

This unzipping is handled by a background worker to keep publishing fast. As a result, users receive an acknowledgment that the package was created even though it is still being processed and published. Any errors that occur during the background process are visible on the package registry UI page, allowing users to identify and rectify them.

Now, we aim to introduce a feature that allows users to prevent the publishing of duplicate packages. This feature should operate synchronously, meaning the client (NuGet, dotnet, Visual Studio) should receive a 409 status code (Conflict) if an attempt is made to publish a duplicate version. To achieve this, we need to handle the file unzipping synchronously, rather than using the background worker, as we cannot determine the package name and version until they are extracted from the .nuspec file. These extracted values are then used to check for duplicate packages. The package is considered a duplicate if its name & version match the name & version of a published package in the same project.
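The duplicate rule in the paragraph above can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the function name is hypothetical, names and versions are compared exactly, and the exception regex from the implementation details is left out.

```python
from typing import Iterable, Optional, Tuple

HTTP_CONFLICT = 409


def rejection_status(
    name: str,
    version: str,
    published_in_project: Iterable[Tuple[str, str]],
    duplicates_allowed: bool,
) -> Optional[int]:
    """Return 409 when the upload must be rejected, otherwise None.

    None means "carry on with publishing"; the success status itself
    is not modelled here.
    """
    if duplicates_allowed:
        # Previous behaviour: no synchronous check at all.
        return None
    # Assumption: exact string comparison of name and version.
    if (name, version) in set(published_in_project):
        return HTTP_CONFLICT
    return None


if __name__ == "__main__":
    existing = [("Package", "1.0.0")]
    print(rejection_status("Package", "1.0.0", existing, duplicates_allowed=False))
    print(rejection_status("Package", "1.0.1", existing, duplicates_allowed=False))
```

Only packages of the same project are passed in, which mirrors the "same project" scope of the check.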

To summarize:

  1. We need to extract the .nuspec file from the package file synchronously to get the package name & version. To do that efficiently, especially for large packages, we handle the zip archive in streaming mode: instead of downloading the whole .nupkg file from the object store, we issue a streaming request and fetch the file in chunks. Each fetched chunk is unzipped, and once the .nuspec file is found, we extract it and stop streaming. The .nuspec file is located at the top level of the archive, so it should be fetched within the first two chunks (tested with different-sized packages). If five chunks have been downloaded without finding the .nuspec file, we stop streaming and respond with an error: nuspec file not found.
  2. Step 1 is executed only when the user disallows duplicate package uploads. If the setting is true (duplicates allowed), the entire publishing process runs in the background worker, as before.
  3. If duplicate package upload is disallowed and the .nuspec file was extracted as described in step 1, the extraction is not repeated during the subsequent publishing steps.
  4. This new setting does not affect symbol packages; they are handled as before. Symbols are attached to existing matching .nupkg packages. If no matching package exists, the symbols are not published.
  5. The --skip-duplicate option should work out of the box, as we now respond with a 409 status code (Conflict) in the case of duplication. The client (NuGet CLI, dotnet CLI) can then proceed with the next package in the push, if any, ignoring those that failed to be published due to duplication.
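The chunked extraction from step 1 can be illustrated with the Python sketch below. It is not the MR's implementation: all names are hypothetical, the chunk source is a plain iterable standing in for the object store request, and the sketch buffers the fetched chunks and re-parses the zip local file headers after each one instead of inflating incrementally. It also assumes that entry sizes are present in the local headers (no data descriptors).

```python
import itertools
import struct
import zlib
from typing import Iterable, Iterator, Optional

# ZIP local file header: signature, version, flags, method, time, date,
# crc32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_HEADER_SIGNATURE = 0x04034B50
STORED, DEFLATED = 0, 8


class NuspecNotFound(Exception):
    pass


def try_extract_nuspec(buffer: bytes) -> Optional[bytes]:
    """Return the top-level .nuspec content, or None if more bytes are needed."""
    offset = 0
    while True:
        header = buffer[offset:offset + LOCAL_HEADER.size]
        if len(header) < LOCAL_HEADER.size:
            return None
        (signature, _, flags, method, _, _, _,
         compressed_size, _, name_len, extra_len) = LOCAL_HEADER.unpack(header)
        if signature != LOCAL_HEADER_SIGNATURE:
            # No more file entries (for example, the central directory started).
            raise NuspecNotFound("nuspec file not found")
        if flags & 0x08:
            raise ValueError("this sketch does not handle data descriptors")
        name_start = offset + LOCAL_HEADER.size
        data_start = name_start + name_len + extra_len
        data_end = data_start + compressed_size
        if len(buffer) < data_start:
            return None
        name = buffer[name_start:name_start + name_len].decode("utf-8", "replace")
        if "/" not in name and name.endswith(".nuspec"):
            if len(buffer) < data_end:
                return None
            raw = buffer[data_start:data_end]
            if method == STORED:
                return raw
            if method == DEFLATED:
                return zlib.decompress(raw, wbits=-15)  # raw deflate stream
            raise ValueError(f"unsupported compression method {method}")
        offset = data_end  # skip this entry and look at the next header


def extract_nuspec(chunks: Iterable[bytes], max_chunks: int = 5) -> bytes:
    """Consume at most max_chunks chunks and stop as soon as the .nuspec is found."""
    buffer = b""
    for chunk in itertools.islice(chunks, max_chunks):
        buffer += chunk
        found = try_extract_nuspec(buffer)
        if found is not None:
            return found
    raise NuspecNotFound("nuspec file not found")


def iter_chunks(data: bytes, size: int) -> Iterator[bytes]:
    """Stand-in for a chunked object store download (hypothetical helper)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]
```

Because `iter_chunks` is a generator, no further chunk is requested once `extract_nuspec` returns, which is the property the streaming approach relies on for large packages.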

Implementation Details

  1. Add two new columns nuget_duplicates_allowed & nuget_duplicate_exception_regex to the namespace_package_settings table. The default is the current behavior which allows duplicates.
  2. Make them updatable through GraphQL. They are not yet exposed in the UI; that should be done in a separate MR in the next milestone.
  3. Introduce a new service Packages::Nuget::FindOrCreatePackageService which should check for duplication (if needed) then call ExtractionWorker to create the package and the package file.
  4. Introduce a new service Packages::Nuget::ExtractRemoteMetadataFileService which is responsible for the zip streaming request of the package file.
  5. Ensure we don't unzip the package file twice if we already checked for duplication.

How to set up and validate locally

  1. Ensure you have the NuGet CLI installed (see nuget docs for links to installation pages).

  2. Ensure the object store is enabled in your GDK.

  3. In a new directory, run nuget spec. A file named Package.nuspec should be created.

  4. Run nuget pack. A file named Package.1.0.0.nupkg should be created.

  5. Add a GitLab project as your NuGet source:

      nuget source Add -Name localhost -Source "http://gdk.test:3000/api/v4/projects/<project_id>/packages/nuget/index.json" -UserName <gitlab_username> -Password <personal_access_token>
  6. Push the package to your project:

      nuget push Package.1.0.0.nupkg -Source localhost
  7. After the package is successfully published, clear the local NuGet cache:

      nuget locals all -clear
  8. Update the namespace package setting nuget_duplicates_allowed using the mutation below in graphql-explorer:

      mutation {
        updateNamespacePackageSettings(input: {
          namespacePath: "<your-namespace-full-path>",
          nugetDuplicatesAllowed: false
        }) {
          packageSettings {
            nugetDuplicatesAllowed
          }
        }
      }
  9. Try to publish the same package again. You should see a 409 response from the server:

      Pushing Package.1.0.0.nupkg to 'http://gdk.test:3000/api/v4/projects/<project_id>/packages/nuget'...
      PUT http://gdk.test:3000/api/v4/projects/<project_id>/packages/nuget/
      Conflict http://gdk.test:3000/api/v4/projects/<project_id>/packages/nuget/ 6367ms
      To skip already published packages, use the option -SkipDuplicate
      Response status code does not indicate success: 409 (Conflict).
  10. Update nuget_duplicates_allowed to true and try to publish the same package again. It should be published successfully.

Test the exception regex:

  1. Update the package settings as below. With the regex ".*-be.*", duplicates are allowed only for packages whose name or version matches the regex.

      mutation {
        updateNamespacePackageSettings(input: {
          namespacePath: "<your-namespace-full-path>",
          nugetDuplicatesAllowed: false,
          nugetDuplicateExceptionRegex: ".*-be.*"
        }) {
          packageSettings {
            nugetDuplicatesAllowed
            nugetDuplicateExceptionRegex
          }
        }
      }
  2. Edit the version field in the file Package.nuspec created in step 3 of the setup above and set it to, for example, 2.0.0-beta. Then run nuget pack and publish the generated .nupkg file.
  3. Publish the same package again. It should be published successfully because version 2.0.0-beta matches the regex .*-be.*.
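The exception rule tested above can be approximated with the Python sketch below. It rests on assumptions: the function name is hypothetical, the pattern is matched against the whole name or version, and Python's `re` module stands in for whatever regex engine the server uses.

```python
import re
from typing import Optional


def duplicate_permitted(
    name: str,
    version: str,
    duplicates_allowed: bool,
    exception_regex: Optional[str] = None,
) -> bool:
    """Decide whether re-uploading an existing name/version is permitted."""
    if duplicates_allowed:
        return True
    if not exception_regex:
        return False
    pattern = re.compile(exception_regex)
    # Assumption: the regex must match the whole name or the whole version.
    return bool(pattern.fullmatch(name) or pattern.fullmatch(version))


if __name__ == "__main__":
    # 2.0.0-beta matches .*-be.*, so the duplicate is permitted.
    print(duplicate_permitted("Package", "2.0.0-beta", False, ".*-be.*"))
    # 1.0.0 does not match, so the duplicate is rejected.
    print(duplicate_permitted("Package", "1.0.0", False, ".*-be.*"))
```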

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #293748 (closed)

Edited by Moaz Khalifa
