Add service to handle disallowing duplicate NuGet package uploads

What does this MR do and why?

In !123783 (merged), a setting for allowing/disallowing duplicate NuGet package uploads was added. In this MR, I use that setting in Packages::Nuget::FindOrCreatePackageService to enforce it. The changes are behind the nuget_duplicates_option feature flag. If the feature flag is disabled for the project's namespace, the default behavior applies, which allows duplicates.

How NuGet upload works:

A NuGet package is a compressed file with the extension .nupkg or .snupkg (for symbols). When this file is pushed to GitLab, it contains metadata stored within a .nuspec file that is embedded in the compressed package. To retrieve the package name and version, the .nupkg file needs to be unzipped, and the relevant data extracted from the .nuspec file.

This unzipping is handled by a background worker to keep publishing fast. As a result, users receive an acknowledgment that the package was created even though it is still being processed and published. Any errors that occur during background processing are visible on the package registry UI page, so users can identify and fix them.

Now, we aim to introduce a feature that allows users to prevent the publishing of duplicate packages. This feature should operate synchronously, meaning the client (NuGet, dotnet, Visual Studio) should receive a 409 status code (Conflict) if an attempt is made to publish a duplicate version. To achieve this, we need to handle the file unzipping synchronously, rather than using the background worker, as we cannot determine the package name and version until they are extracted from the .nuspec file. These extracted values are then used to check for duplicate packages. The package is considered a duplicate if its name & version match the name & version of a published package in the same project.

To summarize:

  1. We need to extract the .nuspec file from the package file synchronously in order to get the package name & version. To do that efficiently, especially for large packages, we handle the zip archive in streaming mode: we don't download the whole .nupkg file from the object store; instead, we fire a streaming request and fetch the file in chunks. Each fetched chunk is unzipped, and once the needed .nuspec file is found, we extract it and stop streaming. The .nuspec file is commonly located at the top level of the archive, so it should be fetched within the first 2 chunks (tested with different-sized packages). If we reach 5 downloaded chunks without finding the .nuspec file, we stop streaming and respond with an error: nuspec file not found.

  2. Step 1 is executed only when the user disallows duplicate package uploads. If the setting is true (allowing duplicates), the entire publishing process is performed in the background worker, as before.

  3. This new setting does not affect symbol packages; they are handled as before. Symbols are attached to existing matching .nupkg packages. If no matching package exists, the symbols are not published.

  4. The --skip-duplicate option should work out of the box, as we now respond with a 409 status code (Conflict) in the case of duplication. The client (NuGet CLI, dotnet CLI) can then proceed with the next package in the push, if any, ignoring those that failed to be published due to duplication.
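
The chunked lookup described in step 1 can be illustrated with a minimal Ruby sketch. It is not the code of this MR: the helper names (find_nuspec, extract_nuspec, NuspecNotFound) are hypothetical, chunks stands for any enumerable of byte strings coming from a streaming request, and the sketch assumes every zip entry stores its compressed size in the local file header (no data descriptors, no ZIP64). Only the 5-chunk limit and the error message come from the description above.

```ruby
require 'zlib'

LOCAL_HEADER_SIGNATURE = 0x04034b50 # "PK\x03\x04"
LOCAL_HEADER_SIZE = 30              # fixed part of a zip local file header
MAX_CHUNKS = 5                      # limit taken from the MR description

NuspecNotFound = Class.new(StandardError)

# Walks the local file headers buffered so far. Returns the .nuspec content,
# or nil when more bytes are needed to decide.
def find_nuspec(buffer)
  offset = 0
  loop do
    header = buffer.byteslice(offset, LOCAL_HEADER_SIZE)
    return nil if header.nil? || header.bytesize < LOCAL_HEADER_SIZE

    signature, _version, _flags, compression, _time, _date, _crc,
      compressed_size, _size, name_length, extra_length = header.unpack('VvvvvvVVVvv')
    # Anything other than a local header means all entries have been scanned.
    raise NuspecNotFound, 'nuspec file not found' unless signature == LOCAL_HEADER_SIGNATURE

    data_start = offset + LOCAL_HEADER_SIZE + name_length + extra_length
    data_end = data_start + compressed_size
    return nil if buffer.bytesize < data_end

    name = buffer.byteslice(offset + LOCAL_HEADER_SIZE, name_length)
    if name.downcase.end_with?('.nuspec')
      data = buffer.byteslice(data_start, compressed_size)
      # Method 8 is raw deflate; anything else is treated as stored here.
      return compression == 8 ? Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(data) : data
    end
    offset = data_end
  end
end

# Consumes chunks one at a time and stops as soon as the .nuspec is found.
def extract_nuspec(chunks)
  buffer = String.new(encoding: Encoding::BINARY)
  chunks.each_with_index do |chunk, index|
    buffer << chunk.b
    content = find_nuspec(buffer)
    return content if content
    break if index + 1 >= MAX_CHUNKS
  end
  raise NuspecNotFound, 'nuspec file not found'
end
```

Because extract_nuspec returns from inside the iteration, a lazy chunk source is not read past the chunk that completes the .nuspec entry. For simplicity the sketch keeps every received byte in memory and rescans from the start on each chunk, which a production implementation would likely avoid.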

Implementation Details

  1. Add two new columns nuget_duplicates_allowed & nuget_duplicate_exception_regex to the namespace_package_settings table. The default is the current behavior which allows duplicates. (Done in !123783 (merged))
  2. Make them updatable by GraphQL, but not added yet to the UI; this should be done in a separate MR for the next milestone. (Done in !123783 (merged))
  3. Introduce a new service Packages::Nuget::FindOrCreatePackageService which should check for duplication (if needed) then call ExtractionWorker to create the package and the package file.
  4. Introduce a new service Packages::Nuget::ExtractRemoteMetadataFileService which is responsible for the zip streaming request of the package file.
  5. Ensure we don't unzip the package file twice if we already checked for duplication.
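
As a rough illustration of when the synchronous path applies, the following Ruby sketch encodes the conditions stated in this description (feature flag, duplicates setting, symbol packages). The method name publish_strategy, its keyword arguments, and the returned symbols are hypothetical and do not reflect the actual interface of Packages::Nuget::FindOrCreatePackageService.

```ruby
# Decides how an upload is processed, following the rules in this description.
# All names here are illustrative assumptions, not the service's real API.
def publish_strategy(feature_flag_enabled:, duplicates_allowed:, symbol_package:)
  # Symbol packages (.snupkg) are not affected by the new setting.
  return :background if symbol_package
  # With the feature flag disabled, the default behavior (duplicates allowed) applies.
  return :background unless feature_flag_enabled
  # Duplicates allowed: the whole publishing stays in the background worker.
  return :background if duplicates_allowed

  # Duplicates disallowed: extract the .nuspec synchronously, then check.
  :synchronous_duplicate_check
end
```

Only the last branch needs the synchronous .nuspec extraction; every other combination keeps the existing background flow.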

How to set up and validate locally

  1. Ensure you have the NuGet CLI installed (see nuget docs for links to installation pages).

  2. Ensure the object store is enabled in your gdk.

  3. In a new directory, run nuget spec. A file named Package.nuspec should be created.

  4. Run nuget pack. A file named Package.1.0.0.nupkg should be created.

  5. Add a GitLab project as your NuGet source:

      nuget source Add -Name localhost -Source "http://gdk.test:3000/api/v4/projects/<project_id>/packages/nuget/index.json" -UserName <gitlab_username> -Password <personal_access_token>
  6. Push the package to your project:

      nuget push Package.1.0.0.nupkg -Source localhost
  7. After the package is successfully published, clear the local NuGet cache:

      nuget locals all -clear
  8. In the rails console, enable the nuget_duplicates_option feature flag for the namespace of the project:

      Feature.enable(:nuget_duplicates_option, Namespace.find(<namespace_id>))
  9. Update the nuget_duplicates_allowed namespace package setting using the mutation below in graphql-explorer:

      mutation {
        updateNamespacePackageSettings(input: {
          namespacePath: "<your-namespace-full-path>",
          nugetDuplicatesAllowed: false
        }) {
          packageSettings {
            nugetDuplicatesAllowed
          }
        }
      }
  10. Try to publish the same package again. You should see a 409 response from the server:

      Pushing Package.1.0.0.nupkg to 'http://gdk.test:3000/api/v4/projects/<project_id>/packages/nuget'...
      PUT http://gdk.test:3000/api/v4/projects/<project_id>/packages/nuget/
      Conflict http://gdk.test:3000/api/v4/projects/<project_id>/packages/nuget/ 6367ms
      To skip already published packages, use the option -SkipDuplicate
      Response status code does not indicate success: 409 (Conflict).
  11. Update nuget_duplicates_allowed to be true and try to publish the same package. It should be successfully published.

Test the exception regex:

  1. Update the package settings as below. With the regex ".*-be.*", duplicates are allowed only for packages whose name or version matches the regex.

      mutation {
        updateNamespacePackageSettings(input: {
          namespacePath: "<your-namespace-full-path>",
          nugetDuplicatesAllowed: false,
          nugetDuplicateExceptionRegex: ".*-be.*"
        }) {
          packageSettings {
            nugetDuplicatesAllowed
            nugetDuplicateExceptionRegex
          }
        }
      }
  2. Edit the <version> field in the Package.nuspec file generated by nuget spec and set it to 2.0.0-beta, for example. Then run nuget pack and publish the generated .nupkg file.
  3. Publish the same package again. It should be published successfully because version 2.0.0-beta matches the regex .*-be.*.
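
The expected outcome of these steps can be sketched in Ruby as follows. This is an illustration under assumptions, not the MR's implementation: PublishedPackage and duplicate_upload_rejected? are hypothetical names, name and version are compared with exact string equality, and the exception regex is assumed to be applied as a full-string match using Ruby's Regexp.

```ruby
# Hypothetical stand-in for a package already published in the project.
PublishedPackage = Struct.new(:name, :version)

# Returns true when the upload should be answered with 409 (Conflict).
def duplicate_upload_rejected?(name, version, published, duplicates_allowed:, exception_regex: nil)
  return false if duplicates_allowed

  # A duplicate has the same name and version as a published package.
  duplicate = published.any? { |pkg| pkg.name == name && pkg.version == version }
  return false unless duplicate
  return true if exception_regex.nil? || exception_regex.empty?

  # Assumption: the regex must match the whole name or the whole version.
  pattern = Regexp.new("\\A(?:#{exception_regex})\\z")
  !(pattern.match?(name) || pattern.match?(version))
end

published = [
  PublishedPackage.new('Package', '1.0.0'),
  PublishedPackage.new('Package', '2.0.0-beta')
]

# Version 1.0.0 is a duplicate and matches no exception.
duplicate_upload_rejected?('Package', '1.0.0', published,
                           duplicates_allowed: false, exception_regex: '.*-be.*')
# Version 2.0.0-beta is a duplicate, but it matches the exception regex.
duplicate_upload_rejected?('Package', '2.0.0-beta', published,
                           duplicates_allowed: false, exception_regex: '.*-be.*')
```

The first call returns true (the upload is rejected) and the second returns false (the duplicate is accepted), mirroring the 409 response and the successful republish in the validation steps.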

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

💾 Database analysis

Related to #293748 (closed)

Edited by David Fernandez
