Add service to handle disallowing duplicate NuGet package uploads
What does this MR do and why?
In !123783 (merged), a setting for allowing or disallowing duplicate NuGet package uploads was added. In this MR, I use that setting in Packages::Nuget::FindOrCreatePackageService to put it into action. Changes are behind the nuget_duplicates_option feature flag. If the feature flag is disabled for the project's namespace, the default behavior applies, which is to allow duplicates.
How NuGet upload works:
A NuGet package is a compressed file with the extension .nupkg or .snupkg (for symbols). When this file is pushed to GitLab, it contains metadata stored within a .nuspec file that is embedded in the compressed package. To retrieve the package name and version, the .nupkg file needs to be unzipped, and the relevant data extracted from the .nuspec file.
This unzipping process is handled by a background worker to ensure speedy publishing. As a result, users receive an acknowledgment that the package was created, even though it is still being processed and published. Any errors that occur during the background process are visible on the package registry UI page, allowing users to identify and rectify them.
Now, we aim to introduce a feature that allows users to prevent the publishing of duplicate packages. This feature should operate synchronously, meaning the client (NuGet, dotnet, Visual Studio) should receive a 409 status code (Conflict) if an attempt is made to publish a duplicate version. To achieve this, we need to handle the file unzipping synchronously, rather than using the background worker, as we cannot determine the package name and version until they are extracted from the .nuspec file. These extracted values are then used to check for duplicate packages. The package is considered a duplicate if its name & version match the name & version of a published package in the same project.
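To make the extraction step concrete, here is a minimal sketch of reading the package name and version out of a .nuspec document. It is not the code from this MR: the method name nuspec_name_and_version and the sample XML are illustrative assumptions, and it uses REXML from Ruby's bundled gems rather than whatever parser the service actually relies on.

```ruby
require 'rexml/document'

# Hypothetical helper (not from the MR): returns [name, version] read from
# the <metadata> element of a .nuspec document.
def nuspec_name_and_version(nuspec_xml)
  root = REXML::Document.new(nuspec_xml).root
  # Compare local element names so an XML namespace on <package> does not matter.
  metadata = root.elements.find { |element| element.name == 'metadata' }
  raise ArgumentError, 'metadata element not found' unless metadata

  fields = metadata.elements.to_h { |element| [element.name, element.text.to_s.strip] }
  fields.values_at('id', 'version')
end

# Illustrative sample, trimmed to the two fields needed for the duplicate check.
sample = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <package xmlns="http://schemas.microsoft.com/packaging/2010/07/nuspec.xsd">
    <metadata>
      <id>Package</id>
      <version>1.0.0</version>
    </metadata>
  </package>
XML

name, version = nuspec_name_and_version(sample)
puts "#{name} #{version}"
```

The pair returned here is what the duplicate check compares against the packages already published in the project.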
To summarize:
- We need to extract the .nuspec file from the package file synchronously in order to get the package name & version. To do that efficiently, especially for large packages, we handle the zip archive in a streaming "mode": we don't download the whole .nupkg file from the object store; instead we fire a streaming request and fetch the file in chunks. Each small fetched chunk can be unzipped, and once the needed .nuspec file is found, we extract it and stop streaming. The .nuspec file is commonly located at the top level of the archive, so it should be fetched within the first 2 chunks (tested with different-sized packages). If we reach 5 downloaded chunks without finding the .nuspec file, we stop streaming and respond with an error: nuspec file not found.
- The synchronous extraction described in the first point is executed only when the user disallows duplicate package uploads. If the setting is true (allowing duplicates), the entire publishing process is performed in the background worker, as before.
- This new setting does not affect symbol packages; they are handled as before. Symbols are attached to existing matching .nupkg packages. If no matching package exists, the symbols are not published.
- The --skip-duplicate option should work out of the box, as we now respond with a 409 status code (Conflict) in the case of duplication. The client (NuGet CLI, dotnet CLI) can then proceed with the next package in the push, if any, ignoring those that failed to be published due to duplication.
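The chunked lookup summarized above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation of Packages::Nuget::ExtractRemoteMetadataFileService: the method name, the error class, and the chunks argument (any enumerable of binary strings standing in for the streaming request to the object store) are hypothetical. The sketch also assumes each zip local file header carries the compressed size (no data descriptor, no ZIP64) and handles only stored and deflated entries.

```ruby
require 'zlib'

MAX_CHUNKS = 5                            # chunk limit taken from the description
LOCAL_HEADER_SIGNATURE = "PK\x03\x04".b   # zip local file header magic
LOCAL_HEADER_SIZE = 30                    # fixed-size part of a local file header
DEFLATE = 8                               # zip compression method id for deflate

# Hypothetical error class for this sketch.
class NuspecNotFoundError < StandardError; end

# Hypothetical helper: walks zip entries as chunks arrive and returns the
# content of the first *.nuspec entry, giving up after MAX_CHUNKS chunks.
def extract_nuspec_from_chunks(chunks)
  buffer = String.new(encoding: Encoding::BINARY)
  offset = 0 # start of the next unread local file header

  chunks.each_with_index do |chunk, index|
    buffer << chunk.b

    loop do
      header = buffer.byteslice(offset, LOCAL_HEADER_SIZE)
      break if header.nil? || header.bytesize < LOCAL_HEADER_SIZE # need more data
      # Anything other than a local header means the file entries are over.
      raise NuspecNotFoundError, 'nuspec file not found' unless header.start_with?(LOCAL_HEADER_SIGNATURE)

      _sig, _version, _flags, compression, _time, _date, _crc, compressed_size, _size, name_length, extra_length =
        header.unpack('VvvvvvVVVvv')
      data_start = offset + LOCAL_HEADER_SIZE + name_length + extra_length
      entry_end = data_start + compressed_size
      break if buffer.bytesize < entry_end # entry not fully downloaded yet

      name = buffer.byteslice(offset + LOCAL_HEADER_SIZE, name_length)
      if name.end_with?('.nuspec')
        data = buffer.byteslice(data_start, compressed_size)
        # Zip stores raw deflate streams, hence the negative window bits.
        return compression == DEFLATE ? Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(data) : data
      end

      offset = entry_end # skip this entry and look at the next header
    end

    break if index + 1 >= MAX_CHUNKS # stop streaming once the limit is reached
  end

  raise NuspecNotFoundError, 'nuspec file not found'
end
```

Because the method returns as soon as the .nuspec entry is complete, a lazy chunks source is never asked for more data than needed, which is the point of streaming instead of downloading the whole .nupkg file.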
Implementation Details
- Add two new columns nuget_duplicates_allowed & nuget_duplicate_exception_regex to the namespace_package_settings table. The default is the current behavior which allows duplicates. (Done in !123783 (merged))
- Make them updatable by GraphQL, but not added yet to the UI; this should be done in a separate MR for the next milestone. (Done in !123783 (merged))
- Introduce a new service Packages::Nuget::FindOrCreatePackageService which should check for duplication (if needed) then call ExtractionWorker to create the package and the package file.
- Introduce a new service Packages::Nuget::ExtractRemoteMetadataFileService which is responsible for the zip streaming request of the package file.
- Ensure we don't unzip the package file twice if we already checked for duplication.
How to set up and validate locally
- Ensure you have the NuGet CLI installed (see nuget docs for links to installation pages).
- Ensure the object store is enabled in your gdk.
- In a new directory, run nuget spec. A file named Package.nuspec should be created.
- Run nuget pack. A file named Package.1.0.0.nupkg should be created.
- Add a GitLab project as your NuGet source: nuget source Add -Name localhost -Source "http://gdk.test:3000/api/v4/projects/<project_id>/packages/nuget/index.json" -UserName <gitlab_username> -Password <personal_access_token>
- Push the package to your project: nuget push Package.1.0.0.nupkg -Source localhost
- After the package is successfully published, clear the local NuGet cache: nuget locals all -clear
- In the rails console, enable the nuget_duplicates_option feature flag for the namespace of the project:
  Feature.enable(:nuget_duplicates_option, Namespace.find(<namespace_id>))
- Update the namespace package settings nuget_duplicates_allowed using the query below in graphql-explorer:
  mutation {
    updateNamespacePackageSettings(input: {
      namespacePath: "<your-namespace-full-path>",
      nugetDuplicatesAllowed: false
    }) {
      packageSettings {
        nugetDuplicatesAllowed
      }
    }
  }
- Try to publish the same package again. You should see a 409 response from the server:
  Pushing Package.1.0.0.nupkg to 'http://gdk.test:3000/api/v4/projects/<project_id>/packages/nuget'...
  PUT http://gdk.test:3000/api/v4/projects/<project_id>/packages/nuget/
  Conflict http://gdk.test:3000/api/v4/projects/<project_id>/packages/nuget/ 6367ms
  To skip already published packages, use the option -SkipDuplicate
  Response status code does not indicate success: 409 (Conflict).
- Update nuget_duplicates_allowed to be true and try to publish the same package again. It should be published successfully.
Test the exception regex:
- Update the package settings as below. The regex .*-be.* allows duplicates only for packages whose name or version matches it.
  mutation {
    updateNamespacePackageSettings(input: {
      namespacePath: "<your-namespace-full-path>",
      nugetDuplicatesAllowed: false,
      nugetDuplicateExceptionRegex: ".*-be.*"
    }) {
      packageSettings {
        nugetDuplicatesAllowed
        nugetDuplicateExceptionRegex
      }
    }
  }
- Edit the version field in the Package.nuspec file created earlier, setting it to 2.0.0-beta for example, then run nuget pack and publish the generated .nupkg file.
- Publish the same package again. It should be published successfully because version 2.0.0-beta matches the regex .*-be.*.
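The behavior exercised by the steps above can be summarized as a small decision function. This is a sketch, not the service code: DuplicateSettings, duplicate_exception? and upload_decision are hypothetical names, published is a plain list of [name, version] pairs standing in for the project's packages, and anchoring the regex to the whole name or version is an assumption about how the exception is matched.

```ruby
# Hypothetical stand-in for the namespace package settings.
DuplicateSettings = Struct.new(:duplicates_allowed, :exception_regex, keyword_init: true)

# True when the exception regex matches the package name or version.
# Assumption: the pattern must match the entire string.
def duplicate_exception?(settings, name, version)
  pattern = settings.exception_regex.to_s
  return false if pattern.empty?

  regex = Regexp.new("\\A(?:#{pattern})\\z")
  regex.match?(name) || regex.match?(version)
end

# Returns :conflict (answered with a 409 status code) or :publish.
def upload_decision(settings:, published:, name:, version:, feature_enabled: true)
  return :publish unless feature_enabled             # flag off: duplicates allowed
  return :publish if settings.duplicates_allowed     # default setting
  return :publish unless published.include?([name, version])
  return :publish if duplicate_exception?(settings, name, version)

  :conflict
end

settings = DuplicateSettings.new(duplicates_allowed: false, exception_regex: '.*-be.*')
published = [['Package', '1.0.0'], ['Package', '2.0.0-beta']]

puts upload_decision(settings: settings, published: published, name: 'Package', version: '1.0.0')
puts upload_decision(settings: settings, published: published, name: 'Package', version: '2.0.0-beta')
```

With these inputs, the first push is rejected because neither the name nor the version matches the regex, while the second is let through by the exception.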
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
💾  Database analysis
Related to #293748 (closed)