Skip to content

Add nuget dependencies support to the metadata extraction

David Fernandez requested to merge 42680-10io-nuget-dependencies-support into master

What does this MR do?

Add dependencies extraction to the GitLab NuGet repository (https://docs.gitlab.com/ee/user/packages/nuget_repository/)

See #42680 (closed)

Details

NuGet packages are uploaded as a .nupkg file (actually a zip archive file). This file is then processed by a worker (Packages::Nuget::ExtractionWorker) to get the package name and version from the .nuspec file. This MR will add dependencies to this extraction.

The .nuspec file is an XML file (documented here: https://docs.microsoft.com/en-us/nuget/reference/nuspec) that is standardized by an XSD schema: https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/NuGet.Packaging/compiler/resources/nuspec.xsd

Reading the XSD, we can see that the .nuspec file has a <dependencies> tag and its children have two forms. Here is an example:

    <dependencies>
      <dependency id="Moqi" version="2.5.6" include="Runtime,Compile" />
      <group targetFramework=".NETStandard2.0">
        <dependency id="Test.Dependency" version="2.3.7" exclude="Build,Analyzers" include="Runtime,Compile" />
        <dependency id="Newtonsoft.Json" version="12.0.3" exclude="Build,Analyzers" />
      </group>
      <dependency id="Castle.Core" />
    </dependencies>

In short, we either have a <dependency> element or a <group> with many <dependency> elements embedded. Note that we will need to extract not only the dependency name and version but also the include, exclude and targetFramework attributes.

Inspecting the XSD file, we notice that these attributes are all optional, even version is optional. In particular, this statement:

<dependencies>
  <dependency id="Moqi" />
</dependencies>

is valid.

Persisting dependencies in GitLab

We already have Packages::DependencyLink and Packages::Dependency that allow us to persist package dependencies. The only issue here is that these models will only handle the dependency name and version.

To be able to support the include, exclude and targetFramework attributes, a new model/table has been added and linked to Packages::DependencyLink: Packages::NugetDependencyLinkMetadatum. This new model follows what has been done at the Packages::Package level (see Packages::MavenMetadatum for an example).

Now we need to integrate this Packages::NugetDependencyLinkMetadatum during dependencies creation. This creation is handled by the Packages::CreateDependencyService. This service has some specificity: all the inserts are grouped and done through bulk inserts. This is to avoid n inserts, n being the number of the dependencies for the given package. Also, if a Packages::Dependency is re used accross packages, the Packages::Dependency row is re-used (for more details about this service, check !20549 (merged)).

We want to keep this same behavior (use bulk inserts) for Packages::NugetDependencyLinkMetadatum. To deal with that, we introduce a new nuget flavoured service: Packages::Nuget::CreateDependencyService. This service will use the shared service Packages::CreateDependencyService to insert Packages::DependencyLinks and Packages::Dependencys.

Once, we have the Packages::DependencyLinks, it's just a matter of looping through them, locate the nuget metadata (the include, exclude and targetFramework attributes) and build a rows structure. This structure is then used to bulk insert all the rows at once.

Follow ups

There are a few aspects that this MR doesn't tackle to keep it small:

  • Those dependencies are persisted by this MR but we need to do something with them 🙂 . We will update the nuget metadata endpoint to return them. Currently, it returns an empty array (see https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/api/helpers/packages/nuget/metadata_presenter_helpers.rb#L48) and this leads to issues under specific conditions, see #211926 (closed)
  • The upload endpoint accepts multiple uploads for the same package name + version. The GitLab NuGet repository will simply append the newest files to the existing Packages::Package. This part has not been updated to support dependencies changes. In short, imagine that we push for package p with dependencies on a, b and c. Later on, we push the same package name + version but the .nuspec file references only dependencies a and c -> Dependency b has to be cleaned up. A follow up issue has been created: #216005.

Screenshots

On the nuget CLI:

nuget push Dependencies.App.4.2.3.nupkg -source local
WARNING: No API Key was provided and no API Key could be found for 'http://gitlab.local:8000/api/v4/projects/1/packages/nuget'. To save an API Key for a source use the 'setApiKey' command.
Pushing Dependencies.App.4.2.3.nupkg to 'http://gitlab.local:8000/api/v4/projects/1/packages/nuget'...
  PUT http://gitlab.local:8000/api/v4/projects/1/packages/nuget/
  Created http://gitlab.local:8000/api/v4/projects/1/packages/nuget/ 35479ms
Your package was pushed.

After the worker has gone through the package, we get this:

> Packages::Package.last.dependency_links
=> [#<Packages::DependencyLink:0x00007fb0ab3057b0 id: 112, package_id: 244, dependency_id: 44, dependency_type: "dependencies">,
 #<Packages::DependencyLink:0x00007fb0ab4db490 id: 113, package_id: 244, dependency_id: 45, dependency_type: "dependencies">,
 #<Packages::DependencyLink:0x00007fb0ab4db2b0 id: 114, package_id: 244, dependency_id: 46, dependency_type: "dependencies">]

> Packages::Package.last.dependency_links.first.nuget_metadatum
=> #<Packages::NugetDependencyLinkMetadatum:0x00007fb0aa4a50f8 dependency_link_id: 112, include: nil, exclude: "Build,Analyzers", target_framework: ".NETCoreApp3.1">

> Packages::Package.last.dependency_links.second.nuget_metadatum
=> #<Packages::NugetDependencyLinkMetadatum:0x00007fb0aa344ad8 dependency_link_id: 113, include: nil, exclude: "Build,Analyzers", target_framework: ".NETCoreApp3.1">

> Packages::Package.last.dependency_links.third.nuget_metadatum
=> #<Packages::NugetDependencyLinkMetadatum:0x00007fb0aa37f160 dependency_link_id: 114, include: nil, exclude: "Build,Analyzers", target_framework: ".NETCoreApp3.1">

Database review

Migration Up

$ rails db:migrate
== 20200424135319 CreateNugetDependencyLinkMetadata: migrating ================
-- table_exists?(:packages_nuget_dependency_link_metadata)
   -> 0.0005s
-- create_table(:packages_nuget_dependency_link_metadata, {:id=>false})
   -> 0.0228s
-- transaction_open?()
   -> 0.0000s
-- execute("ALTER TABLE packages_nuget_dependency_link_metadata\nADD CONSTRAINT packages_nuget_dependency_link_metadata_target_framework_constraint\nCHECK ( char_length(target_framework) <= 255 )\nNOT VALID;\n")
   -> 0.0008s
-- execute("SET statement_timeout TO 0")
   -> 0.0001s
-- execute("ALTER TABLE packages_nuget_dependency_link_metadata VALIDATE CONSTRAINT packages_nuget_dependency_link_metadata_target_framework_constraint;")
   -> 0.0004s
-- execute("RESET ALL")
   -> 0.0001s
== 20200424135319 CreateNugetDependencyLinkMetadata: migrated (0.0294s) =======

Migration Down

$ rails db:rollback
== 20200424135319 CreateNugetDependencyLinkMetadata: reverting ================
-- drop_table(:packages_nuget_dependency_link_metadata)
   -> 0.0092s
== 20200424135319 CreateNugetDependencyLinkMetadata: reverted (0.0093s) =======

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Mayra Cabrera

Merge request reports