Add metadata cache validity hours to Maven upstreams

🔥 Problem

In Maven's world, maven-metadata.xml files describe the state of the registry. Their primary function is to list the available versions of a given package.

In Maven virtual registries, the cache system has a validity period that defines how long a cache entry is considered valid before it is checked against the upstream for updates.

The problem is that this period can be set to 0, meaning that the virtual registry will never check with the upstream for updates. The cached "version" of the file is then kept forever, which is problematic for metadata files: when a new version is published to the upstream, the metadata file is updated there, so if we keep serving the old copy of that file, clients will not "see" the newest version.

This situation arises under specific conditions:

  • a snapshot dependency is used.

  • a non-snapshot dependency with a range selector is used:

     <!-- Accepts 1.2.3 or any newer version -->
          <dependency>
              <groupId>junit</groupId>
              <artifactId>junit</artifactId>
              <version>[1.2.3,)</version>
          </dependency>
  • a cache validity period of 0 is used.

    • This is the value that we default to when the upstream being created points to Maven Central and the user doesn't pass a specific value.
  • a new version of the dependency is published, and we want to pull it through the virtual registries.
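
To make the "validity period of 0" condition concrete, here is a minimal Ruby sketch of a staleness check in which 0 disables upstream checks. The CacheEntry struct and the stale? helper are hypothetical stand-ins written for illustration; they are not the actual GitLab model code.

```ruby
# Hypothetical stand-in for a cache entry (assumption, not the real model).
CacheEntry = Struct.new(:relative_path, :upstream_checked_at, keyword_init: true)

SECONDS_PER_HOUR = 3600

# Returns true when the entry should be re-checked against the upstream.
def stale?(entry, cache_validity_hours, now: Time.now)
  # A validity period of 0 means the entry is never re-checked.
  return false if cache_validity_hours.zero?

  entry.upstream_checked_at + (cache_validity_hours * SECONDS_PER_HOUR) < now
end

now = Time.now
metadata = CacheEntry.new(
  relative_path: '/junit/junit/maven-metadata.xml',
  upstream_checked_at: now - (30 * 24 * SECONDS_PER_HOUR) # last checked 30 days ago
)

puts stale?(metadata, 0, now: now)  # validity of 0: the entry is never considered stale
puts stale?(metadata, 24, now: now) # validity of 24 hours: the entry is considered stale
```

With a validity of 0, even a metadata file last checked a month ago keeps being served from the cache, which is why newly published versions stay invisible to clients.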

🚒 Solution

  • Have two validity periods on the upstream: one for the actual files, one for everything else including API calls (not used for Maven) and metadata responses/files.
    • Introduce a new column (default value could be 1 or 24). That column should not be allowed to be set to 0.
  • Update the Maven handle file request service so that:
    1. Detect when the request is a metadata call, that is, when the requested file is maven-metadata.xml.
    2. Select the cache validity period that applies to the requested file.
    3. Apply the existing logic with the selected cache validity period.
  • Update the related documentation.

Opportunities:

  • We could embed the entire change in the model #stale? function. This way, the service would not need any change.
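
The three steps of the solution can be sketched in plain Ruby as follows. The Upstream struct and the helper names (metadata_file?, validity_hours_for, metadata_aware_stale?) are assumptions made for this illustration; the real change lives in the Rails model and service, whose exact shape may differ.

```ruby
METADATA_FILE_NAME = 'maven-metadata.xml'

# Hypothetical stand-in for the upstream record (assumption).
Upstream = Struct.new(:cache_validity_hours, :metadata_cache_validity_hours, keyword_init: true)

# Step 1: detect a metadata request from the requested path.
def metadata_file?(relative_path)
  File.basename(relative_path) == METADATA_FILE_NAME
end

# Step 2: select which validity period applies to the requested file.
def validity_hours_for(upstream, relative_path)
  if metadata_file?(relative_path)
    upstream.metadata_cache_validity_hours
  else
    upstream.cache_validity_hours
  end
end

# Step 3: apply the existing rule (0 means "never re-check") with the selected period.
def metadata_aware_stale?(upstream, relative_path, upstream_checked_at, now: Time.now)
  hours = validity_hours_for(upstream, relative_path)
  return false if hours.zero?

  upstream_checked_at + (hours * 3600) < now
end

upstream = Upstream.new(cache_validity_hours: 0, metadata_cache_validity_hours: 24)
checked_at = Time.now - (48 * 3600) # last checked 48 hours ago

# The metadata file uses the 24-hour period, the jar uses the 0 ("never") period.
puts metadata_aware_stale?(upstream, '/com/example/test-package/maven-metadata.xml', checked_at)
puts metadata_aware_stale?(upstream, '/com/example/test-package/1.1.0/test-package-1.1.0.jar', checked_at)
```

This sketch matches only the exact file name maven-metadata.xml. Whether checksum siblings such as maven-metadata.xml.sha1 should follow the same period is left open here.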

What does this MR do and why?

  • Introduce metadata_cache_validity_hours column to virtual_registries_packages_maven_upstreams table with a default value of 24.
  • Update model validations to ensure metadata_cache_validity_hours is greater than 0.
  • Modify relevant API endpoints and documentation to include metadata_cache_validity_hours.
  • Adjust cache entry logic to utilize metadata_cache_validity_hours for metadata files.
  • Enhance tests to cover new functionality and ensure proper validation.
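
As a rough illustration of the validation rule, the following plain Ruby sketch checks both periods outside of Rails. The function name and the error messages are hypothetical, and the lower bound of 0 for cache_validity_hours is an assumption based on 0 being an accepted value; the default of 24 comes from the column definition described above.

```ruby
# Returns a list of error messages; an empty list means both values are acceptable.
def validate_validity_hours(cache_validity_hours:, metadata_cache_validity_hours: 24)
  errors = []

  # Assumption: the file validity may be 0 (never re-check) but not negative.
  unless cache_validity_hours.is_a?(Integer) && cache_validity_hours >= 0
    errors << 'cache_validity_hours must be an integer greater than or equal to 0'
  end

  # The metadata validity must be strictly positive, so metadata is always refreshed eventually.
  unless metadata_cache_validity_hours.is_a?(Integer) && metadata_cache_validity_hours > 0
    errors << 'metadata_cache_validity_hours must be an integer greater than 0'
  end

  errors
end

p validate_validity_hours(cache_validity_hours: 0)
p validate_validity_hours(cache_validity_hours: 0, metadata_cache_validity_hours: 0)
```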

References

Screenshots or screen recordings

N/A

How to set up and validate locally

  • Have a GitLab instance with an EE licence, as the Maven virtual registry is an EE-only feature.
  • Have a top-level group ID ready (Maintainer access level).
  • Have a PAT ready (scope api).

First, let's enable the feature flag: Feature.enable(:maven_virtual_registry).

Second, let's create a Maven virtual registry and point it to a public GitLab Maven repository. We can use curl for that.

# create the registry object and note the id
$ curl -X POST -H "PRIVATE-TOKEN: <PAT>" "http://gdk.test:3000/api/v4/groups/<top level group id>/-/virtual_registries/packages/maven/registries?name=testing_metadata_cache"

# create the upstream and note the id
$ curl -H "PRIVATE-TOKEN: <PAT>" --data-urlencode 'url=https://gitlab.com/api/v4/projects/60412753/packages/maven' --data-urlencode 'name=upstream' --data-urlencode 'cache_validity_hours=0' -X POST "http://gdk.test:3000/api/v4/virtual_registries/packages/maven/registries/<registry id>/upstreams"

The public GitLab Maven repository we use as an upstream has a published package at v1.1.0. We are going to consume it as a dependency in a dummy Maven app.

  • In a new local directory named mvn_consumer, create the following files (don't forget to fill the placeholders):

    pom.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      
      <groupId>com.consumer</groupId>
      <artifactId>consumer-app</artifactId>
      <version>1.0.0</version>
      
      <repositories>
          <repository>
              <id>gitlab-virtual</id>
              <url>http://gdk.test:3000/api/v4/virtual_registries/packages/maven/<REGISTRY_ID></url>
          </repository>
      </repositories>
      
      <dependencies>
          <dependency>
              <groupId>com.example</groupId>
              <artifactId>test-package</artifactId>
              <version>[1.1.0,)</version> <!-- Range selector to pick latest -->
          </dependency>
      </dependencies>
    </project>
    settings.xml
    <settings>
    <servers>
      <server>
        <id>http-unblocker</id>
        <configuration>
          <httpHeaders>
            <property>
              <name>Private-Token</name>
              <value><PAT></value>
            </property>
          </httpHeaders>
        </configuration>
      </server>
    </servers>
    
    <mirrors>
      <mirror>
        <id>http-unblocker</id>
        <mirrorOf>gitlab-virtual</mirrorOf>
        <name></name>
        <url>http://gdk.test:3000/api/v4/virtual_registries/packages/maven/<REGISTRY_ID></url>
        <blocked>false</blocked>
      </mirror>
     </mirrors>
    </settings>
  • In the root of the directory mvn_consumer, run mvn clean compile -s ./settings.xml. This should pull v1.1.0 of the artifact com/example/test-package from the GitLab's Maven Repository upstream and cache it in the GDK.

  • Verify in rails console that the cache entries have been successfully created:

    VirtualRegistries::Packages::Maven::Upstream.last.cache_entries
  • Now we are going to publish a new version of the artifact com/example/test-package to GitLab's Maven Repository.

    • In a new local directory named mvn_publisher, create the following files (don't forget to fill the placeholders):

      pom.xml
      <?xml version="1.0" encoding="UTF-8"?>
      <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      
      <groupId>com.example</groupId>
      <artifactId>test-package</artifactId>
      <version>1.2.0</version>
      <packaging>jar</packaging>
      
      <distributionManagement>
          <repository>
              <id>gitlab-maven</id>
              <url>https://gitlab.com/api/v4/projects/60412753/packages/maven</url>
          </repository>
      </distributionManagement>
      </project>
      settings.xml
      <settings>
      <servers>
          <server>
              <id>gitlab-maven</id>
              <configuration>
                  <httpHeaders>
                      <property>
                          <name>Private-Token</name>
                          <value><YOUR GITLAB (NOT GDK) PAT></value>
                      </property>
                  </httpHeaders>
              </configuration>
          </server>
      </servers>
      </settings>
  • In the root of the directory mvn_publisher, run mvn clean deploy -s ./settings.xml. This will publish a new version v1.2.0 of the package com/example/test-package to the GitLab repository.

Reproducing the bug on master

  • Delete the compiled local package by running rm -rf ~/.m2/repository/com/example/test-package. This is to make sure we always pull a fresh version of the artifact.
  • In the root of the directory mvn_consumer, re-run mvn clean compile -s ./settings.xml. We would expect the virtual registry to fetch the latest uploaded version v1.2.0 of the package com/example/test-package. However, because we set cache_validity_hours to 0, the upstream is never checked again, so we don't get an updated maven-metadata.xml file and the fetched version is still v1.1.0 😞
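
Why the stale metadata pins the client to v1.1.0 can be illustrated with a simplified range resolution in Ruby. This is only a sketch: resolve_open_range is a hypothetical helper, and Gem::Version is used as an approximation of Maven's version ordering, which is good enough for plain numeric versions like the ones in this walkthrough.

```ruby
require 'rubygems'

# Simplified resolution of an open-ended range such as [1.1.0,):
# pick the highest listed version that is >= the lower bound.
# Returns nil when no listed version satisfies the range.
def resolve_open_range(listed_versions, lower_bound)
  floor = Gem::Version.new(lower_bound)
  best = listed_versions
    .map { |version| Gem::Version.new(version) }
    .select { |version| version >= floor }
    .max

  best&.to_s
end

cached_metadata_versions = ['1.1.0']            # what the stale cache entry lists
upstream_metadata_versions = ['1.1.0', '1.2.0'] # what the upstream lists after the deploy

puts resolve_open_range(cached_metadata_versions, '1.1.0')
puts resolve_open_range(upstream_metadata_versions, '1.1.0')
```

The client can only choose among the versions listed in the metadata it receives, so as long as the cached maven-metadata.xml is served, v1.2.0 is never a candidate.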

Validating the fix on this branch

  • Make sure you ran the migration that creates the new column metadata_cache_validity_hours.
  • In the Rails console, update the metadata file's upstream_checked_at so that the entry is older than the metadata validity period and the upstream gets hit:
    VirtualRegistries::Packages::Maven::Upstream.last.default_cache_entries.find_by_relative_path("/com/example/test-package/maven-metadata.xml").update!(upstream_checked_at: 48.hours.ago)
  • Delete the compiled local package by running rm -rf ~/.m2/repository/com/example/test-package.
  • In the root of the directory mvn_consumer, re-run mvn clean compile -s ./settings.xml. This time, the latest version v1.2.0 of the package com/example/test-package should be installed successfully 🎉

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #556138 (closed)

Edited by Moaz Khalifa
