Add metadata cache validity hours to Maven upstreams
🔥 Problem
In Maven's world, maven-metadata.xml files hold metadata (obvious, right?) about the registry state. Their primary function is to list the available versions of a given package.
In Maven virtual registries, the cache system has a validity period that defines how long a cache entry is considered valid before it is checked against the upstream for updates.
The problem is that this period can be set to 0, meaning that the virtual registry will never check with the upstream for updates. Thus, the cached "version" of the file will be kept around forever. This is problematic for metadata files: when a new version is published to the upstream, the metadata file is updated there, so if we keep serving the old copy of that file, clients will not "see" the newest version.
This situation arises under specific conditions:
- a snapshot dependency is used.
- a non snapshot dependency + a range selector is used:

      <!-- Accepts 1.2.3 or any newer version -->
      <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>[1.2.3,)</version>
      </dependency>

- a cache validity period of 0 is used.
  - This is the value that we default to when the upstream being created points to Maven central and the user doesn't pass any specific value.
- a new version of the dependency is published, and we want to pull it through the virtual registries.
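To make the failure mode concrete, here is a minimal Ruby sketch of how a stale cached maven-metadata.xml hides a newly published version. The two XML bodies are hand-written samples in the usual maven-metadata.xml layout, the coordinates mirror the test package used later in this description, and `listed_versions` is a hypothetical helper that is not part of the GitLab codebase.

```ruby
# Hypothetical helper: extract the versions listed in a maven-metadata.xml body.
# A regex is enough for this illustration; real code should use an XML parser.
def listed_versions(metadata_xml)
  versions_block = metadata_xml[%r{<versions>(.*?)</versions>}m, 1].to_s
  versions_block.scan(%r{<version>([^<]+)</version>}).flatten
end

# Sample copy held in the virtual registry cache (fetched before the new release).
CACHED_METADATA = <<~XML
  <metadata>
    <groupId>com.example</groupId>
    <artifactId>test-package</artifactId>
    <versioning>
      <latest>1.1.0</latest>
      <release>1.1.0</release>
      <versions>
        <version>1.1.0</version>
      </versions>
    </versioning>
  </metadata>
XML

# Sample copy served by the upstream after 1.2.0 has been published.
UPSTREAM_METADATA = <<~XML
  <metadata>
    <groupId>com.example</groupId>
    <artifactId>test-package</artifactId>
    <versioning>
      <latest>1.2.0</latest>
      <release>1.2.0</release>
      <versions>
        <version>1.1.0</version>
        <version>1.2.0</version>
      </versions>
    </versioning>
  </metadata>
XML

puts listed_versions(CACHED_METADATA).inspect   # => ["1.1.0"]
puts listed_versions(UPSTREAM_METADATA).inspect # => ["1.1.0", "1.2.0"]
```

A client resolving the range `[1.1.0,)` against the cached copy can only ever pick 1.1.0, because 1.2.0 is simply not listed there.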
🚒 Solution
- Have two validity periods on the upstream: one for the actual files, one for everything else including API calls (not used for Maven) and metadata responses/files.
  - Introduce a new column (default value could be 1 or 24). That column should not be allowed to be set to 0.
- Update the Maven handle file request service so that:
  - We detect when we have a metadata call. The requested file is maven-metadata.xml.
  - Correctly select which cache validity period we should use.
  - Apply the existing logic with the selected cache validity.
- Update the related documentation.
Opportunities:
- We could embed the entire change in the model #stale? function. This way, the service would not need any change.
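The idea of selecting the validity period inside `#stale?` can be sketched in plain Ruby as follows. This is an illustration under stated assumptions, not the actual model code: `Upstream` and `CacheEntry` here are simplified stand-ins for the ActiveRecord models, and `metadata?` and `validity_hours` are hypothetical helper names.

```ruby
# Simplified stand-in for the upstream model (assumption: plain Ruby, no ActiveRecord).
class Upstream
  attr_reader :cache_validity_hours, :metadata_cache_validity_hours

  def initialize(cache_validity_hours:, metadata_cache_validity_hours: 24)
    @cache_validity_hours = cache_validity_hours
    @metadata_cache_validity_hours = metadata_cache_validity_hours
  end
end

# Simplified stand-in for a cache entry.
class CacheEntry
  METADATA_FILENAME = 'maven-metadata.xml'

  def initialize(upstream:, relative_path:, upstream_checked_at:)
    @upstream = upstream
    @relative_path = relative_path
    @upstream_checked_at = upstream_checked_at
  end

  # Hypothetical helper: a metadata call is detected from the requested file name.
  def metadata?
    File.basename(@relative_path) == METADATA_FILENAME
  end

  # Hypothetical helper: pick the validity period that applies to this entry.
  def validity_hours
    metadata? ? @upstream.metadata_cache_validity_hours : @upstream.cache_validity_hours
  end

  def stale?(now: Time.now)
    hours = validity_hours
    return false if hours.zero? # 0 means "never check the upstream again"

    @upstream_checked_at + (hours * 3600) < now
  end
end

upstream = Upstream.new(cache_validity_hours: 0, metadata_cache_validity_hours: 24)
checked_at = Time.now - (48 * 3600)

jar = CacheEntry.new(
  upstream: upstream,
  relative_path: '/com/example/test-package/1.1.0/test-package-1.1.0.jar',
  upstream_checked_at: checked_at
)
metadata = CacheEntry.new(
  upstream: upstream,
  relative_path: '/com/example/test-package/maven-metadata.xml',
  upstream_checked_at: checked_at
)

puts jar.stale?      # => false
puts metadata.stale? # => true
```

With this shape, the file request service keeps calling `stale?` as before, and only the metadata entry is re-checked against the upstream.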
What does this MR do and why?
- Introduce metadata_cache_validity_hours column to virtual_registries_packages_maven_upstreams table with a default value of 24.
- Update model validations to ensure metadata_cache_validity_hours is greater than 0.
- Modify relevant API endpoints and documentation to include metadata_cache_validity_hours.
- Adjust cache entry logic to utilize metadata_cache_validity_hours for metadata files.
- Enhance tests to cover new functionality and ensure proper validation.
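As a rough illustration of the validation rule, the following plain Ruby sketch checks both periods the way the bullets above describe them. The function name `validity_hours_errors`, the error messages, and the integer-only check are assumptions made for this example; the real change uses model validations.

```ruby
# Default taken from the MR description; the constant name is hypothetical.
DEFAULT_METADATA_CACHE_VALIDITY_HOURS = 24

# Hypothetical helper: returns a list of error messages, empty when both values are valid.
def validity_hours_errors(cache_validity_hours:,
                          metadata_cache_validity_hours: DEFAULT_METADATA_CACHE_VALIDITY_HOURS)
  errors = []

  # 0 stays allowed here: it means "never re-check the upstream" for regular files.
  unless cache_validity_hours.is_a?(Integer) && cache_validity_hours >= 0
    errors << 'cache_validity_hours must be greater than or equal to 0'
  end

  # 0 is rejected here so that metadata files are always refreshed eventually.
  unless metadata_cache_validity_hours.is_a?(Integer) && metadata_cache_validity_hours > 0
    errors << 'metadata_cache_validity_hours must be greater than 0'
  end

  errors
end

puts validity_hours_errors(cache_validity_hours: 0).inspect
# => []
puts validity_hours_errors(cache_validity_hours: 0, metadata_cache_validity_hours: 0).inspect
# => ["metadata_cache_validity_hours must be greater than 0"]
```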
References
- Maven virtual registry: cache validity and mave... (#556138 - closed) • Moaz Khalifa • 18.3
- https://docs.gitlab.com/user/packages/virtual_registry
Screenshots or screen recordings
N/A
How to set up and validate locally
- Have a GitLab instance with an EE licence, as the maven virtual registry is an EE only feature.
- Have a top level group id ready (maintainer access level).
- Have a PAT ready (scope api).
First, let's enable the feature flag: Feature.enable(:maven_virtual_registry).
Second, let's create a Maven virtual registry and point it to a public GitLab Maven repository. We can use curl for that.
# create the registry object and note the id
$ curl -X POST -H "PRIVATE-TOKEN: <PAT>" "http://gdk.test:3000/api/v4/groups/<top level group id>/-/virtual_registries/packages/maven/registries?name=testing_metadata_cache"
# create the upstream and note the id
$ curl -H "PRIVATE-TOKEN: <PAT>" --data-urlencode 'url=https://gitlab.com/api/v4/projects/60412753/packages/maven' --data-urlencode 'name=upstream' --data-urlencode 'cache_validity_hours=0' -X POST "http://gdk.test:3000/api/v4/virtual_registries/packages/maven/registries/<registry id>/upstreams"
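For anyone who prefers scripting this step, the upstream creation request can also be built with Ruby's standard Net::HTTP library. This is a sketch under assumptions: `build_upstream_request` is a hypothetical helper, the registry id and token are placeholders, and passing `metadata_cache_validity_hours` as a form parameter assumes the endpoint accepts the new attribute under that name, as the summary above suggests.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build (but do not send) the POST request that creates an upstream.
def build_upstream_request(base_url:, registry_id:, token:, params:)
  path = "/api/v4/virtual_registries/packages/maven/registries/#{registry_id}/upstreams"
  uri = URI.join(base_url, path)

  request = Net::HTTP::Post.new(uri)
  request['PRIVATE-TOKEN'] = token
  request.set_form_data(params) # URL-encodes the parameters into the request body

  [uri, request]
end

uri, request = build_upstream_request(
  base_url: 'http://gdk.test:3000',
  registry_id: 1,            # placeholder: use the registry id noted earlier
  token: 'REPLACE_WITH_PAT', # placeholder: a PAT with the api scope
  params: {
    'url' => 'https://gitlab.com/api/v4/projects/60412753/packages/maven',
    'name' => 'upstream',
    'cache_validity_hours' => 0,
    'metadata_cache_validity_hours' => 24 # assumed parameter name, see lead-in
  }
)

puts uri
puts request.body

# To actually send it against a running GDK:
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# puts response.code
```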
The public GitLab Maven repository we use as an upstream has a published package at v1.1.0. We are going to consume it as a dependency in a dummy Maven app.
- In a new local directory named mvn_consumer, create the following files (don't forget to fill the placeholders):

  pom.xml

      <?xml version="1.0" encoding="UTF-8"?>
      <project xmlns="http://maven.apache.org/POM/4.0.0">
        <modelVersion>4.0.0</modelVersion>
        <groupId>com.consumer</groupId>
        <artifactId>consumer-app</artifactId>
        <version>1.0.0</version>
        <repositories>
          <repository>
            <id>gitlab-virtual</id>
            <url>http://gdk.test:3000/api/v4/virtual_registries/packages/maven/<REGISTRY_ID></url>
          </repository>
        </repositories>
        <dependencies>
          <dependency>
            <groupId>com.example</groupId>
            <artifactId>test-package</artifactId>
            <version>[1.1.0,)</version> <!-- Range selector to pick latest -->
          </dependency>
        </dependencies>
      </project>

  settings.xml

      <settings>
        <servers>
          <server>
            <id>http-unblocker</id>
            <configuration>
              <httpHeaders>
                <property>
                  <name>Private-Token</name>
                  <value><PAT></value>
                </property>
              </httpHeaders>
            </configuration>
          </server>
        </servers>
        <mirrors>
          <mirror>
            <id>http-unblocker</id>
            <mirrorOf>gitlab-virtual</mirrorOf>
            <name></name>
            <url>http://gdk.test:3000/api/v4/virtual_registries/packages/maven/<REGISTRY_ID></url>
            <blocked>false</blocked>
          </mirror>
        </mirrors>
      </settings>

- In the root of the directory mvn_consumer, run mvn clean compile -s ./settings.xml. This should pull v1.1.0 of the artifact com/example/test-package from the GitLab's Maven Repository upstream and cache it in the GDK.
- Verify in rails console that the cache entries have been successfully created:

      VirtualRegistries::Packages::Maven::Upstream.last.cache_entries

- Now we are going to publish a new version of the artifact com/example/test-package to GitLab's Maven Repository.
  - In a new local directory named mvn_publisher, create the following files (don't forget to fill the placeholders):

    pom.xml

        <?xml version="1.0" encoding="UTF-8"?>
        <project xmlns="http://maven.apache.org/POM/4.0.0">
          <modelVersion>4.0.0</modelVersion>
          <groupId>com.example</groupId>
          <artifactId>test-package</artifactId>
          <version>1.2.0</version>
          <packaging>jar</packaging>
          <distributionManagement>
            <repository>
              <id>gitlab-maven</id>
              <url>https://gitlab.com/api/v4/projects/60412753/packages/maven</url>
            </repository>
          </distributionManagement>
        </project>

    settings.xml

        <settings>
          <servers>
            <server>
              <id>gitlab-maven</id>
              <configuration>
                <httpHeaders>
                  <property>
                    <name>Private-Token</name>
                    <value><YOUR GITLAB (NOT GDK) PAT></value>
                  </property>
                </httpHeaders>
              </configuration>
            </server>
          </servers>
        </settings>

  - In the root of the directory, run mvn clean deploy -s ./settings.xml. This will publish a new version v1.2.0 of the package com/example/test-package to the GitLab's repository.
Reproducing the bug on master
- Delete the compiled local package by running rm -rf ~/.m2/repository/com/example/test-package. This is to make sure we always pull a fresh version of the artifact.
- In the root of the directory mvn_consumer, re-run mvn clean compile -s ./settings.xml. The expectation is that the virtual registry fetches the latest uploaded version v1.2.0 of the package com/example/test-package. However, because we don't hit the upstream (we set cache_validity_hours to 0), we don't get a fresh version of the maven-metadata.xml file, and the fetched version will be v1.1.0 😞
Validating the fix on this branch
- Make sure you ran the migration to create the new column metadata_cache_validity_hours.
- In the Rails console, update the metadata file's upstream_checked_at so that we hit the upstream: VirtualRegistries::Packages::Maven::Upstream.last.default_cache_entries.find_by_relative_path("/com/example/test-package/maven-metadata.xml").update!(upstream_checked_at: 48.hours.ago)
- Delete the compiled local package by running rm -rf ~/.m2/repository/com/example/test-package.
- In the root of the directory mvn_consumer, re-run mvn clean compile -s ./settings.xml. This time, the latest version v1.2.0 of the package com/example/test-package should be installed successfully 🎉
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #556138 (closed)