Add remote checksums for Maven package registry and dependency proxy
🖐️ Context
Maven clients (such as $ mvn) don't work with a single file when interacting with a registry for a package. Instead, they rely on a set of files (.pom, .jar, and maven-metadata.xml, for example).
For each file, integrity is verified through additional, related files: for example, .pom.md5 and .pom.sha1. Thus, for each file, Maven clients will trigger the following web requests (example with a single .pom file):
- /pkg.pom
- /pkg.pom.md5
- /pkg.pom.sha1
Now, not all 3 requests happen every time. It depends on the running conditions of the given Maven client. However, it is very common to see at least one of the checksums being requested. Example.
During our work on Maven virtual registries (a feature that Maven clients interact with), we stumbled upon this page. In short, we can include custom x-* HTTP headers in the response for /pkg.pom, and these headers "transport" the sha1 and md5 checksums. Maven clients read these headers and completely skip the requests for /pkg.pom.md5 and /pkg.pom.sha1. This saves backend resources (CPU time and database requests).
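To make the idea concrete, here is a minimal sketch of what such a response could carry and how a client could use it. The header names x-checksum-sha1 and x-checksum-md5 are an assumption based on the Maven Resolver documentation on remote included checksums (the text above only says x-*), and both function names are hypothetical; this is not the helper changed in this MR.

```python
import hashlib


def checksum_headers(content: bytes) -> dict[str, str]:
    """Build checksum headers for a file body.

    Header names are assumed, not taken from this MR.
    """
    return {
        "x-checksum-sha1": hashlib.sha1(content).hexdigest(),
        "x-checksum-md5": hashlib.md5(content).hexdigest(),
    }


def verify_download(content: bytes, headers: dict[str, str]) -> bool:
    """Client side: check the body against the sha1 sent with the response."""
    expected = headers.get("x-checksum-sha1")
    if expected is None:
        # No included checksum: a real client would fall back to
        # requesting /pkg.pom.sha1 separately.
        return False
    return hashlib.sha1(content).hexdigest() == expected
```

The sketch hashes the body on the fly to stay self-contained. A real server would more likely read digests it already stores for the file, which is where the saving comes from: one response instead of up to three.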
⚔️ Designing the solution
The challenge we have here is that the /pkg.pom request is basically resolving to a file on object storage. As such, we have a few different configurations to handle:
1. Object storage disabled. The file system is used.
2. Object storage enabled.
   1. Proxy download enabled. The file is pulled from object storage by GitLab (workhorse) and sent back to the client.
   2. Proxy download disabled. GitLab answers a redirect to a signed url that points to the file on object storage directly. The client will follow that redirect to get the file.
Now, when it comes to setting custom x-* headers on the response for /pkg.pom, we have:
1. Object storage disabled. We can do it.
2. Object storage enabled.
   1. Proxy download enabled. We can do it.
   2. Proxy download disabled. Technically impossible, as we are limited in how we can instruct object storage providers to send back specific headers.
Thus, we need to avoid case (2.)(2.). We do so by forcing the proxy download, which puts us in case (2.)(1.).
On gitlab.com, the package registry already uses the proxy download (case (2.)(1.)). Thus, forcing the proxy download would only impact self-managed users who disabled it for the package registry.
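The case analysis above can be sketched as a small decision function. This is only an illustration of the reasoning: the function names, the mode strings, and the force_proxy parameter are all hypothetical and do not mirror the actual helper or feature flag in GitLab.

```python
def serving_mode(object_storage: bool, proxy_download: bool, force_proxy: bool) -> str:
    """Return how a Maven file would be served for a given configuration."""
    if not object_storage:
        return "file_system"  # case (1.): served from the local file system
    if proxy_download or force_proxy:
        return "proxy"  # case (2.)(1.): GitLab streams the file to the client
    return "redirect"  # case (2.)(2.): client follows a signed URL


def can_set_checksum_headers(mode: str) -> bool:
    """Custom headers are possible whenever GitLab itself sends the response."""
    return mode != "redirect"
```

With force_proxy set, a configuration that would otherwise land in the redirect case is served through the proxy instead, so the checksum headers can always be attached.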
🤔 What does this MR do and why?
- When requesting a file in the Maven package registry, set the x-* headers to send back the checksums along with the file.
  - The proxy download is forced to be enabled if necessary. E.g. proxy_download: false for the Maven package registry is no longer possible.
- The helper changed in this MR also impacts the Maven dependency proxy, which has a similar behavior (returning files to Maven clients). There, the object storage configuration of the package registry is used, so we have the exact same situation.
- Adjust the related specs.
Since the Maven package registry is one of the most used package registries on gitlab.com, this change is gated behind a feature flag to provide an additional safety net during its deployment.
References
- Maven Package Registry: implement remote includ... (#507768 - closed) • David Fernandez • 17.8
- https://docs.gitlab.com/ee/user/packages/maven_repository/
- https://docs.gitlab.com/ee/user/packages/package_registry/dependency_proxy/#for-maven-packages
- https://maven.apache.org/resolver/expected-checksums.html#remote-included-checksums
🏁 MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
🦄 Screenshots or screen recordings
No UI changes.
⚙️ How to set up and validate locally
Have:
- A project ready.
- A PAT (api scope) ready (maintainer level on the project).
- A working $ mvn installation.
1️⃣ Maven package registry
- Publish a package My.Dependency 1.3.7 to the project.
- Create a local folder with these files:
pom.xml
<?xml version="1.0" encoding="UTF-8" ?>
<project
  xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"
>
  <modelVersion>4.0.0</modelVersion>
  <groupId>test</groupId>
  <artifactId>test</artifactId>
  <version>1.2.3</version>
  <dependencies>
    <dependency>
      <groupId>gl.pru</groupId>
      <artifactId>My.Dependency</artifactId>
      <version>1.3.7</version>
    </dependency>
  </dependencies>
  <repositories>
    <repository>
      <id>gitlab-maven</id>
      <url>http://gdk.test:8000/api/v4/projects/<project_id>/packages/maven</url>
    </repository>
  </repositories>
</project>
settings.xml
<settings
  xmlns="http://maven.apache.org/SETTINGS/1.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
  http://maven.apache.org/xsd/settings-1.0.0.xsd"
>
  <mirrors>
    <mirror>
      <id>maven-default-http-blocker</id>
      <url>http://127.0.0.1/dont-go-here</url>
      <mirrorOf>dummy</mirrorOf>
      <blocked>false</blocked>
    </mirror>
  </mirrors>
  <servers>
    <server>
      <id>gitlab-maven</id>
      <configuration>
        <httpHeaders>
          <property>
            <name>Private-Token</name>
            <value>***PAT TOKEN HERE***</value>
          </property>
        </httpHeaders>
      </configuration>
    </server>
  </servers>
</settings>
Remove the maven cache for the dependency package:
$ rm -rf ~/.m2/repository/gl/pru
Let's install the dependencies:
$ mvn install -s settings.xml
On master
With $ tail -f log/development.log | grep "api/v4/projects/<project_id>/packages/maven", we get
Started GET "/api/v4/projects/22/packages/maven/gl/pru/My.Dependency/1.3.7/My.Dependency-1.3.7.pom" for 172.16.123.1 at 2024-12-17 11:28:52 +0100
Started GET "/api/v4/projects/22/packages/maven/gl/pru/My.Dependency/1.3.7/My.Dependency-1.3.7.pom.sha1" for 172.16.123.1 at 2024-12-17 11:28:52 +0100
Started GET "/api/v4/projects/22/packages/maven/gl/pru/My.Dependency/1.3.7/My.Dependency-1.3.7.jar" for 172.16.123.1 at 2024-12-17 11:28:52 +0100
Started GET "/api/v4/projects/22/packages/maven/gl/pru/My.Dependency/1.3.7/My.Dependency-1.3.7.jar.sha1" for 172.16.123.1 at 2024-12-17 11:28:53 +0100
We can see that each file is requested along with its sha1 checksum: 4 requests in total.
With this MR
Started GET "/api/v4/projects/22/packages/maven/gl/pru/My.Dependency/1.3.7/My.Dependency-1.3.7.pom" for 172.16.123.1 at 2024-12-17 11:30:42 +0100
Started GET "/api/v4/projects/22/packages/maven/gl/pru/My.Dependency/1.3.7/My.Dependency-1.3.7.jar" for 172.16.123.1 at 2024-12-17 11:30:43 +0100
As you can see, we only have requests for the files themselves. There are no more requests for checksums because they are included in the response along with the file. Only 2 requests (a 50% reduction).
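The request counts can also be checked mechanically rather than by eye. The sketch below classifies the "On master" log lines shown above into file requests and checksum requests; the helper names are hypothetical and the parsing assumes the 'Started GET "..."' line format shown in these logs.

```python
CHECKSUM_SUFFIXES = (".md5", ".sha1")

# The four lines logged on master for the Maven package registry scenario.
MASTER_LOG = [
    'Started GET "/api/v4/projects/22/packages/maven/gl/pru/My.Dependency/1.3.7/My.Dependency-1.3.7.pom" for 172.16.123.1 at 2024-12-17 11:28:52 +0100',
    'Started GET "/api/v4/projects/22/packages/maven/gl/pru/My.Dependency/1.3.7/My.Dependency-1.3.7.pom.sha1" for 172.16.123.1 at 2024-12-17 11:28:52 +0100',
    'Started GET "/api/v4/projects/22/packages/maven/gl/pru/My.Dependency/1.3.7/My.Dependency-1.3.7.jar" for 172.16.123.1 at 2024-12-17 11:28:52 +0100',
    'Started GET "/api/v4/projects/22/packages/maven/gl/pru/My.Dependency/1.3.7/My.Dependency-1.3.7.jar.sha1" for 172.16.123.1 at 2024-12-17 11:28:53 +0100',
]


def requested_path(log_line: str) -> str:
    """Extract the quoted request path from a 'Started GET "..."' line."""
    return log_line.split('"')[1]


def count_requests(log_lines: list[str]) -> tuple[int, int]:
    """Return (file requests, checksum requests) for the given log lines."""
    paths = [requested_path(line) for line in log_lines]
    checksums = sum(path.endswith(CHECKSUM_SUFFIXES) for path in paths)
    return len(paths) - checksums, checksums


files, checksums = count_requests(MASTER_LOG)
# Share of requests that disappear once checksums travel in the headers.
print(f"files={files} checksums={checksums} saved={checksums / (files + checksums):.0%}")
# prints "files=2 checksums=2 saved=50%"
```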
2️⃣ Maven dependency proxy
- In the project settings:
  - Go to Packages and registries
  - Enable the dependency proxy
  - Set the URL to https://repo1.maven.org/maven2/
  - Save the changes
- Create a local folder with these files:
pom.xml
<?xml version="1.0" encoding="UTF-8" ?>
<project
  xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"
>
  <modelVersion>4.0.0</modelVersion>
  <groupId>test</groupId>
  <artifactId>test</artifactId>
  <version>1.2.3</version>
  <dependencies>
    <dependency>
      <groupId>org.junit.jupiter</groupId>
      <artifactId>junit-jupiter-api</artifactId>
      <version>5.11.4</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <repositories>
    <repository>
      <id>gitlab-maven</id>
      <url>http://gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven</url>
    </repository>
  </repositories>
</project>
settings.xml
<settings
  xmlns="http://maven.apache.org/SETTINGS/1.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
  http://maven.apache.org/xsd/settings-1.0.0.xsd"
>
  <mirrors>
    <mirror>
      <id>maven-default-http-blocker</id>
      <url>http://127.0.0.1/dont-go-here</url>
      <mirrorOf>dummy</mirrorOf>
      <blocked>false</blocked>
    </mirror>
  </mirrors>
  <servers>
    <server>
      <id>gitlab-maven</id>
      <configuration>
        <httpHeaders>
          <property>
            <name>Private-Token</name>
            <value>***PAT TOKEN HERE***</value>
          </property>
        </httpHeaders>
      </configuration>
    </server>
  </servers>
</settings>
Remove the maven cache for the junit package:
$ rm -rf ~/.m2/repository/org/junit
Let's install the dependencies:
$ mvn install -s settings.xml
On master
With $ tail -f log/development.log | grep "api/v4/projects/<project_id>/dependency_proxy/packages/maven", we get
Started GET "/api/v4/projects/22/dependency_proxy/packages/maven/org/junit/jupiter/junit-jupiter-api/5.11.4/junit-jupiter-api-5.11.4.pom" for 172.16.123.1 at 2024-12-17 11:51:59 +0100
Started GET "/api/v4/projects/22/dependency_proxy/packages/maven/org/junit/jupiter/junit-jupiter-api/5.11.4/junit-jupiter-api-5.11.4.pom.sha1" for 172.16.123.1 at 2024-12-17 11:52:00 +0100
Started GET "/api/v4/projects/22/dependency_proxy/packages/maven/org/junit/junit-bom/5.11.4/junit-bom-5.11.4.pom" for 172.16.123.1 at 2024-12-17 11:52:00 +0100
Started GET "/api/v4/projects/22/dependency_proxy/packages/maven/org/junit/junit-bom/5.11.4/junit-bom-5.11.4.pom.sha1" for 172.16.123.1 at 2024-12-17 11:52:01 +0100
Started GET "/api/v4/projects/22/dependency_proxy/packages/maven/org/junit/platform/junit-platform-commons/1.11.4/junit-platform-commons-1.11.4.pom" for 172.16.123.1 at 2024-12-17 11:52:01 +0100
Started GET "/api/v4/projects/22/dependency_proxy/packages/maven/org/junit/platform/junit-platform-commons/1.11.4/junit-platform-commons-1.11.4.pom.sha1" for 172.16.123.1 at 2024-12-17 11:52:01 +0100
Started GET "/api/v4/projects/22/dependency_proxy/packages/maven/org/junit/jupiter/junit-jupiter-api/5.11.4/junit-jupiter-api-5.11.4.jar" for 172.16.123.1 at 2024-12-17 11:52:01 +0100
Started GET "/api/v4/projects/22/dependency_proxy/packages/maven/org/junit/jupiter/junit-jupiter-api/5.11.4/junit-jupiter-api-5.11.4.jar.sha1" for 172.16.123.1 at 2024-12-17 11:52:01 +0100
Started GET "/api/v4/projects/22/dependency_proxy/packages/maven/org/junit/platform/junit-platform-commons/1.11.4/junit-platform-commons-1.11.4.jar" for 172.16.123.1 at 2024-12-17 11:52:01 +0100
Started GET "/api/v4/projects/22/dependency_proxy/packages/maven/org/junit/platform/junit-platform-commons/1.11.4/junit-platform-commons-1.11.4.jar.sha1" for 172.16.123.1 at 2024-12-17 11:52:02 +0100
We can see that each file is requested along with its sha1 checksum: 10 requests in total.
(More than our single dependency was pulled because that dependency has further dependencies of its own, which means more files to pull.)
With this MR
Started GET "/api/v4/projects/22/dependency_proxy/packages/maven/org/junit/jupiter/junit-jupiter-api/5.11.4/junit-jupiter-api-5.11.4.pom" for 172.16.123.1 at 2024-12-17 11:47:58 +0100
Started GET "/api/v4/projects/22/dependency_proxy/packages/maven/org/junit/junit-bom/5.11.4/junit-bom-5.11.4.pom" for 172.16.123.1 at 2024-12-17 11:47:59 +0100
Started GET "/api/v4/projects/22/dependency_proxy/packages/maven/org/junit/platform/junit-platform-commons/1.11.4/junit-platform-commons-1.11.4.pom" for 172.16.123.1 at 2024-12-17 11:47:59 +0100
Started GET "/api/v4/projects/22/dependency_proxy/packages/maven/org/junit/jupiter/junit-jupiter-api/5.11.4/junit-jupiter-api-5.11.4.jar" for 172.16.123.1 at 2024-12-17 11:48:00 +0100
Started GET "/api/v4/projects/22/dependency_proxy/packages/maven/org/junit/platform/junit-platform-commons/1.11.4/junit-platform-commons-1.11.4.jar" for 172.16.123.1 at 2024-12-17 11:48:00 +0100
As you can see, we only have requests for the files themselves. There are no more requests for checksums because they are included in the response along with the file. Only 5 requests (again, a 50% reduction).
🔮 Conclusions
As shown above, this MR can substantially reduce the number of web requests triggered by Maven clients: in both scenarios, the request count was cut in half.