Database race condition when uploading maven packages
Summary
When using official maven images tagged with eclipse-temurin (e.g. maven:3.9.3-eclipse-temurin-11), it is possible to encounter a race condition when uploading packages, ultimately resulting in the following output within the job and a {"message":"Validation failed: Name has already been taken"} error within our logs:
Could not transfer artifact calebw:maven-test2:pom:0.59.40 from/to gitlab-maven (https://gitlab.com/api/v4/projects/12345678/packages/maven): status code: 400, reason phrase: Bad Request (400)
This seems to only occur when both of the following conditions are true:
- An official maven image tagged with eclipse-temurin is used
- The flag -DdeployAtEnd=true is passed as a Maven argument, which configures all the packages to be pushed at the end.
After speaking with @10io, it seems like this is occurring due to how these specific images send the requests when the above conditions are met. It looks like we receive two sequential requests to the /authorize endpoint (one for the pom, one for the jar) before a request to upload either one of these files is made.
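To illustrate how two near-simultaneous uploads for the same package could produce the "Name has already been taken" error, here is a minimal sketch. It assumes the backend does a check-then-create on the package name, which is an assumption and not confirmed by the logs. PackageStore, NameTaken, upload, and simulate are hypothetical names for this illustration and are not GitLab code.

```python
import threading


class NameTaken(Exception):
    """Stands in for the 'Name has already been taken' validation error."""


class PackageStore:
    """Hypothetical in-memory stand-in for the packages table."""

    def __init__(self):
        self._names = set()
        self._write_lock = threading.Lock()  # models a uniqueness check at write time

    def exists(self, name):
        return name in self._names

    def create(self, name):
        with self._write_lock:
            if name in self._names:
                raise NameTaken(name)
            self._names.add(name)


def upload(store, name, barrier, results, label):
    # Step 1: look for an existing package (assumed behaviour).
    found = store.exists(name)
    # Force both uploads to finish step 1 before either reaches step 2,
    # which mimics the pom and jar requests arriving back to back.
    barrier.wait()
    try:
        # Step 2: create the package if the lookup found nothing.
        if not found:
            store.create(name)
        results[label] = "ok"
    except NameTaken:
        results[label] = "400 Bad Request"


def simulate():
    store = PackageStore()
    barrier = threading.Barrier(2)
    results = {}
    threads = [
        threading.Thread(
            target=upload,
            args=(store, "calebw/maven-test2", barrier, results, label),
        )
        for label in ("pom", "jar")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(simulate())
```

Exactly one of the two uploads fails in this sketch, and which one fails varies between runs. That is consistent with the job failing on either the POM or the JAR.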
Steps to reproduce
Fork this project and run a pipeline. This project utilizes predefined CI variables and should need no configuration in order to build and attempt to publish packages to its relevant package registry.
This project has a parent/child pom configuration that publishes multiple packages. You will see the job fail with the above error on a random package, and it will fail when attempting to publish either the POM or JAR file. Removing the -DdeployAtEnd=true flag from the .gitlab-ci.yml or changing the image to maven:3.9.3 will allow the job to succeed.
Relevant logs can be found in Kibana after a failed attempt is made by searching on the following: json.path: /api/v4/projects/PROJECT_ID/packages/maven/NAMESPACE/maven-test/*
Example Project
https://gitlab.com/calebw/mvn-400-bad-request
What is the current bug behavior?
Deploy attempts will fail with a 400 bad request output in the job, and a {"message":"Validation failed: Name has already been taken"} error on the backend.
What is the expected correct behavior?
All packages are deployed without issue.
Relevant logs and/or screenshots
Output of checks
This bug happens on GitLab.com
Possible fixes
Implement support for parallel uploads. This could be challenging but we could have some leads:
- Use upsert. This way, the database itself will take care of the race condition. The challenge here is that we need to have the proper unique constraints in place.
- Use an exclusive lease on the backend so that second threads wait for the first one to finish. Not ideal but could work.
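The two leads can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual backend implementation: it uses an in-memory SQLite database (the ON CONFLICT clause needs SQLite 3.24 or later), a hypothetical packages table with a unique constraint on (name, version), and an in-process threading.Lock standing in for an exclusive lease. All function names are made up for the example.

```python
import sqlite3
import threading


def make_db():
    """Create a hypothetical packages table with the unique constraint upsert relies on."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE packages ("
        " name TEXT NOT NULL,"
        " version TEXT NOT NULL,"
        " UNIQUE (name, version))"
    )
    return conn


def upsert_package(conn, name, version):
    """Lead 1: let the database resolve the conflict instead of failing validation."""
    conn.execute(
        "INSERT INTO packages (name, version) VALUES (?, ?)"
        " ON CONFLICT (name, version) DO NOTHING",
        (name, version),
    )
    conn.commit()
    row = conn.execute(
        "SELECT rowid FROM packages WHERE name = ? AND version = ?",
        (name, version),
    ).fetchone()
    return row[0]


# Lead 2: one lock per package key. A plain Lock is only a stand-in here;
# a real exclusive lease would have to work across processes and hosts.
_leases = {}
_leases_guard = threading.Lock()


def lease_for(key):
    with _leases_guard:
        return _leases.setdefault(key, threading.Lock())


def find_or_create_with_lease(conn, name, version):
    """Serialize find-or-create so a second caller waits, then finds the row."""
    with lease_for((name, version)):
        row = conn.execute(
            "SELECT rowid FROM packages WHERE name = ? AND version = ?",
            (name, version),
        ).fetchone()
        if row is not None:
            return row[0]
        cur = conn.execute(
            "INSERT INTO packages (name, version) VALUES (?, ?)",
            (name, version),
        )
        conn.commit()
        return cur.lastrowid


if __name__ == "__main__":
    db = make_db()
    pom_id = upsert_package(db, "calebw/maven-test2", "0.59.40")
    jar_id = upsert_package(db, "calebw/maven-test2", "0.59.40")
    print(pom_id == jar_id)
```

In both variants the second request for the same name and version resolves to the existing row rather than raising an error. The upsert variant depends entirely on the unique constraint being present, which is the challenge noted above.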