Maven package registry returning 409 when uploading the sha1

🔥 Problem

From https://gitlab.com/gitlab-com/ops-sub-department/section-ops-request-for-help/-/issues/6.

Maven packages are not uploaded in a single step. Instead, the client uploads multiple files in sequence. This sequence is documented here.

Among the uploaded files, the client can send the sha1 digest of a previously uploaded file. We can see here that:

  • These digest uploads are ignored: no package file is created and nothing is stored on object storage.
  • (for sha1) Instead, we locate the related file and check that the two digests are coherent (the one we stored alongside the related file and the one being uploaded).
    • When a file is uploaded to GitLab (through Workhorse), its sha1 is computed and stored automatically. That is why we ignore digest uploads for Maven packages.
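A minimal sketch of the flow above, in Python for illustration only (GitLab's real implementation is Ruby and differs). `PackageFile`, `store_file` and `handle_digest_upload` are hypothetical names. Only the 409 on a digest mismatch comes from this issue; returning 404 when the related file is missing is an assumption.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class PackageFile:
    # Hypothetical stand-in for a stored package file record.
    file_name: str
    file_sha1: str  # hex sha1 computed when the file itself was uploaded


def store_file(files: Dict[str, PackageFile], file_name: str, content: bytes) -> None:
    """Regular upload: compute the sha1 and store it alongside the file (Workhorse's role)."""
    files[file_name] = PackageFile(file_name, hashlib.sha1(content).hexdigest())


def handle_digest_upload(files: Dict[str, PackageFile], digest_name: str, received_sha1: str) -> int:
    """Digest upload: store nothing, only check coherence. Returns an HTTP status code."""
    related_name = digest_name.removesuffix(".sha1")  # Python 3.9+
    package_file = files.get(related_name)
    if package_file is None:
        return 404  # assumption: behaviour for a missing related file is not described here
    if package_file.file_sha1 != received_sha1.strip().lower():
        return 409  # Conflict: stored and received digests disagree
    return 200
```

Note that the digest upload never adds an entry to `files`: it only reads the record written by the earlier upload.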

This search suggests that the coherence check fails with 409 Conflict. We get about 300 hits per week on gitlab.com.

Note that, according to the search, this failure seems to happen randomly across file types: sometimes it's the sha1 for the jar file, sometimes it's for the xml file.

🚒 Solution

My gut feeling is that we're hitting this issue because of database replication lag. Maven clients upload a file and its sha1 digest back to back. Because the sha1 upload is ignored and only reads the related file (created by the first upload), that read is routed to a replica. If the replica is lagging behind the primary database, the stored fingerprint it returns could be wrong, and we would hit a conflict when comparing it with the received one.
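To make the hypothesis concrete, here is a toy model (not GitLab code) of a primary/replica pair where each write reaches the replica only after a fixed lag. It assumes a file with the same name was uploaded earlier, so the replica still holds an older sha1 when the digest upload arrives; `LaggingDatabase`, `run_scenario` and the `maven-metadata.xml` key are hypothetical. Passing `use_primary=True` models the fix considered in this issue (forcing reads from the primary).

```python
import hashlib
from typing import Dict, List, Optional, Tuple


class LaggingDatabase:
    """Toy primary/replica pair: the replica applies each write `lag` ticks after the primary."""

    def __init__(self, lag: int) -> None:
        self.lag = lag
        self.clock = 0
        self.primary: Dict[str, str] = {}
        self.replica: Dict[str, str] = {}
        self.pending: List[Tuple[int, str, str]] = []  # (apply_at, key, value)

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value
        self.pending.append((self.clock + self.lag, key, value))

    def tick(self) -> None:
        """Advance time by one step and replay, in order, the writes now due on the replica."""
        self.clock += 1
        due = [p for p in self.pending if p[0] <= self.clock]
        self.pending = [p for p in self.pending if p[0] > self.clock]
        for _, key, value in due:
            self.replica[key] = value

    def read(self, key: str, use_primary: bool = False) -> Optional[str]:
        return (self.primary if use_primary else self.replica).get(key)


def sha1_hex(content: bytes) -> str:
    return hashlib.sha1(content).hexdigest()


def run_scenario(use_primary: bool, ticks_before_digest: int = 0) -> int:
    """Re-upload a file, then upload its sha1; return the status the coherence check gives."""
    db = LaggingDatabase(lag=2)
    db.write("maven-metadata.xml", sha1_hex(b"version 1"))  # earlier upload
    db.tick()
    db.tick()  # the earlier upload is now visible on the replica
    new_content = b"version 2"
    db.write("maven-metadata.xml", sha1_hex(new_content))  # new upload, primary only for now
    for _ in range(ticks_before_digest):
        db.tick()
    stored = db.read("maven-metadata.xml", use_primary=use_primary)
    return 200 if stored == sha1_hex(new_content) else 409
```

With the digest arriving immediately, `run_scenario(False)` yields 409 while `run_scenario(True)` yields 200; waiting long enough for the replica to catch up (`ticks_before_digest=2`) also yields 200.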

Before diving into a fix (forcing reads from the primary when receiving a fingerprint upload), I'd first like to confirm that this is the root cause.

As such, I suggest:

  1. Given the number of occurrences per week, add logging around here. Log:
    • the received sha1.
    • the stored sha1.
    • the output of hexdigest on the stored sha1.
    • log only when there is a conflict.
  2. Let the logs run for a few days and analyze the situation.
  3. If the stored sha1 is consistently the wrong one, this points to a replica lag issue.
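Step 1 could look roughly like the following Python sketch. It is not GitLab's code: `check_digest`, the logger name and the field names are hypothetical, and it assumes the stored sha1 may come back either as raw bytes or as a hex string, which is why both the raw value and its hex rendering are logged. As suggested above, it logs only on conflict.

```python
import logging
from typing import Optional, Tuple, Union

logger = logging.getLogger("maven_digest_check")  # hypothetical logger name


def to_hex(stored_sha1: Union[bytes, str]) -> str:
    """Render the stored sha1 as lowercase hex (assumes raw digest bytes or a hex string)."""
    if isinstance(stored_sha1, bytes):
        return stored_sha1.hex()
    return stored_sha1.lower()


def check_digest(
    stored_sha1: Union[bytes, str], received_sha1: str, file_name: str
) -> Tuple[int, Optional[dict]]:
    """Return (status, logged fields); fields is None when nothing was logged."""
    received = received_sha1.strip().lower()
    stored_hex = to_hex(stored_sha1)
    if stored_hex == received:
        return 200, None  # no log line on success, to keep the volume low
    fields = {
        "file_name": file_name,
        "received_sha1": received,
        "stored_sha1": repr(stored_sha1),
        "stored_sha1_hex": stored_hex,
    }
    logger.warning("maven sha1 conflict", extra=fields)
    return 409, fields
```

Returning the logged fields alongside the status keeps the sketch easy to inspect; a real implementation would only need the log call.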
Edited by David Fernandez