Add logging when there is a maven sha1 conflict

David Fernandez requested to merge 367356-add-logging into master

🔍 Context

Maven packages are not uploaded as a single archive file. Instead, multiple files are sent, and the whole set is needed to make the maven package available in the package registry.

You can have a glimpse of which files are sent in this code comment.

Among other things, maven clients will send a file A and then its signatures: sha1 and md5.

Now, when file A is sent, this upload is accelerated by Workhorse. As such, Workhorse will pass the uploaded file along with metadata such as the sha1 signature. We store that signature when creating the Packages::PackageFile database row.

When the sha1 sig of file A is sent, we will not store the file because we already have the sha1 sig stored from the previous request. Instead, the backend will simply verify that both sha1 values (the one that is stored and the one that is being sent) are the same. If not, a 409 Conflict is returned because something is wrong.
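
The check described above can be sketched as follows. This is a minimal illustration, not the backend code: `verify_sha1` and its return symbols are hypothetical names, and comparing through a sha256 digest is an assumption based on the `received_sha256` and `sha256_hexdigest_of_stored_sha1` fields that appear in the log sample further down.

```ruby
require 'digest'

# Hypothetical stand-in for the backend check (names are illustrative only).
# stored_sha1:     the sha1 saved when file A was uploaded
# received_sha256: sha256 of the body of the uploaded .sha1 file
#                  (assumed to arrive with the upload metadata)
def verify_sha1(stored_sha1, received_sha256)
  stored_digest = Digest::SHA256.hexdigest(stored_sha1.to_s)
  stored_digest == received_sha256 ? :ok : :conflict
end

stored = '78f664f030d2a684f59081e88b9461257e859c14'
matching_upload = Digest::SHA256.hexdigest(stored)
other_upload = Digest::SHA256.hexdigest('0' * 40)

verify_sha1(stored, matching_upload) # => :ok
verify_sha1(stored, other_upload)    # => :conflict
```

In this sketch `:conflict` stands for the 409 response; a missing stored value also ends up in that branch.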

In #367356 (closed), we discovered that we have a few 409s per week on gitlab.com. Our current lead for the root cause is replica lag.

When the sha1 of file A is sent, we don't write anything to the database. The request only performs reads, so a replica can serve it. The problem we might have here is that maven clients send file A and its sha1 sig in quick succession, so quickly that the row written for file A may not have reached the replica yet. In that case there is nothing to compare against, the comparison check fails, and the 409 is returned.
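
To make the suspected race concrete, here is a toy model of it. Everything in it is an assumption for illustration: the two hashes stand in for the primary and a lagging replica, and `check_sha1` and the file name are hypothetical. It does not touch the real load balancing code.

```ruby
# Toy model of the suspected race: two in-memory hashes play the
# primary database and a replica that has not caught up yet.
primary = {}
replica = {}

# Hypothetical read-only check that compares the stored and sent sha1.
def check_sha1(db, file_name, sent_sha1)
  row = db[file_name]
  stored = row && row[:file_sha1]
  stored == sent_sha1 ? :ok : :conflict
end

sha1 = '78f664f030d2a684f59081e88b9461257e859c14'

# Request 1: file A is uploaded, the write goes to the primary only.
primary['app-1.0.jar'] = { file_sha1: sha1 }

# Request 2 arrives right away and, being read-only, is served by the replica.
before_replication = check_sha1(replica, 'app-1.0.jar', sha1) # => :conflict

# Replication catches up; the same request would now succeed.
replica.merge!(primary)
after_replication = check_sha1(replica, 'app-1.0.jar', sha1)  # => :ok
```

If this is what happens in production, the logged stored value would be empty or missing at the time of the conflict.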

Before diving head first into a fix, we first want to confirm that replication lag is indeed the issue. To that end, this MR simply logs the stored sha1 and the sent sha1 right before the 409 is returned. Given the low number of 409s per week on gitlab.com, the extra logging volume is reasonable.

What does this MR do and why?

  • Log the stored and sent sig for the maven package file if a conflict is detected.
  • Update the related spec.
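
As a rough sketch of the first bullet, the snippet below logs both values right before reporting the conflict. It uses Ruby's standard `Logger` rather than GitLab's error tracking, and `verify_and_log` is a hypothetical helper, so treat it as an illustration of the idea rather than the code in this MR. The field names mirror the log sample shown in the validation steps.

```ruby
require 'digest'
require 'json'
require 'logger'
require 'stringio'

# Hypothetical helper: log the stored and received values, then
# report the conflict. Not the actual implementation.
def verify_and_log(logger, stored_sha1, received_sha256)
  stored_digest = Digest::SHA256.hexdigest(stored_sha1.to_s)
  return :ok if stored_digest == received_sha256

  logger.error({
    message: 'maven package file sha1 conflict',
    stored_sha1: stored_sha1,
    received_sha256: received_sha256,
    sha256_hexdigest_of_stored_sha1: stored_digest
  }.to_json)
  :conflict
end

# Log to an in-memory buffer, one JSON document per line.
output = StringIO.new
logger = Logger.new(output)
logger.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }

# Simulate the replica-lag case: nothing stored yet for file A.
result = verify_and_log(logger, nil, Digest::SHA256.hexdigest('abc'))
entry = JSON.parse(output.string)
```

Here `entry['stored_sha1']` comes back as `nil`, which is the kind of evidence that would support the replica lag hypothesis.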

🖥 Screenshots or screen recordings

n / a

🏁 How to set up and validate locally

I think that it is quite challenging to make maven clients send wrong sha1 signatures. So to test this MR, we're going to use something else: modify the backend so that the conflict is always detected.

  1. Update this line to:
    • if stored_sha256 != expected_sha256
  2. Send a maven package to your GitLab instance with this utility.
    • You will need a Personal Access Token.
  3. Look at the logs. You should see something like:
    {
      "severity": "ERROR",
      "exception.class": "ArgumentError",
      "exception.message": "ArgumentError",
      "tags.feature_category": "package_registry",
      "extra.message": "maven package file sha1 conflict",
      "extra.stored_sha1": "78f664f030d2a684f59081e88b9461257e859c14\n",
      "extra.received_sha256": "961f05f184c8832d0346ab6ea70d20925a4ab42255241965a77109cc6eb12ea8",
      "extra.sha256_hexdigest_of_stored_sha1": "961f05f184c8832d0346ab6ea70d20925a4ab42255241965a77109cc6eb12ea8",
      // ... other fields
    }
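
To pick such entries out of a busy log, a small filter along these lines can help. It assumes the log is written as one JSON document per line; the sample lines are made up for the example (the hash values are copied from the entry above), and the `lines` array stands in for reading the actual log file.

```ruby
require 'json'

# Sample lines standing in for a JSON-lines log file (hypothetical data).
lines = [
  '{"severity":"INFO","message":"unrelated entry"}',
  'not json at all',
  '{"severity":"ERROR","extra.message":"maven package file sha1 conflict",' \
    '"extra.stored_sha1":"78f664f030d2a684f59081e88b9461257e859c14\n",' \
    '"extra.received_sha256":"961f05f184c8832d0346ab6ea70d20925a4ab42255241965a77109cc6eb12ea8"}'
]

# Keep only the sha1 conflict entries, skipping lines that fail to parse.
conflicts = lines.filter_map do |line|
  entry = JSON.parse(line)
  entry if entry['extra.message'] == 'maven package file sha1 conflict'
rescue JSON::ParserError
  nil
end

conflicts.each do |entry|
  puts "stored=#{entry['extra.stored_sha1'].inspect} " \
       "received=#{entry['extra.received_sha256']}"
end
```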

🚥 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by David Fernandez