Validate file integrity for Attachments, LFS objects and Job artifacts after transfer to secondary
If data is worth replicating, it is worth being validated.
Problem to solve
Geo is being used in a case where a node has an internet connection bad enough to corrupt file transfers over HTTP: https://gitlab.zendesk.com/agent/tickets/110423
This especially impacts Attachments, LFS objects, and Job artifacts since they are never checked at any point. We should ensure that data transferred via HTTP is checked.
Intended users
Customers whose internet connection is bad enough to corrupt file transfers.
Further details
N/A
Proposal
- [-] Generate a checksum at some point (e.g. just before transfer, or store one upon file creation)
  - Attachments and LFS objects already store checksums upon file creation (in a job if large).
  - [-] The only artifacts that already have checksums are complete job traces. => Opened issue to generate artifact checksums: https://gitlab.com/gitlab-org/gitlab-ee/issues/12743
- Fail a transfer if the checksum does not match so it will be retried
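The "fail on mismatch so it will be retried" step could look roughly like the sketch below. This is not GitLab's implementation. It is a minimal Python illustration that assumes SHA-256 checksums, and `transfer_with_verification`, `fetch`, `ChecksumMismatch`, and `max_attempts` are hypothetical names introduced only for this example.

```python
import hashlib
from typing import Callable


class ChecksumMismatch(Exception):
    """Raised when transferred bytes do not match the expected checksum."""


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of data (assumed algorithm)."""
    return hashlib.sha256(data).hexdigest()


def transfer_with_verification(
    fetch: Callable[[], bytes],
    expected_checksum: str,
    max_attempts: int = 3,
) -> bytes:
    """Call fetch() until the result matches expected_checksum.

    fetch is a hypothetical callable standing in for the HTTP download.
    A mismatch counts as a failed attempt, so the transfer is retried.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    actual = ""
    for _ in range(max_attempts):
        data = fetch()
        actual = sha256_hex(data)
        if actual == expected_checksum:
            return data

    # Every attempt produced mismatching bytes: reject the transfer.
    raise ChecksumMismatch(
        f"expected {expected_checksum}, got {actual} "
        f"after {max_attempts} attempts"
    )
```

In a real system the retry would more likely be scheduled by a background job with backoff rather than run in a tight loop. The loop here only shows that a mismatch is treated the same way as a failed download.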
Permissions and Security
N/A
Documentation
Testing
What does success look like, and how can we measure that?
- Attachments, LFS objects, and job artifacts are checksummed on secondary nodes after transfer, compared with the stored checksum, and rejected if mismatched.
- Skip verification if no stored checksum exists. (Large uploads are checksummed in a job after creation, and many types of artifacts don't calculate one at all: https://gitlab.com/gitlab-org/gitlab-ee/issues/12743)
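The success criteria above amount to three outcomes per file: verified, rejected, or skipped because no checksum is stored. The following Python sketch shows one way to express that, again assuming SHA-256. `Verification` and `verify_transferred_file` are hypothetical names, not part of GitLab.

```python
import hashlib
from enum import Enum
from typing import BinaryIO, Optional


class Verification(Enum):
    VERIFIED = "verified"  # checksum present and matching
    REJECTED = "rejected"  # checksum present but different
    SKIPPED = "skipped"    # no stored checksum to compare against


def verify_transferred_file(
    stream: BinaryIO,
    stored_checksum: Optional[str],
    chunk_size: int = 1024 * 1024,
) -> Verification:
    """Compare a transferred file against the checksum stored on the primary.

    The file is read in chunks so that large LFS objects or artifacts
    do not have to fit in memory.
    """
    if not stored_checksum:
        # Nothing to compare with, e.g. the checksum job has not run yet.
        return Verification.SKIPPED

    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)

    if digest.hexdigest() == stored_checksum.lower():
        return Verification.VERIFIED
    return Verification.REJECTED
```

Keeping `SKIPPED` distinct from `VERIFIED` makes it possible to count how many files were never checked, which is one way to measure the criteria above.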
What is the type of buyer?
Links / references
/label feature
Edited by Michael Kozono