Allow "rake gitlab:uploads:check" to verify files in Object Storage
Problem to solve
It is valuable to have ways to assure sysadmins of data integrity, e.g. before migrating data stores.
After https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/19501, we will have the ability to verify the existence of LFS object, artifact, and upload files in object storage.
It would be even better if we could verify the integrity of those files by comparing their current checksums with the checksums calculated at upload, to be sure the files have not been modified.
Blocked on
After "Optimize checksumming of Object Stored Uploads" is implemented, we can allow the rake task to verify files in Object Storage.
Proposal
What does success look like, and how can we measure that?
- `rake gitlab:uploads:check` compares each file's MD5 in the DB with a freshly calculated MD5 from the file in object store
- If there is no checksum in the DB, then I suppose we should store one and output that we did so (instead of returning a failure), since our codebase currently doesn't attempt to "ensure" that a checksum is calculated for all remote stored files
- Update docs: https://gitlab.com/gitlab-org/gitlab-ce/blob/edc1e16f092abe3b02ba1d2b927510161db0b301/doc/administration/raketasks/check.md#uploaded-files-integrity
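As a rough illustration of the check proposed above, here is a minimal sketch in plain Ruby. It is not GitLab's actual implementation: `UploadRecord`, `FakeObjectStore`, `verify_upload`, and `check_uploads` are hypothetical names, and the in-memory store stands in for a real object storage client. It assumes MD5 as stated in the proposal and assumes file existence has already been verified separately.

```ruby
require "digest"

# Hypothetical stand-in for an upload row; not GitLab's actual Upload model.
UploadRecord = Struct.new(:path, :checksum)

# Hypothetical stand-in for an object storage client (path => file contents).
class FakeObjectStore
  def initialize(files)
    @files = files
  end

  # Raises KeyError if the path is absent; existence checks are assumed
  # to have been done before this step.
  def read(path)
    @files.fetch(path)
  end
end

# Returns :ok, :mismatch, or :backfilled for a single record.
def verify_upload(record, store)
  fresh = Digest::MD5.hexdigest(store.read(record.path))

  if record.checksum.nil?
    # No stored checksum: store one and report it instead of failing.
    record.checksum = fresh
    :backfilled
  elsif record.checksum == fresh
    :ok
  else
    :mismatch
  end
end

# Checks every record and returns a hash of path => status.
def check_uploads(records, store)
  records.map { |record| [record.path, verify_upload(record, store)] }.to_h
end

# Usage with made-up data.
store = FakeObjectStore.new(
  "a.png" => "alpha",
  "b.png" => "tampered",
  "c.png" => "gamma"
)
records = [
  UploadRecord.new("a.png", Digest::MD5.hexdigest("alpha")),
  UploadRecord.new("b.png", Digest::MD5.hexdigest("beta")),
  UploadRecord.new("c.png", nil)
]

results = check_uploads(records, store)
results.each { |path, status| puts "#{path}: #{status}" }
```

In this sketch a missing checksum is reported as `:backfilled` rather than as a failure, which mirrors the second bullet of the proposal. A real task would likely stream large files rather than read them fully into memory.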
Links / references
Edited by Michael Kozono