Allow "rake gitlab:uploads:check" to verify files in Object Storage

Problem to solve

It is valuable to have ways to assure sysadmins of data integrity, e.g. before migrating data stores.

After https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/19501, we will have the ability to verify the existence of LFS object, artifact, and upload files in object storage.

It would be even better if we could also verify the integrity of those files by comparing their current checksums with the checksums calculated at upload time, to be sure the files have not been modified.

Blocked on

Once "Optimize checksumming of Object Stored Uploads" is implemented, we can allow the rake task to verify files in Object Storage.

Proposal

What does success look like, and how can we measure that?

  • rake gitlab:uploads:check compares each file's MD5 stored in the DB with an MD5 freshly calculated from the file in object storage
  • If there is no checksum in the DB, store the freshly calculated one and report that we did so (instead of returning a failure), since our codebase currently does not ensure that a checksum is calculated for every remotely stored file
  • Update the docs: https://gitlab.com/gitlab-org/gitlab-ce/blob/edc1e16f092abe3b02ba1d2b927510161db0b301/doc/administration/raketasks/check.md#uploaded-files-integrity
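
The check described in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions, not GitLab's actual implementation: UploadRecord, compute_md5, and verify_upload are hypothetical names, and a StringIO stands in for the stream that would be fetched from object storage.

```ruby
require 'digest'
require 'stringio'

# Hypothetical stand-in for an upload row; not GitLab's actual model.
UploadRecord = Struct.new(:path, :checksum)

# Reads the IO in fixed-size chunks so a large remote file never has to be
# held in memory all at once.
def compute_md5(io, chunk_size: 1024 * 1024)
  digest = Digest::MD5.new
  while (chunk = io.read(chunk_size))
    digest.update(chunk)
  end
  digest.hexdigest
end

# Returns :ok when the stored checksum matches the fresh one, :mismatch when
# it differs, and :stored when no checksum existed and the fresh one was
# saved on the record instead of reporting a failure.
def verify_upload(record, io)
  actual = compute_md5(io)

  if record.checksum.nil?
    record.checksum = actual
    :stored
  elsif record.checksum == actual
    :ok
  else
    :mismatch
  end
end

# Usage with in-memory data in place of a real object storage download.
record = UploadRecord.new('uploads/example.txt', nil)
puts verify_upload(record, StringIO.new('file contents'))   # first run stores the checksum
puts verify_upload(record, StringIO.new('file contents'))   # second run compares against it
```

Returning a distinct :stored result keeps the "checksum was missing" case visible in the task output without counting it as a failure, which matches the second bullet above.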

Links / references

Edited Aug 26, 2019 by Michael Kozono