Geo: Generate job artifact checksums

Follow-up to https://gitlab.com/gitlab-org/gitlab-ee/issues/8921 / https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/14477, which validates checksums after transfer.

Problem

Many (possibly most) artifacts don't have a checksum. Checksums are only generated for traces, at the point they are archived. So the secondary is unable to validate the remaining artifacts after transfer.
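To illustrate the gap, here is a minimal sketch of a post-transfer check, written in Python for brevity rather than as GitLab code. The function name `verify_transfer` and its return values are hypothetical. The point is that a record with no stored checksum gives the secondary nothing to compare against.

```python
import hashlib
from typing import Optional


def verify_transfer(data: bytes, expected_sha256: Optional[str]) -> str:
    """Compare transferred bytes against the checksum stored by the primary.

    Hypothetical helper for illustration only; not the actual Geo code.
    """
    if expected_sha256 is None:
        # No checksum was ever generated for this artifact, so the
        # secondary cannot tell a good transfer from a corrupted one.
        return "unverifiable"
    actual = hashlib.sha256(data).hexdigest()
    return "verified" if actual == expected_sha256 else "mismatch"
```

Under this sketch, every artifact created without a checksum falls into the "unverifiable" branch.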

Proposal

Checksum all artifacts on creation.

Take a cue from the Upload model: if the file is large (> 100MB), checksum it asynchronously. In this case, small artifacts would get verified on transfer, while large artifacts might not be. This is OK, since we'll be implementing constant reverification within a milestone or two, so large artifacts will be verified eventually.
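A rough sketch of the size-based decision, in Python for brevity and not taken from the Upload model's actual code. The names `checksum_on_create`, `sha256_file` and `enqueue` are hypothetical, and the threshold treats "100MB" as 100 MiB, which is an assumption.

```python
import hashlib
import os
from typing import Callable, Optional

# "> 100MB" from the proposal; interpreted here as 100 MiB (assumption).
ASYNC_THRESHOLD_BYTES = 100 * 1024 * 1024


def sha256_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file in chunks so large artifacts are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_on_create(
    path: str,
    enqueue: Callable[[str], None],
    threshold: int = ASYNC_THRESHOLD_BYTES,
) -> Optional[str]:
    """Return the checksum for small files; defer large files to a background job.

    `enqueue` stands in for whatever schedules the asynchronous job
    (hypothetical; the real mechanism is not specified in this issue).
    """
    if os.path.getsize(path) > threshold:
        enqueue(path)  # checksum will be filled in later
        return None
    return sha256_file(path)
```

With this shape, a `None` result means the artifact has no checksum yet, so it may be transferred before it can be verified. That is the case the proposal accepts for large files.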

For reference, on my MacBook Pro I checksummed (SHA256) a 1.2GB file in 5s, and a 5GB file in 8s.

Question: Is it insufficient to checksum "on_create" of the record, since e.g. it sounds like traces mutate until they are archived?

Alternative proposal: close without doing anything

As mentioned above, "we'll be implementing constant reverification within a milestone or two". So all artifacts will be verified by that process, eventually. Not perfect, but maybe good enough.

Edited Jul 20, 2019 by Michael Kozono