Geo: Generate job artifact checksums
Follow-up to https://gitlab.com/gitlab-org/gitlab-ee/issues/8921 / https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/14477, which validates checksums after transfer.
Problem
Many, if not most, artifacts don't have a checksum. Checksums are only generated for traces, and only when they are archived. As a result, the secondary cannot validate the remaining artifacts after transfer.
Proposal
Checksum all artifacts on creation.
Take a cue from the Upload model: if the file is large (> 100MB), compute the checksum asynchronously. Small artifacts would then be verified on transfer. Large artifacts might not be verified on transfer, because their checksum may not exist yet. This is OK, since we'll be implementing constant reverification within a milestone or two, so large artifacts will be verified eventually.
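As a rough illustration of the size-based split described above, here is a minimal Python sketch. It is not the Upload model's actual code: `ASYNC_THRESHOLD_BYTES`, `sha256_of`, `checksum_on_create`, and the `enqueue` callback are hypothetical names, and the 100 MiB cutoff is only an assumed reading of "> 100MB".

```python
import hashlib
from typing import Callable, Optional

# Assumed cutoff, mirroring the "> 100MB" rule of thumb in the proposal.
ASYNC_THRESHOLD_BYTES = 100 * 1024 * 1024


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file in chunks so memory use stays flat for large files."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def checksum_on_create(
    path: str, size: int, enqueue: Callable[[str], None]
) -> Optional[str]:
    """Return the checksum for small files; defer large ones.

    `enqueue` is a hypothetical hook standing in for a background job.
    Returns None when the checksum has been deferred.
    """
    if size > ASYNC_THRESHOLD_BYTES:
        enqueue(path)
        return None
    return sha256_of(path)
```

A caller would store the returned checksum on the record when it is present, and treat `None` as "not yet verifiable", which is the state the later reverification pass would pick up.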
For reference, on my MacBook Pro I computed the SHA256 checksum of a 1.2GB file in 5s, and of a 5GB file in 8s.
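To repeat this kind of measurement on other hardware, something like the following Python sketch could be used. `timed_sha256` is a hypothetical helper, and the 1 MiB chunk size is an arbitrary assumption. Timings will vary with disk speed and file system caching.

```python
import hashlib
import time
from typing import Tuple


def timed_sha256(path: str, chunk_size: int = 1024 * 1024) -> Tuple[str, float]:
    """Return (hex digest, elapsed seconds) for a chunked SHA256 of the file."""
    digest = hashlib.sha256()
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    elapsed = time.perf_counter() - start
    return digest.hexdigest(), elapsed
```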
Question: Is it sufficient to checksum "on_create" of the record? It sounds like some files, e.g. traces, mutate until they are archived, in which case a checksum taken at creation would go stale.
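The concern can be shown with a small Python sketch. The trace contents are made up for illustration, and `sha256_hex` is a hypothetical helper. A checksum taken when the record is created no longer matches once more output is appended.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical trace content at record creation time.
trace = b"Running job...\n"
checksum_at_create = sha256_hex(trace)

# The trace keeps growing until it is archived.
trace += b"Job succeeded\n"
checksum_at_archive = sha256_hex(trace)

# The creation-time checksum is stale for the final content.
stale = checksum_at_create != checksum_at_archive
print(stale)
```

If traces do behave this way, the checksum would need to be computed or refreshed at archive time rather than only at creation.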
Alternative proposal: close without doing anything
As mentioned above, "we'll be implementing constant reverification within a milestone or two", so all artifacts will eventually be verified by that process. Not perfect, but maybe good enough.