Geo: Generate job artifact checksums

Follow-up to https://gitlab.com/gitlab-org/gitlab-ee/issues/8921 / https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/14477, which validates checksums after transfer.

Problem

Many, if not most, artifacts don't have a checksum. Checksums are only generated for traces when they are archived, so the secondary is unable to validate these artifacts after transfer.
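To make the gap concrete, here is a minimal sketch of the check a secondary could run after a transfer. It is illustrative only and is not GitLab's actual verification code; the function names and the three result strings are hypothetical. An artifact whose primary record has no checksum cannot be classified as either good or corrupt.

```python
import hashlib
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_transfer(received: bytes, primary_checksum: Optional[str]) -> str:
    """Classify a transferred artifact (hypothetical helper).

    Returns "unverifiable" when the primary never stored a checksum,
    "verified" when the digests match, and "mismatch" otherwise.
    """
    if primary_checksum is None:
        # Nothing to compare against: this is the case described above.
        return "unverifiable"
    if sha256_hex(received) == primary_checksum:
        return "verified"
    return "mismatch"
```

With no stored checksum, `verify_transfer(data, None)` can only report "unverifiable", no matter what was received.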

Proposal

Checksum all artifacts on creation.

Take a cue from the Upload model: if the file is large (> 100MB), compute the checksum asynchronously. Small artifacts would then be verified on transfer, while large artifacts might not be, for example if the transfer happens before the asynchronous checksum has been computed. This is OK, since we'll be implementing constant reverification within a milestone or two, so large artifacts will be verified eventually.
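The size-based split can be sketched as follows. This is a rough illustration in Python under stated assumptions, not the Upload model's implementation: `checksum_on_create`, `file_sha256`, and the `enqueue` callback (a stand-in for scheduling a background job) are hypothetical names, and the 100MB threshold is taken from the proposal above.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional

# Threshold from the proposal; treated here as 100 * 1024 * 1024 bytes (assumption).
ASYNC_THRESHOLD_BYTES = 100 * 1024 * 1024


def file_sha256(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file in chunks so large files are never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_on_create(
    path: str,
    enqueue: Callable[[str], None],
    threshold: int = ASYNC_THRESHOLD_BYTES,
) -> Optional[str]:
    """Checksum small files inline; hand large files to a background job.

    Returns the hex digest for small files, or None when the work was
    deferred via the (hypothetical) enqueue callback.
    """
    if Path(path).stat().st_size > threshold:
        enqueue(path)  # the checksum is stored later by the background job
        return None
    return file_sha256(path)
```

A `None` result corresponds to the window in which a large artifact has no checksum yet, so a transfer during that window cannot be verified until the background job or a later reverification pass runs.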

For reference, on my MacBook Pro I checksummed (SHA256) a 1.2GB file in 5s, and a 5GB file in 8s.

Question: Is it insufficient to checksum on creation ("on_create") of the record, since, for example, it sounds like traces mutate until they are archived?

Alternative proposal: close without doing anything

As mentioned above, "we'll be implementing constant reverification within a milestone or two".

So all artifacts will be verified by that process, eventually. Not perfect, but maybe good enough.

Edited by Michael Kozono