Remove unnecessary hash calculations for job artifact uploads

What does this MR do and why?

Remove unnecessary hash calculations for job artifact uploads

By default Workhorse generates four different hashes for uploaded files (three in FIPS-enabled environments), which adds considerable overhead for larger uploads. For job artifacts, only one of these is required by the application (sha256), which is used for deduplication.

This change specifies only the required functions in the UploadHashFunctions passed to Workhorse, preventing hashes from being calculated when they aren't used.
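To illustrate why this matters (a minimal Go sketch, not Workhorse's actual code; the function name hashUpload and the string hash names are illustrative only): every hash function in the list is another writer the upload stream has to be fed through, so each extra digest costs additional CPU per byte uploaded.

    // Sketch: stream an upload through only the requested hash functions.
    package main

    import (
    	"crypto/md5"
    	"crypto/sha1"
    	"crypto/sha256"
    	"crypto/sha512"
    	"fmt"
    	"hash"
    	"io"
    	"strings"
    )

    // hashUpload computes only the digests named in functions.
    func hashUpload(r io.Reader, functions []string) (map[string]string, error) {
    	hashes := map[string]hash.Hash{}
    	writers := []io.Writer{}
    	for _, name := range functions {
    		var h hash.Hash
    		switch name {
    		case "md5":
    			h = md5.New()
    		case "sha1":
    			h = sha1.New()
    		case "sha256":
    			h = sha256.New()
    		case "sha512":
    			h = sha512.New()
    		default:
    			return nil, fmt.Errorf("unknown hash function %q", name)
    		}
    		hashes[name] = h
    		writers = append(writers, h)
    	}

    	// One pass over the data, but every listed hash adds CPU work.
    	if _, err := io.Copy(io.MultiWriter(writers...), r); err != nil {
    		return nil, err
    	}

    	sums := map[string]string{}
    	for name, h := range hashes {
    		sums[name] = fmt.Sprintf("%x", h.Sum(nil))
    	}
    	return sums, nil
    }

    func main() {
    	body := strings.NewReader("example artifact contents")
    	// Restricting the list to sha256 mirrors what this MR does for job artifacts.
    	sums, err := hashUpload(body, []string{"sha256"})
    	if err != nil {
    		panic(err)
    	}
    	fmt.Println(sums["sha256"])
    }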

Additional context

Files are passed through to this service from the API, and we can see that when creating the database records for both the metadata file and the artifact itself we only store the sha256 value. The deduplication logic is in the same class and checks these stored sha256 values. There's no reference to any of the other hash functions here or in the API action, and since they aren't stored in the database record it should be safe to assume they aren't required for anything outside of the upload call itself, as they would no longer be accessible once the upload is complete.
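For illustration only, here is a minimal sketch of sha256-based deduplication (the real logic lives in the Rails service; the Go types and names below are hypothetical): an incoming artifact is only stored if no existing record shares its sha256 checksum, which is why sha256 is the only hash the application needs.

    package main

    import "fmt"

    // artifactRecord stands in for the persisted job artifact metadata;
    // only the sha256 checksum is stored, so it is the only hash dedup can use.
    type artifactRecord struct {
    	ID     int
    	SHA256 string
    }

    // findDuplicate returns an existing record with the same checksum, if any.
    func findDuplicate(existing []artifactRecord, sha256sum string) *artifactRecord {
    	for i := range existing {
    		if existing[i].SHA256 == sha256sum {
    			return &existing[i]
    		}
    	}
    	return nil
    }

    func main() {
    	stored := []artifactRecord{{ID: 1, SHA256: "abc123"}}

    	if dup := findDuplicate(stored, "abc123"); dup != nil {
    		fmt.Printf("reuse artifact %d instead of storing a new copy\n", dup.ID)
    	} else {
    		fmt.Println("no duplicate, store the new artifact")
    	}
    }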

References

#584058

Screenshots or screen recordings


How to set up and validate locally

  1. Enable object storage in the GDK.
  2. Create a .gitlab-ci.yml that uploads an artifact, for example:
     job:
       script:
         - dd if=/dev/random of=example-artifact1 bs=512k count=100
       artifacts:
         paths:
           - example-artifact1
  3. Run the job and ensure the artifact is uploaded and can be downloaded from the job UI (an optional checksum cross-check is sketched below).
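Optionally, to cross-check the checksum, a small helper like the following computes the local sha256 of the downloaded artifact. This is a sketch under the assumption that the file from the CI example above was downloaded as example-artifact1 into the current directory; it is not part of the required validation steps.

    package main

    import (
    	"crypto/sha256"
    	"fmt"
    	"io"
    	"os"
    )

    func main() {
    	// Path assumed from the CI example above.
    	f, err := os.Open("example-artifact1")
    	if err != nil {
    		panic(err)
    	}
    	defer f.Close()

    	h := sha256.New()
    	if _, err := io.Copy(h, f); err != nil {
    		panic(err)
    	}
    	fmt.Printf("%x\n", h.Sum(nil))
    }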

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
