Remove unnecessary hash calculations for job artifact uploads
What does this MR do and why?
By default Workhorse generates four different hashes for uploaded files (three in FIPS-enabled environments), which adds considerable overhead for larger uploads. For job artifacts, only one of these is required by the application (sha256), which is used for deduplication.
This change specifies only the functions required in the UploadHashFunctions passed to Workhorse, preventing hashes from being calculated when they aren't used.
Additional context
Files are passed through to this service from the API. When creating the database records for both the metadata file and the artifact itself, we only store the sha256 value. The deduplication logic lives in the same class and checks these stored sha256 values. There is no reference to any of the other hash functions here or in the API action, and since they aren't stored in the database record it should be safe to assume they aren't required for anything outside the upload call itself, as they would no longer be accessible once the upload is complete.
References
Screenshots or screen recordings
How to set up and validate locally
- Enable object storage in the GDK.
- Create a `.gitlab-ci.yml` that uploads an artifact, for example:

  ```yaml
  job:
    script:
      - dd if=/dev/random of=example-artifact1 bs=512k count=100
    artifacts:
      paths:
        - example-artifact1
  ```

- Run the job and ensure the artifact is uploaded and can be downloaded from the job UI.
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.