Remove unnecessary hash calculations from artifact uploads
While investigating artifact upload performance bottlenecks, we found that as much as 30-35% of CPU time during artifact uploads is spent calculating hashes. Currently, Workhorse calculates four different hashes for each artifact, but only SHA256 is actually used by Rails, for duplicate detection.
The other three hashes are not exposed publicly, and their purpose is unclear. Removing the unnecessary hash calculations could provide a modest performance improvement.
Next Steps:
- Identify which hashes are actually needed and used
- Remove unnecessary hash calculations from the artifact upload process
- Consider making hash calculations concurrent if multiple hashes are required
Related: Follows investigation in #527217 (closed)
Proposal
The hash functions that Workhorse uses for a given upload are specified in workhorse_authorize with the UploadHashFunctions option. If none are specified (which is the default), all four are generated. Currently the only exception is FIPS mode, which does not generate MD5 hashes.
For the first iteration we can specify just sha256 for job artifact uploads (behind a feature flag). In later iterations we can investigate the same optimisation for other types of upload, and eventually have uploader classes specify only the hash functions required for that file type.