[x86_64] Assembly version of SHA1Transform added to hash package
Summary
This merge request builds on !485 (merged) to provide an assembly version of the SHA-1 hash algorithm in the hash
package.
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
Adds a new x86_64 assembly version of the SHA-1 algorithm that is significantly faster than the Pascal fallback (see the timings below). The change also makes it cleaner to add other platform-specific implementations later.
Relevant logs and/or screenshots
Running a timing test under x86_64-win64 - before (Pascal):
SHA1(Hello): 473 ns/call
SHA1(World x 1000): 26792 ns/call
After (x86_64 assembly language):
SHA1(Hello): 223 ns/call
SHA1(World x 1000): 8119 ns/call
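To put the figures above in relative terms, here is a minimal C sketch that turns a before/after pair into a speedup factor. The helper name `speedup` is hypothetical and is not part of the hash package or of the timing test.

```c
#include <assert.h>

/* Hypothetical helper, not part of the merge request: ratio of the old
 * time per call to the new time per call. A result above 1.0 means the
 * new code is faster. */
static double speedup(double before_ns, double after_ns)
{
    assert(after_ns > 0.0);   /* guard against division by zero */
    return before_ns / after_ns;
}
```

Calling `speedup(473, 223)` and `speedup(26792, 8119)` with the numbers above gives roughly 2.1x for the short input and roughly 3.3x for the longer one.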
Additional notes
Further attempts to speed things up with SIMD instructions, instruction reordering, using more registers for some of the data expansion, and merging rounds together made performance worse overall because of the additional overhead. The last 6 rounds do still calculate some of the data expansion 2 rounds at a time, using the full 64-bit general-purpose registers.
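For readers unfamiliar with the idea of expanding data for 2 rounds at a time in one 64-bit register, the following C sketch shows one way it could be done. It is an illustration only, not the code from this merge request: all function names are hypothetical, and pairing across the whole schedule (rather than only the final rounds) is an assumption made to keep the example short. It relies on the standard SHA-1 schedule, W[t] = ROL1(W[t-3] xor W[t-8] xor W[t-14] xor W[t-16]), in which W[t] and W[t+1] do not depend on each other.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate one 32-bit word left by 1 bit. */
static uint32_t rol1(uint32_t x) { return (x << 1) | (x >> 31); }

/* Reference: expand the 16 input words into the 80-word schedule,
 * one word per iteration. */
static void expand_scalar(uint32_t w[80])
{
    for (int t = 16; t < 80; t++)
        w[t] = rol1(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16]);
}

/* Pack two adjacent schedule words into one 64-bit value
 * (p[0] in the low half, p[1] in the high half). */
static uint64_t pack2(const uint32_t *p)
{
    return (uint64_t)p[0] | ((uint64_t)p[1] << 32);
}

/* Rotate both 32-bit halves left by 1 bit independently. The masks stop
 * bits from leaking between the two halves. */
static uint64_t rol1_x2(uint64_t x)
{
    return ((x << 1) & 0xFFFFFFFEFFFFFFFEULL) |
           ((x >> 31) & 0x0000000100000001ULL);
}

/* Sketch (assumption, not the merge request's code): expand two schedule
 * words per iteration using 64-bit operations. */
static void expand_paired(uint32_t w[80])
{
    for (int t = 16; t < 80; t += 2) {
        uint64_t x = pack2(&w[t - 3]) ^ pack2(&w[t - 8]) ^
                     pack2(&w[t - 14]) ^ pack2(&w[t - 16]);
        x = rol1_x2(x);
        w[t]     = (uint32_t)x;          /* low half  -> W[t]   */
        w[t + 1] = (uint32_t)(x >> 32);  /* high half -> W[t+1] */
    }
}

/* Self-check: both versions must produce the same schedule. */
static void check_paired_matches_scalar(void)
{
    uint32_t a[80], b[80];
    for (int i = 0; i < 16; i++)
        a[i] = b[i] = 0x9E3779B9u * (uint32_t)(i + 1);  /* arbitrary input */
    expand_scalar(a);
    expand_paired(b);
    for (int i = 0; i < 80; i++)
        assert(a[i] == b[i]);
}
```

The packing, masking and unpacking steps in this sketch are extra work on top of the plain 32-bit version, which is the kind of overhead that can cancel out the saving from doing two words at once.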