x86(_64): CPU feature detection broken
If you run GnuTLS on a Linux kernel booted with the noxsave command-line parameter, it is (on CPUs with AVX2) terminated with SIGILL at the vzeroupper instruction in https://gitlab.com/gnutls/gnutls/-/blob/9571f3a9e202ca2eeb369bb320bb93b638bb718c/lib/accelerated/x86/elf/sha256-ssse3-x86_64.s#L4241.
The reason is that _gnutls_x86_cpuid_s is not calculated the way sha256_block_data_order expects it to be.
In OpenSSL, OPENSSL_ia32cap_P[4] is essentially = {CPUID.1:EDX, CPUID.1:ECX, CPUID.7:EBX, CPUID.7:ECX}, but with some heavy modifications (in assembly) applied afterwards: https://github.com/openssl/openssl/blob/d5d95daba59adc41ab60ea86acd513f255fca3c0/crypto/x86_64cpuid.pl#L73. There is a more readable C version of the same code with explanations in BoringSSL: https://github.com/google/boringssl/blob/bb88f52261f3231005c7fa43e55cc888d2f9f582/include/openssl/cpu.h#L75, https://github.com/google/boringssl/blob/bb88f52261f3231005c7fa43e55cc888d2f9f582/crypto/cpu-intel.c#L154.
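To make the expected layout concrete, here is a minimal C sketch that fills a four-word vector in the order described above. It is not the GnuTLS or OpenSSL code: read_raw_caps and the CAP_* names are hypothetical, it relies on the GCC/Clang <cpuid.h> helpers, and it deliberately stores only the raw CPUID words without any of the upstream modifications.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helpers: __get_cpuid, __get_cpuid_count */
#endif

/* Hypothetical index names for the four words of the vector. */
enum { CAP_1_EDX = 0, CAP_1_ECX = 1, CAP_7_EBX = 2, CAP_7_ECX = 3 };

/*
 * Fill caps[] with the raw CPUID words in the order
 * {CPUID.1:EDX, CPUID.1:ECX, CPUID.7:EBX, CPUID.7:ECX}.
 * Returns 1 on success, 0 if CPUID leaf 1 is unavailable
 * (or when not built for x86); caps[] is all zero in that case.
 */
int read_raw_caps(uint32_t caps[4])
{
    caps[0] = caps[1] = caps[2] = caps[3] = 0;
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    caps[CAP_1_EDX] = edx; /* EDX, not EBX */
    caps[CAP_1_ECX] = ecx;

    /* Leaf 7, subleaf 0; left as zero if the leaf is not supported. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        caps[CAP_7_EBX] = ebx;
        caps[CAP_7_ECX] = ecx;
    }
    return 1;
#else
    return 0;
#endif
}
```

The raw words alone are not sufficient for the assembly routines; the modifications applied afterwards by the upstream code are part of the contract as well.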
Bugs in GnuTLS
- read_cpuid_vals() mixes up CPUID.1:EDX with CPUID.1:EBX, so that _gnutls_x86_cpuid_s[0] = CPUID.1:EBX instead of = CPUID.1:EDX.
- read_cpuid_vals() neither checks the OSXSAVE bit nor applies the other modifications done by the upstream code; e.g., it does not set _gnutls_x86_cpuid_s[0] & (1 << 30) on Intel CPUs (this bit originally was the "IA64 processor emulating x86" bit, is reserved (0) on current Intel CPUs, and is (ab)used by the upstream code to indicate any Intel CPU).
- This results in sha256_block_data_order (and other functions) not following Intel's specified way to check for AVX(2/-512) support (i.e., check the OSXSAVE bit first): https://www.intel.com/content/dam/develop/external/us/en/documents-tps/325462-sdm-vol-1-2abcd-3abcd.pdf#page=353. If a CPU supports AVX(2/-512) but the operating system does not (e.g., Linux with noxsave), this causes SIGILL.
- Not setting the Intel CPU bit (even if read_cpuid_vals() were fixed to set _gnutls_x86_cpuid_s[0] = CPUID.1:EDX) probably results in the AVX (without AVX2) code path never being taken (https://gitlab.com/gnutls/gnutls/-/blob/9571f3a9e202ca2eeb369bb320bb93b638bb718c/lib/accelerated/x86/elf/sha256-ssse3-x86_64.s#L57 checks _gnutls_x86_cpuid_s[0] for 1073741824 == 1 << 30).
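
The two missing adjustments can be sketched in C roughly as follows. This is an illustration under assumptions, not the upstream or GnuTLS implementation: fixup_caps, read_xcr0 and the macro names are hypothetical, only the AVX and AVX2 bits are handled (other bits that depend on OS support for the AVX state, such as the AVX-512 ones, would need the same treatment), and clearing bit 30 on non-Intel CPUs is a choice made in this sketch. The caller is assumed to pass 0 for xcr0 when OSXSAVE is clear, because XGETBV itself faults in that case.

```c
#include <stdint.h>

#define CPUID1_EDX_INTEL_MARK (1u << 30) /* reserved bit reused as "Intel CPU" flag */
#define CPUID1_ECX_OSXSAVE    (1u << 27) /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX        (1u << 28)
#define CPUID7_EBX_AVX2       (1u << 5)
#define XCR0_SSE_AND_AVX      0x6u       /* XCR0 bit 1 (SSE state) and bit 2 (AVX state) */

#if defined(__x86_64__) || defined(__i386__)
/* Read XCR0. Must only be called when CPUID.1:ECX.OSXSAVE is set. */
uint64_t read_xcr0(void)
{
    uint32_t eax, edx;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return ((uint64_t)edx << 32) | eax;
}
#endif

/*
 * Adjust a vector laid out as
 * {CPUID.1:EDX, CPUID.1:ECX, CPUID.7:EBX, CPUID.7:ECX}:
 *  - hide AVX/AVX2 unless the OS has enabled the SSE and AVX state,
 *  - mark Intel CPUs in bit 30 of word 0.
 */
void fixup_caps(uint32_t caps[4], uint64_t xcr0, int is_intel)
{
    /* Order matters: OSXSAVE first, then the XCR0 state bits. */
    int os_has_avx = (caps[1] & CPUID1_ECX_OSXSAVE) != 0 &&
                     (xcr0 & XCR0_SSE_AND_AVX) == XCR0_SSE_AND_AVX;

    if (!os_has_avx) {
        caps[1] &= ~CPUID1_ECX_AVX;
        caps[2] &= ~CPUID7_EBX_AVX2;
    }

    if (is_intel)
        caps[0] |= CPUID1_EDX_INTEL_MARK;
    else
        caps[0] &= ~CPUID1_EDX_INTEL_MARK;
}
```

With noxsave, the CPU still advertises AVX and AVX2 but OSXSAVE is clear, so a fix-up of this kind would hide both bits and the assembly would fall back to a non-AVX code path instead of reaching vzeroupper.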