RISC-V Vector Slowdowns
[It's probably best to triage these and file some more focused bugs, but I figured I'd just open this rather than forget about it.]
I've seen a few reports of RISC-V binaries that have been optimized for the vector extension running much slower under QEMU (softmmu/TCG) than the non-vectorized versions. This hits us on some testsuite CI runs, which can take quite a long time. These bugs have been filtering in over the last year or so (we've been making a big push for autovec in both GCC and LLVM lately).
Here's a list of reproducers so far:
- https://gist.github.com/compnerd/daa7e68f7b4910cb6b27f856e6c2beba runs >100x slower (73ms -> 13s) when compiled with autovectorization.
- A handful of dav1d routines that run faster on hardware run slower on QEMU, some by factors of 2x: https://code.videolan.org/videolan/dav1d/-/commit/219befef.
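
For triage, a smaller kernel may be easier to bisect than the reproducers above. The following is only a sketch and is not taken from either report: `accumulate_u8`, `run_kernel`, and `BUF_LEN` are hypothetical names, and the loop is just the kind of element-wise code that GCC and LLVM can typically autovectorize. Whether it actually shows a slowdown under QEMU is untested.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 4096 /* hypothetical buffer size, chosen arbitrarily */

/* Hypothetical kernel: a plain element-wise loop with no aliasing,
 * which compilers can usually turn into vector loads/adds/stores. */
void accumulate_u8(uint8_t *restrict acc, const uint8_t *restrict src,
                   size_t n)
{
    for (size_t i = 0; i < n; i++) {
        acc[i] = (uint8_t)(acc[i] + src[i]);
    }
}

/* Fill the buffers deterministically, apply the kernel `iters` times,
 * and return a checksum so scalar and vector builds can be compared
 * for correctness as well as for run time. */
uint32_t run_kernel(unsigned iters)
{
    static uint8_t acc[BUF_LEN], src[BUF_LEN];

    for (size_t i = 0; i < BUF_LEN; i++) {
        acc[i] = 0;
        src[i] = (uint8_t)i;
    }

    /* Each pass depends on the previous one through acc[]. */
    for (unsigned r = 0; r < iters; r++) {
        accumulate_u8(acc, src, BUF_LEN);
    }

    uint32_t sum = 0;
    for (size_t i = 0; i < BUF_LEN; i++) {
        sum += acc[i];
    }
    return sum;
}
```

One way to use this would be to call `run_kernel` from a small `main` with a large iteration count, build it once with the vector extension enabled (for example `-O3 -march=rv64gcv`) and once without (for example `-O3 -march=rv64gc`), and time both binaries under the same QEMU configuration. The checksum should match between the two builds.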
I've proposed this as a GSoC/Outreachy project; see https://lore.kernel.org/all/CAKmqyKMAQ1vrf9QnCx17DbKgGTqgDd58y46RLwZvzW4Sk4zyjA@mail.gmail.com/.
Edited by Palmer Dabbelt