Some x86_64 SSE operations have incorrect/erratic behaviours
Setup
The issue has been reproduced on the following Qemu tags:
- tags/v5.2.0
- tags/v6.0.0
Machine:
Linux ubuntu 5.8.0-55-generic 20.04.1-Ubuntu SMP x86_64
Command line:
qemu-x86_64 -cpu max ./sse_test
(the source code, sse_test.c, is attached)
Issue details
Some x86 SSE operations implemented in `target/i386/ops_sse.h` do not behave correctly.
The issue can be reproduced using the attached test code sse_test.c.
It demonstrates the issue with the `pshufb` instruction.
When run natively (outside Qemu), the test program prints:
user@ubuntu:~$ ./test_sse
Test for SSE operations issue in Qemu v6.0.0.
0x7878787878787878 0x7878787878787878 0x0 0x0
0x7878787878787878 0x7878787878787878
Under Qemu we have:
user@ubuntu:~$ qemu/build/x86_64-linux-user/qemu-x86_64 -cpu max test_sse
Test for SSE operations issue in Qemu v6.0.0.
0x7878787878787878 0x7878787878787878 0x30 0x40000052a0
0x7878787878787878 0x7878787878787878
Note: the last two values may differ on your machine, since they come from uninitialised stack memory.
Root cause
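Some SSE helpers defined in ops_sse.h use an uninitialised stack variable of type Reg as a scratch register. For XMM operations (SHIFT == 1), Reg is ZMMReg, a 64-byte (512-bit) union.
At the end of these helpers, the whole local is copied into the destination register, even though the operation only wrote its low bytes (16 bytes for an XMM operation).
The remaining bytes of the local were never initialised, so whatever the stack happened to contain is copied into the upper part of the destination register.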
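The pattern described above can be modelled outside QEMU with a small standalone sketch. It rests on stated assumptions: `zmm_reg` and `buggy_copy_op` are hypothetical names, `zmm_reg` stands in for the 64-byte ZMMReg, and because reading a truly uninitialised variable is undefined behaviour in C, the local is deliberately filled with 0xAA to play the role of stale stack contents.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's ZMMReg: 512 bits = 64 bytes. */
typedef union {
    uint8_t b[64];
    uint64_t q[8];
} zmm_reg;

/*
 * Models the buggy pattern: an operation of `width` bytes (16 for XMM)
 * writes only the low bytes of a full-size local, then copies the whole
 * local back into the destination.
 */
void buggy_copy_op(zmm_reg *d, const zmm_reg *s, size_t width)
{
    zmm_reg r;
    assert(width <= sizeof(r.b));
    memset(&r, 0xAA, sizeof(r));  /* simulated stale stack contents */
    for (size_t i = 0; i < width; i++) {
        r.b[i] = s->b[i];         /* only bytes 0..width-1 are written */
    }
    *d = r;                       /* bytes width..63 of *d are clobbered */
}
```

Called with `width = 16` on a destination whose upper 48 bytes are zero, `d.q[0]` and `d.q[1]` receive the source bytes, while `d.q[2]` through `d.q[7]` become 0xaaaaaaaaaaaaaaaa instead of staying zero.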
Example: pshufb helper
/* SSSE3 op helpers */
void glue(helper_pshufb, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
{
    int i;
    Reg r; // <<<--- uninitialised stack variable (for SHIFT == 1, Reg is ZMMReg, 64 bytes)

    for (i = 0; i < (8 << SHIFT); i++) { // <<<--- writes only the low 8 << SHIFT bytes of r (16 for XMM)
        r.B(i) = (s->B(i) & 0x80) ? 0 : (d->B(s->B(i) & ((8 << SHIFT) - 1)));
    }
    *d = r; // <<<--- copies all of r, including its uninitialised bytes, into *d
}
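One possible minimal fix, offered as a sketch and not as QEMU's actual patch, is to initialise the local from the destination (`Reg r = *d;`) so that bytes the loop does not write keep their previous value. The standalone model below uses a hypothetical `zmm_reg` type and a hypothetical `pshufb_model` function with a `width` parameter in place of `8 << SHIFT`; the shuffle expression mirrors the helper above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's 64-byte ZMMReg. */
typedef union {
    uint8_t b[64];
    uint64_t q[8];
} zmm_reg;

/* PSHUFB on the low `width` bytes; width must be a power of two so that
 * (width - 1) works as an index mask. */
void pshufb_model(zmm_reg *d, const zmm_reg *s, size_t width)
{
    assert(width == 8 || width == 16);
    zmm_reg r = *d;  /* start from the old destination: upper bytes survive */
    for (size_t i = 0; i < width; i++) {
        /* Selector bit 7 set: result byte is 0. Otherwise pick a byte of
         * the original destination (*d is not modified until the end). */
        r.b[i] = (s->b[i] & 0x80) ? 0 : d->b[s->b[i] & (width - 1)];
    }
    *d = r;
}
```

This keeps the final `*d = r` unchanged and also handles `d == s` correctly, since all reads come from the unmodified destination. The cost is one extra full-register copy per call, whose performance impact would need to be measured.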
Fix
I'm not sure how to fix this with a minimal impact on performance.
Defining a dedicated helper for each register size might be a good way to fix this.
However, this may require changes to i386/translate.c, which I do not yet fully understand.
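The per-size idea could look roughly like the following hypothetical sketch (`zmm_reg` and `pshufb_xmm_sized` are illustrative names, and no change to translate.c is shown): the scratch buffer has exactly the width of the operation, and only that many bytes are written back, so the rest of the destination register is never touched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's 64-byte ZMMReg. */
typedef union {
    uint8_t b[64];
    uint64_t q[8];
} zmm_reg;

static_assert(sizeof(zmm_reg) == 64, "zmm_reg must be 512 bits");

/* Hypothetical XMM-only (128-bit) PSHUFB helper. */
void pshufb_xmm_sized(zmm_reg *d, const zmm_reg *s)
{
    uint8_t r[16];                 /* scratch sized to the operation */
    for (int i = 0; i < 16; i++) {
        r[i] = (s->b[i] & 0x80) ? 0 : d->b[s->b[i] & 15];
    }
    memcpy(d->b, r, sizeof(r));    /* write back exactly 16 bytes */
}
```

Bytes 16..63 of `*d` keep their previous values, which matches how legacy (non-VEX) SSE encodings leave the upper part of a YMM/ZMM register unchanged.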