x86 SSE/SSE2/SSE3 instruction semantic bugs with NaN

Host environment

Operating system: Windows 10 20H2
OS/kernel version: WSL2 Ubuntu 20.04.4 LTS (GNU/Linux 5.10.102.1-microsoft-standard-WSL2 x86_64)
Architecture: x86
QEMU flavor: qemu-x86_64
QEMU version: 7.1.90 (v7.2.0-rc0)
QEMU command line: qemu-x86_64 -cpu max a.out

Emulated/Virtualized environment

Operating system: None
OS/kernel version: None
Architecture: x86

Description of problem

The result of SSE/SSE2/SSE3 instructions with NaN is different from the CPU. From Intel manual Volume 1 Appendix D.4.2.2, they defined the behavior of such instructions with NaN. But I think QEMU did not implement this semantic exactly because the byte result is different.

Steps to reproduce

Compile this code

void main() {
    asm("mov rax, 0x000000007fffffff; push rax; mov rax, 0x00000000ffffffff; push rax; movdqu XMM1, [rsp];");
    asm("mov rax, 0x2e711de7aa46af1a; push rax; mov rax, 0x7fffffff7fffffff; push rax; movdqu XMM2, [rsp];");
    asm("addsubps xmm1, xmm2");
}

Execute and compare the result with the CPU. This problem happens with other SSE/SSE2/SSE3 instructions specified in the manual, Volume 1 Appendix D.4.2.2.
- CPU
  - xmm1[3] = 0xffffffff
- QEMU
  - xmm1[3] = 0x7fffffff

Additional information

This bug is discovered by research conducted by KAIST SoftSec.

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information