SIGILL on vectorized instructions accessing CXL Type 3 memory with KVM

Host environment

  • Operating system: Ubuntu 22.04.5 LTS
  • OS/kernel version: 6.8.0-60-generic #63~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 22 19:00:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
  • Architecture: x86_64
  • QEMU flavor: qemu-system-x86_64
  • QEMU version: v10.1.0-rc3
  • QEMU command line:
    qemu-system-x86_64 \
        -cpu host \
        -kernel bzImage \
        -drive file=rootfs.img,index=0,media=disk,format=raw \
        -append "root=/dev/sda2 rw console=ttyS0,115200 ignore_loglevel nokaslr default_hugepagesz=1G hugepagesz=1G hugepages=4 pci=nocrs" \
        -smp 4 \
        -serial mon:stdio \
        -nographic \
        -netdev user,id=network0,hostfwd=tcp::12345-:22 \
        -device e1000,netdev=network0 \
        -machine q35,cxl=on \
        -m 48G,slots=4,maxmem=64G \
        -D qemu.log \
        -accel kvm \
        -object memory-backend-ram,id=vmem0,share=on,size=32G \
        -object memory-backend-file,id=cxl-lsa0,share=on,mem-path=lsa.raw,size=256M \
        -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
        -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
        -device cxl-type3,bus=root_port13,volatile-memdev=vmem0,lsa=cxl-lsa0,id=cxl-vmem0 \
        -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=64G

Emulated/Virtualized environment

  • Operating system: same as host
  • OS/kernel version: same as host
  • Architecture: same as host

Description of problem

In a QEMU guest with KVM acceleration enabled, a guest process receives an illegal instruction signal (SIGILL) when vectorized (SSE/AVX) load/store instructions access CXL Type 3 memory.

Issue scenarios:

  1. DAX device access
    • CXL Type 3 memory is exposed to the guest as a devdax device (/dev/dax0.0).
    • When an application mmaps this device and accesses the mapping, SIGILL occurs as soon as glibc's optimized memory functions (e.g. memcpy) execute vectorized instructions on it.
  2. NUMA node access
    • CXL Type 3 memory is attached to the guest as a NUMA node instead of via devdax.
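    • Even trivial programs that do not explicitly use SIMD instructions crash during dynamic loader (ld.so) initialization, before main() runs.
    • This appears to happen because numactl --membind places the process's memory, including the loader's writable data, on the CXL node, so ordinary compiler-generated SSE stores in the loader target CXL memory.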

Steps to reproduce

  1. DAX device scenario

    1. Start QEMU with the above command line configuration.
    2. In the guest, attach CXL memory as a devdax device.
    3. In the guest, compile and run the following program.
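      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <unistd.h>

      int main(void) {
          int fd = open("/dev/dax0.0", O_RDWR);
          if (fd < 0) { perror("open /dev/dax0.0"); return 1; }
          size_t size = 2 * 1024 * 1024; // 2 MiB mapping of the DAX device
          int *buffer = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
          if (buffer == MAP_FAILED) { perror("mmap"); close(fd); return 1; }
          int *dest = malloc(size);
          if (dest == NULL) { perror("malloc"); munmap(buffer, size); close(fd); return 1; }
          memcpy(dest, buffer, 1024);                  // copy the first 1024 bytes out of CXL memory
          int cmp_result = memcmp(dest, buffer, 1024); // compare the first 1024 bytes
          printf("memcmp result: %d\n", cmp_result);   // use the result so the call is not optimized away
          free(dest);
          munmap(buffer, size);
          close(fd);
          return 0;
      }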
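    4. SIGILL occurs in the memcpy call, which glibc dispatches to __memmove_avx_unaligned_erms.
      • The faulting instruction is vmovdqu, an AVX instruction that loads 256 bits of unaligned data. Here it reads the memcpy source (%rsi), which is the DAX mapping.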
         0x00007ffff7da0944 <+4>:     mov    %rdi,%rax
         0x00007ffff7da0947 <+7>:     cmp    $0x20,%rdx
         0x00007ffff7da094b <+11>:    jb     0x7ffff7da0970 <__memmove_avx_unaligned_erms+48>
      => 0x00007ffff7da094d <+13>:    vmovdqu (%rsi),%ymm0
         0x00007ffff7da0951 <+17>:    cmp    $0x40,%rdx
         0x00007ffff7da0955 <+21>:    ja     0x7ffff7da0a00 <__memmove_avx_unaligned_erms+192>
         0x00007ffff7da095b <+27>:    vmovdqu -0x20(%rsi,%rdx,1),%ymm1
         0x00007ffff7da0961 <+33>:    vmovdqu %ymm0,(%rdi)
         0x00007ffff7da0965 <+37>:    vmovdqu %ymm1,-0x20(%rdi,%rdx,1)
         0x00007ffff7da096b <+43>:    vzeroupper
  2. NUMA node scenario

    1. Start QEMU with the above command line configuration.
    2. In the guest, attach CXL memory as NUMA node 1.
    3. In the guest, compile and run the following program using numactl.
      #include <stdio.h>

      int main(void) {
          printf("hello world\n");
          return 0;
      }

      numactl --membind=1 ./helloworld
    4. SIGILL occurs in the dynamic loader, before main() runs.
      • The faulting instruction is movq, an SSE2 instruction that stores the low 64 bits of %xmm0 to memory, here into _rtld_global_ro.
         0x00007ffff7fdc162 <+242>:   or     0x20a34(%rip),%eax        # 0x7ffff7ffcb9c <_rtld_global_ro+188>
         0x00007ffff7fdc168 <+248>:   and    $0x14810,%edx
         0x00007ffff7fdc16e <+254>:   and    $0x1c00,%ebx
         0x00007ffff7fdc174 <+260>:   mov    %esi,0x20a02(%rip)        # 0x7ffff7ffcb7c <_rtld_global_ro+156>
         0x00007ffff7fdc17a <+266>:   or     0x20a20(%rip),%edx        # 0x7ffff7ffcba0 <_rtld_global_ro+192>
      => 0x00007ffff7fdc180 <+272>:   movq   %xmm0,0x20a34(%rip)        # 0x7ffff7ffcbbc <_rtld_global_ro+220>
         0x00007ffff7fdc188 <+280>:   or     0x20aa6(%rip),%ebx        # 0x7ffff7ffcc34 <_rtld_global_ro+340>
         0x00007ffff7fdc18e <+286>:   mov    0x20ad4(%rip),%ebp        # 0x7ffff7ffcc68 <_rtld_global_ro+392>
         0x00007ffff7fdc194 <+292>:   mov    %ebx,0x20a9a(%rip)        # 0x7ffff7ffcc34 <_rtld_global_ro+340>
         0x00007ffff7fdc19a <+298>:   and    $0x10,%ebp

Additional information

Edited by Rocky Song