ppc64 POWER10 machine-check caused by ifetch crashes qemu

POWER9/10 will generate a machine check on an access to an invalid real address, which is easy to trigger when running in real mode (I have a kvm-unit-tests test for it, but that's not upstream yet).

POWER10 additionally has prefixed instructions, and synchronous interrupts generally set the SRR1[PREFIX] bit if the interrupting instruction is a prefixed one. QEMU implements this by loading the instruction image in the exception generation code. The problem is that if an instruction fetch caused the fault, the exception code must not try to load the instruction again; otherwise it takes a recursive fault and crashes. A machine check caused by an ifetch to an invalid real address crashes like this:

  ERROR:../system/cpus.c:504:qemu_mutex_lock_iothread_impl:
      assertion failed: (!qemu_mutex_iothread_locked())
  #0  __pthread_kill_implementation
      (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
      at ./nptl/pthread_kill.c:44
  #1  0x00007ffff705a15f in __pthread_kill_internal
      (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
  #2  0x00007ffff700c472 in __GI_raise (sig=sig@entry=6)
      at ../sysdeps/posix/raise.c:26
  #3  0x00007ffff6ff64b2 in __GI_abort () at ./stdlib/abort.c:79
  #4  0x00007ffff73def08 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #5  0x00007ffff7445e4e in g_assertion_message_expr ()
      at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #6  0x0000555555a833f1 in qemu_mutex_lock_iothread_impl
      (file=0x555555efda6e "../accel/tcg/cputlb.c", line=2033)
      at ../system/cpus.c:504
  #7  qemu_mutex_lock_iothread_impl
      (file=file@entry=0x555555efda6e "../accel/tcg/cputlb.c", line=line@en>
  #8  0x0000555555cbf786 in do_ld_mmio_beN
      (cpu=cpu@entry=0x555556b72010, full=0x7fff5408e010, ret_be=ret_be@ent>
  #9  0x0000555555cc2ec6 in do_ld_4
      (ra=0, memop=MO_BEUL, type=MMU_INST_FETCH, mmu_idx=<optimized out>, p>
  #10 do_ld4_mmu
      (cpu=cpu@entry=0x555556b72010, addr=<optimized out>, oi=<optimized ou>
      at ../accel/tcg/cputlb.c:2418
  #11 0x0000555555ccbaf6 in cpu_ldl_code
      (env=env@entry=0x555556b747d0, addr=<optimized out>)
      at ../accel/tcg/cputlb.c:2975
  #12 0x0000555555b7a47c in ppc_ldl_code
      (addr=<optimized out>, env=0x555556b747d0)
      at ../target/ppc/excp_helper.c:147
  #13 is_prefix_insn_excp (excp=1, cpu=0x555556b72010)
      at ../target/ppc/excp_helper.c:1350
  #14 powerpc_excp_books (excp=1, cpu=0x555556b72010)
      at ../target/ppc/excp_helper.c:1415
  #15 powerpc_excp (cpu=0x555556b72010, excp=<optimized out>)
      at ../target/ppc/excp_helper.c:1733
  #16 0x0000555555cb1c74 in cpu_handle_exception
      (ret=<synthetic pointer>, cpu=<optimized out>)

This should be fixed by not trying to set the SRR1[PREFIX] indication if the machine check is caused by ifetch.