CPU hotplug crashed the guest when using virt-type as qemu!

Host environment

  • Operating system: Fedora41
  • OS/kernel version: 6.11.7-300.fc41.ppc64le
  • Architecture: ppc64
  • QEMU flavor: qemu-system-ppc64
  • QEMU version: QEMU emulator version 9.1.1 (qemu-9.1.1-1.fc41)

libvirt xml snippet:

<domain type='qemu'>
  <name>linux</name>
  <uuid>cba9037f-2a62-41f9-98c1-0780b2ff49b9</uuid>
  <maxMemory slots='16' unit='KiB'>419430400</maxMemory>
  <memory unit='KiB'>20971520</memory>
  <currentMemory unit='KiB'>10485760</currentMemory>
  <memoryBacking>
    <locked/>
  </memoryBacking>
  <vcpu placement='static' current='4'>1024</vcpu>

Emulated/Virtualized environment

  • Operating system: Fedora41
  • OS/kernel version: 6.11.7-300.fc41.ppc64le
  • Architecture: ppc64

Description of problem

Guest is getting crashing and getting into shutoff state when I am trying to hotplug much more cpus than present in the host! This is happening only when i give virt-type as qemu.

Steps to reproduce:

  1. Start a guest with virt-type as qemu
<domain type='qemu'>
  <name>linux</name>
  <uuid>cba9037f-2a62-41f9-98c1-0780b2ff49b9</uuid>
  <maxMemory slots='16' unit='KiB'>419430400</maxMemory>
  <memory unit='KiB'>20971520</memory>
  <currentMemory unit='KiB'>10485760</currentMemory>
  <memoryBacking>
    <locked/>
  </memoryBacking>
  <vcpu placement='static' current='4'>1024</vcpu>
  1. lscpu on host:
lscpu
Architecture:             ppc64le
  Byte Order:             Little Endian
CPU(s):                   40
  On-line CPU(s) list:    0-39
Model name:               POWER10 (architected), altivec supported
  Model:                  2.0 (pvr 0080 0200)
  Thread(s) per core:     8
  Core(s) per socket:     5
  Socket(s):              1
  Physical sockets:       4
  Physical chips:         1
  Physical cores/chip:    12
  1. [On host] virsh setvcpus linux 800 error: Unable to read from monitor: Connection reset by peer

  2. Guest is getting into shutoff state

  3. Issue is seen with upstream qemu also.

Additional information

Tried reproducing while attaching gdb shows below backtrace which happened after hotplugging 249 CPUs in TCG mode:

Thread 261 "CPU 249/TCG" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ff97c00ea20 (LWP 51567)]
0x00007fff84cac3e8 in __pthread_kill_implementation () from target:/lib64/glibc-hwcaps/power10/libc.so.6
(gdb) bt
#0  0x00007fff84cac3e8 in __pthread_kill_implementation () from target:/lib64/glibc-hwcaps/power10/libc.so.6
#1  0x00007fff84c46ba0 in raise () from target:/lib64/glibc-hwcaps/power10/libc.so.6
#2  0x00007fff84c29354 in abort () from target:/lib64/glibc-hwcaps/power10/libc.so.6
#3  0x00007fff850f1e30 in g_assertion_message () from target:/lib64/libglib-2.0.so.0
#4  0x00007fff850f1ebc in g_assertion_message_expr () from target:/lib64/libglib-2.0.so.0
#5  0x00000001376c6f00 in tcg_region_initial_alloc__locked (s=0x7fff7c000f30) at ../tcg/region.c:396
#6  0x00000001376c6fa8 in tcg_region_initial_alloc (s=0x7fff7c000f30) at ../tcg/region.c:402
#7  0x00000001376dae08 in tcg_register_thread () at ../tcg/tcg.c:1011
#8  0x000000013768b7e4 in mttcg_cpu_thread_fn (arg=0x143e884f0) at ../accel/tcg/tcg-accel-ops-mttcg.c:77
#9  0x0000000137bbb2d0 in qemu_thread_start (args=0x143b4aff0) at ../util/qemu-thread-posix.c:542
#10 0x00007fff84ca9be0 in start_thread () from target:/lib64/glibc-hwcaps/power10/libc.so.6
#11 0x00007fff84d4ef3c in __clone3 () from target:/lib64/glibc-hwcaps/power10/libc.so.6
(gdb) 

which points to below code:

/*
 * Perform a context's first region allocation.
 * This function does _not_ increment region.agg_size_full.
 */
static void tcg_region_initial_alloc__locked(TCGContext *s)
{
    bool err = tcg_region_alloc__locked(s);
    g_assert(!err);
}

Here, tcg_region_alloc__locked returns true on failure when max region allocation is reached and therefore intentionally asserted as TCG can't proceed without it.

static bool tcg_region_alloc__locked(TCGContext *s)
{
    if (region.current == region.n) {
        return true;
    }
    tcg_region_assign(s, region.current);
    region.current++;
    return false;
}
Edited by Anushree Mathur