【throttle-group】qemu-kvm crashes with throttle enabled on aarch64

Host environment

  • Operating system: centos9
  • OS/kernel version: Linux cclinux2209-4444 5.15.67-11.cl9.aarch64
  • Architecture: aarch64
  • QEMU flavor:
  • QEMU version:
virsh version
Compiled against library: libvirt 10.0.0
Using library: libvirt 10.0.0
Using API: QEMU 10.0.0
Running hypervisor: QEMU 8.2.0
/usr/libexec/qemu-kvm --version
QEMU emulator version 8.2.0 (qemu-kvm-8.2.0-1.cl9)
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
  • QEMU command line:

libvirt XML (throttle-group-vm.xml)

The virtual machine needs to be configured with multiple disks, and these disks should be grouped together under a shared I/O throttle configuration.

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native' discard='ignore' iothread='1'/>
  <source dev='/var/run/kubevirt/hotplug-disks/ecs-nrqvtsnzxt3q1v-os-1' index='3'/>
  <backingStore/>
  <target dev='vda' bus='virtio'/>
  <iotune>
    <total_bytes_sec>104857600</total_bytes_sec>
    <group_name>ecs-nrqvtsnzxt3q1v</group_name>
  </iotune>
</disk>
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native' discard='ignore' iothread='2'/>
  <source dev='/var/run/kubevirt/hotplug-disks/pvc-volume-a2058f6c-7cd0-4ab1-b42e-7f2c690ddb53' index='2'/>
  <backingStore/>
  <target dev='vdc' bus='virtio'/>
  <iotune>
    <total_bytes_sec>104857600</total_bytes_sec>
    <group_name>ecs-nrqvtsnzxt3q1v</group_name>
  </iotune>
</disk>

Emulated/Virtualized environment

  • Operating system:
  • OS/kernel version:
  • Architecture: aarch64

Description of problem

Steps to reproduce

  1. Enable throttle groups for the disks in the virtual machine XML (see the <iotune> configuration above).
  2. Start the VM.
  3. Run fio against the disks inside the guest.
  4. QEMU aborts.

The probability of this issue occurring is very low.

Additional information

QEMU backtrace:

(gdb) bt
#0  0x0000ffffa9964620 in __pthread_kill_implementation () at /lib64/libc.so.6
#1  0x0000ffffa991f78c in raise () at /lib64/libc.so.6
#2  0x0000ffffa9907030 in abort () at /lib64/libc.so.6
#3  0x0000ffffa9919300 in __assert_fail_base () at /lib64/libc.so.6
#4  0x0000ffffa9919370 in __assert_perror_fail () at /lib64/libc.so.6
#5  0x0000aaaae2e83824 in throttle_group_restart_queue (tgm=0xaaaafc53bca8, direction=THROTTLE_READ)
    at ../block/throttle-groups.c:441
#6  0x0000aaaae2f51b70 in timerlist_run_timers (timer_list=0xaaaaf7bb4a00) at ../util/qemu-timer.c:576
#7  0x0000aaaae2f51c34 in timerlist_run_timers (timer_list=<optimized out>) at ../util/qemu-timer.c:509
#8  timerlistgroup_run_timers (tlg=tlg@entry=0xaaaaf7bb48e0) at ../util/qemu-timer.c:615
#9  0x0000aaaae2f36e40 in aio_poll (ctx=0xaaaaf7bb4720, blocking=blocking@entry=true) at ../util/aio-posix.c:729
#10 0x0000aaaae2e17b1c in iothread_run (opaque=0xaaaaf7a33880) at ../iothread.c:63
#11 0x0000aaaae2f39b94 in qemu_thread_start (args=0xaaaaf7bb6270) at ../util/qemu-thread-posix.c:541
#12 0x0000ffffa9962a08 in start_thread () at /lib64/libc.so.6
#13 0x0000ffffa990bb9c in thread_start () at /lib64/libc.so.6

code:

static void throttle_group_restart_queue(ThrottleGroupMember *tgm,
                                        ThrottleDirection direction)
{
    Coroutine *co;
    RestartData *rd = g_new0(RestartData, 1);

    rd->tgm = tgm;
    rd->direction = direction;

    /* This function is called when a timer is fired or when
     * throttle_group_restart_tgm() is called. Either way, there can
     * be no timer pending on this tgm at this point */
    assert(!timer_pending(tgm->throttle_timers.timers[direction])); /* <-- triggers the abort */

    qatomic_inc(&tgm->restart_pending);

    co = qemu_coroutine_create(throttle_group_restart_queue_entry, rd);
    aio_co_enter(tgm->aio_context, co);
}
441         assert(!timer_pending(tgm->throttle_timers.timers[direction]));
(gdb) p tgm->throttle_timers.timers[direction]
$1 = (QEMUTimer *) 0xfff75800ada0
(gdb) p direction
$2 = THROTTLE_READ
(gdb) p *tgm->throttle_timers.timers[direction]
$3 = {expire_time = 448624728890175, timer_list = 0xaaaaf7bb4a00, cb = 0xaaaae2e838b0 <read_timer_cb>,
  opaque = 0xaaaafc53bca8, next = 0x0, attributes = 0, scale = 1}
(gdb)
bool timerlist_run_timers(QEMUTimerList *timer_list) {

......
       /* remove timer from the list before calling the callback */
        timer_list->active_timers = ts->next;
        ts->next = NULL;
        ts->expire_time = -1;
        cb = ts->cb;
        opaque = ts->opaque;

        /* run the callback (the timer list can be modified) */
        qemu_mutex_unlock(&timer_list->active_timers_lock);
        cb(opaque); /* <-- calls read_timer_cb -> timer_cb */
        qemu_mutex_lock(&timer_list->active_timers_lock);
 
......
}
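
To make the invariant behind the failing assertion explicit, here is a minimal, self-contained model of the dispatch step shown above. It is only a sketch: ModelTimer, model_timer_pending, model_cb, model_run_timer and model_dispatch_sees_pending are hypothetical names, not QEMU code, and the "pending" rule (expire_time >= 0) reflects my reading of util/qemu-timer.c rather than anything quoted in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMUTimer, reduced to the fields used here. */
typedef struct ModelTimer {
    int64_t expire_time;              /* -1 means "not armed" */
    void (*cb)(void *opaque);
    void *opaque;
    bool pending_seen_by_cb;          /* recorded by the callback */
} ModelTimer;

/* Assumed rule: a timer counts as pending if and only if expire_time >= 0. */
static bool model_timer_pending(const ModelTimer *t)
{
    return t->expire_time >= 0;
}

/* Callback: records what an assert(!timer_pending(...)) would observe. */
static void model_cb(void *opaque)
{
    ModelTimer *t = opaque;
    t->pending_seen_by_cb = model_timer_pending(t);
}

/* Simplified dispatch step: disarm first, then invoke the callback. */
static void model_run_timer(ModelTimer *t)
{
    t->expire_time = -1;
    t->cb(t->opaque);
}

/* Runs one undisturbed dispatch and reports whether the callback
 * saw the timer as pending. */
static bool model_dispatch_sees_pending(void)
{
    ModelTimer t = { .expire_time = 1000, .cb = model_cb };
    t.opaque = &t;
    model_run_timer(&t);
    return t.pending_seen_by_cb;
}
```

When nothing else touches the timer, model_dispatch_sees_pending() returns false, which is the condition the assertion in throttle_group_restart_queue relies on.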

throttle_group_restart_queue() is not protected by a lock, so while it runs, schedule_next_request() may concurrently modify the timer in tgm->throttle_timers.timers[direction].

timerlist_run_timers() sets expire_time to -1 before it invokes the callback, so timer_pending() should return false inside the callback. In the gdb dump above, however, expire_time is 448624728890175 at the time of the abort, which indicates that the timer was re-armed in between.
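
The suspected interleaving can be replayed deterministically in a small model. This is a sketch under assumptions, not a reproduction of the bug: ModelTimer, model_timer_pending, model_rearm and replay_check_passes are hypothetical names, the re-arm stands in for whatever another thread does (the report suspects schedule_next_request), and the pending rule expire_time >= 0 is my understanding of util/qemu-timer.c. The re-armed value is the expire_time from the gdb dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMUTimer; only expire_time matters here. */
typedef struct ModelTimer {
    int64_t expire_time;              /* -1 means "not armed" */
} ModelTimer;

/* Assumed rule: pending if and only if expire_time >= 0. */
static bool model_timer_pending(const ModelTimer *t)
{
    return t->expire_time >= 0;
}

/* Stand-in for a re-arm performed by another thread. */
static void model_rearm(ModelTimer *t, int64_t when)
{
    t->expire_time = when;
}

/* Replays the three steps in a fixed order and returns the value the
 * expression !timer_pending(...) would have at the assertion. */
static bool replay_check_passes(bool rearm_in_window)
{
    ModelTimer t = { .expire_time = 1000 };

    /* Step 1: the dispatch loop disarms the timer before the callback. */
    t.expire_time = -1;

    /* Step 2: in the unlocked window, another thread may re-arm it. */
    if (rearm_in_window) {
        model_rearm(&t, INT64_C(448624728890175));
    }

    /* Step 3: the callback path evaluates the assertion condition. */
    return !model_timer_pending(&t);
}
```

replay_check_passes(false) returns true and replay_check_passes(true) returns false, matching the observed state in which the assertion fires with a non-negative expire_time. The model says nothing about which code path performs the re-arm in the real process.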

This issue occurs with extremely low probability and has only been observed on the aarch64 architecture.

Edited by grass-lu