【throttle-group】qemu-kvm crashes with throttle enabled on aarch64
Host environment
- Operating system: centos9
- OS/kernel version: Linux cclinux2209-4444 5.15.67-11.cl9.aarch64
- Architecture: aarch64
- QEMU flavor:
- QEMU version:
virsh version
Compiled against library: libvirt 10.0.0
Using library: libvirt 10.0.0
Using API: QEMU 10.0.0
Running hypervisor: QEMU 8.2.0
/usr/libexec/qemu-kvm --version
QEMU emulator version 8.2.0 (qemu-kvm-8.2.0-1.cl9)
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
- QEMU command line:
libvirt XML (throttle-group-vm.xml):
The virtual machine needs to be configured with multiple disks, and these disks should be grouped together under a shared I/O throttle configuration.
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native' discard='ignore' iothread='1'/>
<source dev='/var/run/kubevirt/hotplug-disks/ecs-nrqvtsnzxt3q1v-os-1' index='3'/>
<backingStore/>
<target dev='vda' bus='virtio'/>
<iotune>
<total_bytes_sec>104857600</total_bytes_sec>
<group_name>ecs-nrqvtsnzxt3q1v</group_name>
</iotune>
</disk>
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native' discard='ignore' iothread='2'/>
<source dev='/var/run/kubevirt/hotplug-disks/pvc-volume-a2058f6c-7cd0-4ab1-b42e-7f2c690ddb53' index='2'/>
<backingStore/>
<target dev='vdc' bus='virtio'/>
Emulated/Virtualized environment
- Operating system:
- OS/kernel version:
- Architecture: aarch64
Description of problem
Steps to reproduce
- Enable throttle groups for the disks in the virtual machine XML (see the libvirt XML above).
- Start the VM.
- Run fio against the disks in the guest.
- QEMU aborts.
The probability of this issue occurring is very low.
Additional information
QEMU backtrace:
(gdb) bt
#0 0x0000ffffa9964620 in __pthread_kill_implementation () at /lib64/libc.so.6
#1 0x0000ffffa991f78c in raise () at /lib64/libc.so.6
#2 0x0000ffffa9907030 in abort () at /lib64/libc.so.6
#3 0x0000ffffa9919300 in __assert_fail_base () at /lib64/libc.so.6
#4 0x0000ffffa9919370 in __assert_perror_fail () at /lib64/libc.so.6
#5 0x0000aaaae2e83824 in throttle_group_restart_queue (tgm=0xaaaafc53bca8, direction=THROTTLE_READ)
at ../block/throttle-groups.c:441
#6 0x0000aaaae2f51b70 in timerlist_run_timers (timer_list=0xaaaaf7bb4a00) at ../util/qemu-timer.c:576
#7 0x0000aaaae2f51c34 in timerlist_run_timers (timer_list=<optimized out>) at ../util/qemu-timer.c:509
#8 timerlistgroup_run_timers (tlg=tlg@entry=0xaaaaf7bb48e0) at ../util/qemu-timer.c:615
#9 0x0000aaaae2f36e40 in aio_poll (ctx=0xaaaaf7bb4720, blocking=blocking@entry=true) at ../util/aio-posix.c:729
#10 0x0000aaaae2e17b1c in iothread_run (opaque=0xaaaaf7a33880) at ../iothread.c:63
#11 0x0000aaaae2f39b94 in qemu_thread_start (args=0xaaaaf7bb6270) at ../util/qemu-thread-posix.c:541
#12 0x0000ffffa9962a08 in start_thread () at /lib64/libc.so.6
#13 0x0000ffffa990bb9c in thread_start () at /lib64/libc.so.6
Relevant code (block/throttle-groups.c):
static void throttle_group_restart_queue(ThrottleGroupMember *tgm,
ThrottleDirection direction)
{
Coroutine *co;
RestartData *rd = g_new0(RestartData, 1);
rd->tgm = tgm;
rd->direction = direction;
/* This function is called when a timer is fired or when
* throttle_group_restart_tgm() is called. Either way, there can
* be no timer pending on this tgm at this point */
assert(!timer_pending(tgm->throttle_timers.timers[direction])); /* <-- this assertion triggers the abort */
qatomic_inc(&tgm->restart_pending);
co = qemu_coroutine_create(throttle_group_restart_queue_entry, rd);
aio_co_enter(tgm->aio_context, co);
}
441 assert(!timer_pending(tgm->throttle_timers.timers[direction]));
(gdb) p tgm->throttle_timers.timers[direction]
$1 = (QEMUTimer *) 0xfff75800ada0
(gdb) p direction
$2 = THROTTLE_READ
(gdb) p *tgm->throttle_timers.timers[direction]
$3 = {expire_time = 448624728890175, timer_list = 0xaaaaf7bb4a00, cb = 0xaaaae2e838b0 <read_timer_cb>,
opaque = 0xaaaafc53bca8, next = 0x0, attributes = 0, scale = 1}
(gdb)
bool timerlist_run_timers(QEMUTimerList *timer_list) {
......
/* remove timer from the list before calling the callback */
timer_list->active_timers = ts->next;
ts->next = NULL;
ts->expire_time = -1;
cb = ts->cb;
opaque = ts->opaque;
/* run the callback (the timer list can be modified) */
qemu_mutex_unlock(&timer_list->active_timers_lock);
cb(opaque); /* <-- calls read_timer_cb -> timer_cb */
qemu_mutex_lock(&timer_list->active_timers_lock);
......
}
The function throttle_group_restart_queue is not protected by a lock, so while it executes, schedule_next_request might concurrently modify tgm->throttle_timers.timers[direction].
timerlist_run_timers sets expire_time to -1 before invoking the callback, so the timer should not be pending when the assertion is checked. In the gdb output above ($3), however, expire_time is 448624728890175 at the time of the abort, which indicates that the timer was re-armed in between.
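To make the suspected sequence easier to follow, here is a minimal single-threaded sketch that replays it step by step. It is not QEMU code: ToyTimer, toy_fire, toy_rearm and replay_interleaving are hypothetical names, and the model assumes that a timer counts as pending exactly when expire_time >= 0, which is how I understand timer_pending() to work. The second thread is simulated by a plain function call, so this only illustrates the ordering and does not reproduce the real race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for QEMUTimer; only the field relevant here (hypothetical). */
typedef struct ToyTimer {
    int64_t expire_time; /* -1: not armed, >= 0: pending (assumed convention) */
} ToyTimer;

/* Assumed to mirror timer_pending(): pending iff expire_time >= 0. */
static bool toy_timer_pending(const ToyTimer *t)
{
    return t->expire_time >= 0;
}

/* What timerlist_run_timers does to the timer before calling the callback. */
static void toy_fire(ToyTimer *t)
{
    t->expire_time = -1;
}

/* Stand-in for another thread re-arming the same timer (hypothetical). */
static void toy_rearm(ToyTimer *t, int64_t deadline)
{
    t->expire_time = deadline;
}

/*
 * Replays the suspected interleaving and returns what the pending check
 * would see at the point of the failing assertion.
 */
static bool replay_interleaving(bool other_thread_rearms)
{
    ToyTimer t = { .expire_time = 100 };  /* timer is armed */

    /* Step 1: the timer list removes the timer and clears expire_time. */
    toy_fire(&t);
    assert(!toy_timer_pending(&t));

    /* Step 2: the callback runs with the timer-list lock released. */
    if (other_thread_rearms) {
        /* Step 3: a second thread schedules the next request on this timer. */
        toy_rearm(&t, INT64_C(448624728890175));
    }

    /* Step 4: the state the assertion in the restart function would check. */
    return toy_timer_pending(&t);
}
```

With other_thread_rearms set to false the timer is reported as not pending, so the assertion would hold. With it set to true the timer is reported as pending, which matches the state seen in the core dump.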
This issue occurs with extremely low probability and has only been observed on the aarch64 architecture.