Kernel OOPS: take_other_rq_tasks + futex_wait_multiple
For quite some time now, I've been experiencing system freezes when playing games on Liquorix with Project-C / PDS as the scheduler. It takes a while to happen, usually 30 minutes to an hour. After a freeze, the kernel logs from the previous day don't contain any information, since the system hard locks.
Today while playing a game (Voice Of Cards in this case) through Steam + Proton, I got a particular freeze that left the sound playing. I forced an emergency sync with Alt + SysRq + S and got this in the kernel logs:
general protection fault, probably for non-canonical address 0xdead0000000000f8: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 12361 Comm: VoiceofCardsThe Tainted: G O 6.1.0-lqx1-1-lqx #1
Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 4201 04/26/2022
RIP: 0010:take_other_rq_tasks+0x162/0x6f0
Code: 63 43 70 48 8b 73 78 48 89 c2 48 c1 e0 04 4c 8d 66 88 48 8d 44 05 48 48 39 c6 0f 84 c8 01 00 00 4c 39 65 10 0f 84 8d 01 00 00 <49> 63 44 24 70 49 8b 74 24 78 48 89 c2 48 c1 e0 04 48 8d 5e 88 48
RSP: 0018:ffffc9000680fcc8 EFLAGS: 00010016
RAX: ffff888feeeb0bc8 RBX: ffff888100e2ea00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: dead000000000100 RDI: 0000000000000000
RBP: ffff888feeeb0b80 R08: 000000000000001a R09: 0000000000000000
R10: ffff888fee8601b0 R11: ffff888feeeb0bb0 R12: dead000000000088
R13: ffff888fee870b80 R14: 0000000000000001 R15: 0000000000000002
FS: 00000000696df6c0(0000) GS:ffff888fee840000(0000) knlGS:000000007fe40000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9bc2640208 CR3: 00000001bb59c000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
<TASK>
__schedule+0x8d5/0xd40
? get_futex_key+0x46b/0x590
schedule+0x57/0xb0
futex_wait_multiple+0x38f/0x430
__do_sys_futex_waitv+0x2e9/0x350
do_syscall_64+0x37/0xc0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fed0f41d7fd
Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5b 75 0d 00 f7 d8 64 89 01 48
RSP: 002b:00000000696dd938 EFLAGS: 00000246 ORIG_RAX: 00000000000001c1
RAX: ffffffffffffffda RBX: 00007fed0d9f9880 RCX: 00007fed0f41d7fd
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000696ddd90
RBP: 0000000000000001 R08: 0000000000000000 R09: 00000001696de260
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00000000696dd97c R14: 0000000000000000 R15: 000000006967fc30
</TASK>
Modules linked in: af_packet rfcomm cmac algif_hash algif_skcipher af_alg bnep btusb btrtl btbcm btintel btmtk bluetooth ecdh_generic nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp snd_seq_dummy snd_hrtimer snd_seq overlay bridge stp llc tun hid_logitech_hidpp qrtr mousedev xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables vmnet(O) nct6775 nfnetlink nct6775_core hwmon_vid hid_logitech_dj snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device mc joydev xpad ff_memless usbhid zram vmmon(O) vmw_vmci vboxnetflt(O) vboxnetadp(O) vboxdrv(O) snd_hda_codec_realtek snd_hda_codec_generic intel_rapl_msr intel_rapl_common edac_mce_amd snd_hda_codec_hdmi kvm_amd snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi kvm asus_ec_sensors snd_hda_codec irqbypass snd_hda_core crct10dif_pclmul crc32_pclmul polyval_clmulni snd_hwdep polyval_generic ghash_clmulni_intel
eeepc_wmi sha512_ssse3 asus_wmi snd_pcm aesni_intel ledtrig_audio sparse_keymap crypto_simd snd_timer platform_profile cryptd rfkill snd wmi_bmof mxm_wmi pcspkr ccp soundcore sp5100_tco k10temp igb i2c_piix4 r8169 dca ipmi_devintf tpm_crb ipmi_msghandler tpm_tis realtek tpm_tis_core tpm rng_core acpi_cpufreq usbip_host usbip_core pkcs8_key_parser i2c_dev dm_multipath dm_mod sg fuse crypto_user dmi_sysfs ip_tables x_tables xhci_pci xhci_pci_renesas amdgpu drm_ttm_helper ttm agpgart video wmi gpu_sched i2c_algo_bit drm_buddy drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm cec rc_core btrfs blake2b_generic xor raid6_pq ext4 crc16 mbcache jbd2 xfs libcrc32c crc32c_generic crc32c_intel
---[ end trace 0000000000000000 ]---
RIP: 0010:take_other_rq_tasks+0x162/0x6f0
Code: 63 43 70 48 8b 73 78 48 89 c2 48 c1 e0 04 4c 8d 66 88 48 8d 44 05 48 48 39 c6 0f 84 c8 01 00 00 4c 39 65 10 0f 84 8d 01 00 00 <49> 63 44 24 70 49 8b 74 24 78 48 89 c2 48 c1 e0 04 48 8d 5e 88 48
RSP: 0018:ffffc9000680fcc8 EFLAGS: 00010016
RAX: ffff888feeeb0bc8 RBX: ffff888100e2ea00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: dead000000000100 RDI: 0000000000000000
RBP: ffff888feeeb0b80 R08: 000000000000001a R09: 0000000000000000
R10: ffff888fee8601b0 R11: ffff888feeeb0bb0 R12: dead000000000088
R13: ffff888fee870b80 R14: 0000000000000001 R15: 0000000000000002
FS: 00000000696df6c0(0000) GS:ffff888fee840000(0000) knlGS:000000007fe40000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9bc2640208 CR3: 00000001bb59c000 CR4: 0000000000750ee0
PKRU: 55555554
note: VoiceofCardsThe[12361] exited with preempt_count 2
[drm] Fence fallback timer expired on ring gfx_0.0.0
[drm] Fence fallback timer expired on ring gfx_0.0.0
[drm] Fence fallback timer expired on ring gfx_0.0.0
[drm] Fence fallback timer expired on ring gfx_0.0.0
[drm] Fence fallback timer expired on ring gfx_0.0.0
[drm] Fence fallback timer expired on ring gfx_0.0.0
[drm] Fence fallback timer expired on ring gfx_0.0.0
[drm] Fence fallback timer expired on ring gfx_0.0.0
[drm] Fence fallback timer expired on ring sdma0
[drm] Fence fallback timer expired on ring gfx_0.0.0
[drm] Fence fallback timer expired on ring sdma0
[drm] Fence fallback timer expired on ring gfx_0.0.0
[drm] Fence fallback timer expired on ring sdma0
[drm] Fence fallback timer expired on ring gfx_0.0.0
[drm] Fence fallback timer expired on ring gfx_0.0.0
[drm] Fence fallback timer expired on ring sdma0
[drm] Fence fallback timer expired on ring gfx_0.0.0
[drm] Fence fallback timer expired on ring gfx_0.0.0
[drm] Fence fallback timer expired on ring sdma0
[drm] Fence fallback timer expired on ring gfx_0.0.0
sysrq: Emergency Sync
Emergency Sync complete
amdgpu 0000:0c:00.0: [drm] *ERROR* [CRTC:77:crtc-0] flip_done timed out
sysrq: Emergency Remount R/O
Although this could be a bug in futex_wait_multiple, what I've reported before in a separate thread [1] is the appearance of take_other_rq_tasks.
Let me know what else you need from me to help reproduce this. Any game run through Steam that uses FSYNC should trigger the issue above. The difficult part is that when the crash or hard lock occurs, the kernel typically cannot log the problem. So although it's easy to reproduce, getting logs takes maybe 10-20 hard locks before the race condition (probably?) allows the log to be written out to disk.
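For what it's worth, the register values in the oops look consistent with take_other_rq_tasks following a list pointer that had already been poisoned by a list deletion. The sketch below only redoes the arithmetic from the dump. The poison constants are an assumption (mainline include/linux/poison.h with the x86_64 default CONFIG_ILLEGAL_POINTER_VALUE), the instruction decoding in the comments is my own reading of the Code: bytes, and classify() is a hypothetical helper, not anything from the kernel.

```python
# Register values copied from the oops above.
FAULT_ADDR = 0xDEAD0000000000F8
RSI = 0xDEAD000000000100
R12 = 0xDEAD000000000088

# Assumption: values from mainline include/linux/poison.h, with the x86_64
# default CONFIG_ILLEGAL_POINTER_VALUE of 0xdead000000000000.
POISON_POINTER_DELTA = 0xDEAD000000000000
LIST_POISON1 = 0x100 + POISON_POINTER_DELTA  # written to ->next by list_del()
LIST_POISON2 = 0x122 + POISON_POINTER_DELTA  # written to ->prev by list_del()


def classify(ptr):
    """Name the list poison value that ptr matches, if any (hypothetical helper)."""
    if ptr == LIST_POISON1:
        return "LIST_POISON1"
    if ptr == LIST_POISON2:
        return "LIST_POISON2"
    return "not a list poison value"


# My reading of the Code: bytes (an assumption, not disassembler output):
#   4c 8d 66 88      -> lea    r12, [rsi - 0x78]
#   49 63 44 24 70   -> movsxd rax, dword ptr [r12 + 0x70]   (faulting insn)
print("RSI is", classify(RSI))
print("R12 == RSI - 0x78:", R12 == RSI - 0x78)
print("fault address == R12 + 0x70:", FAULT_ADDR == R12 + 0x70)
```

If that reading is right, RSI held the ->next pointer of a list entry that had already been removed, and the fault is the first dereference of the structure computed from it. That points at a race on the run queue's task list rather than at futex_wait_multiple itself, but it is only an inference from a single dump.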
Reference information:
- AUR link: https://aur.archlinux.org/packages/linux-lqx
- AUR commit: https://aur.archlinux.org/cgit/aur.git/commit/?h=linux-lqx&id=0b7a9000ebf340e992654c1a01177c833b345965
- Kernel source: https://github.com/zen-kernel/zen-kernel/releases/tag/v6.1.0-lqx1
- Kernel package/scripts: https://github.com/damentz/liquorix-package/releases/tag/6.1-1