[RFC] Revert "sched: Add support for lazy preemption"

eiffel requested to merge eiffel/centos-stream-9:francis/fix-btf into main

Hi.

This merge request reverts a commit that had bad side effects on BTF generation. Moreover, I believe the reverted commit is an RT patch that was added to a non-RT kernel, namely kernel-5.14.0-362.el9.

Because of the problem in BTF, some eBPF memory loads were performed at the wrong offsets, and the programs were rejected by the kernel:

root@vm-amd64:~# /rhel-bug-reproducer 
2024/02/07 07:46:01 opening tracepoint: cannot create bpf perf link: permission denied
root@vm-amd64:~# uname -a
Linux vm-amd64 5.14.0-dirty #10 SMP PREEMPT_DYNAMIC Wed Feb 7 14:41:42 +07 2024 x86_64 GNU/Linux

Indeed, comparing the BTF of this RHEL kernel with that of a kernel without the problem shows that the added field introduces padding, which causes the trouble:

$ grep trace_event_raw_sys_exit -A 10 /tmp/good /tmp/bad
/tmp/good:struct trace_event_raw_sys_exit {
/tmp/good-      struct trace_entry         ent;                  /*     0     8 */
/tmp/good-      long int                   id;                   /*     8     8 */
/tmp/good-      long int                   ret;                  /*    16     8 */
/tmp/good-      char                       __data[];             /*    24     0 */
/tmp/good-
/tmp/good-      /* size: 24, cachelines: 1, members: 4 */
/tmp/good-      /* last cacheline: 24 bytes */
/tmp/good-};
/tmp/good-struct trace_event_data_offsets_sys_enter {
/tmp/good-
--
/tmp/bad:struct trace_event_raw_sys_exit {
/tmp/bad-       struct trace_entry         ent;                  /*     0    12 */
/tmp/bad-
/tmp/bad-       /* XXX last struct has 3 bytes of padding */
/tmp/bad-       /* XXX 4 bytes hole, try to pack */
/tmp/bad-
/tmp/bad-       long int                   id;                   /*    16     8 */
/tmp/bad-       long int                   ret;                  /*    24     8 */
/tmp/bad-       char                       __data[];             /*    32     0 */
/tmp/bad-
/tmp/bad-       /* size: 32, cachelines: 1, members: 4 */

Indeed, the RT patch adds a field to trace_entry: preempt_lazy_count. As a consequence, the kernel rejects the program here: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/blob/kernel-5.14.0-362.el9/kernel/trace/trace_events.c#L209 because the program's max_offset was 32, whereas on a kernel without this problem it is 24.

This was reported in the Inspektor Gadget repository by https://github.com/matthyx and first investigated by https://github.com/mauriciovasquezbernal: https://github.com/inspektor-gadget/inspektor-gadget/issues/2444

With the patch reverted, we can now load eBPF programs:

root@vm-amd64:~# /rhel-bug-reproducer 
Run sudo cat /sys/kernel/debug/tracing/trace_pipe in another terminal to see the output
Press Ctrl+C to close: root@vm-amd64:~#

Of course, the proposed solution may be too drastic if you really want to keep this patch. But now that the problem is understood, we can start discussing possible solutions.

Best regards and thank you in advance.