ps 4.0.2 segfault printing ID_RUSER

In some background automation tasks, I've recently seen a handful of segfaults from ps. Here is an example:

$ ps -w -w -A -o ruser=user -o pid=pid -o ppid=ppid -o pgid=pgrp -o tpgid=tpgid -o nice=nice -o start_time=start -o vsz=size -o rss=rss -o state=state -o etime=etime -o time=time -o %cpu=pctcpu -o command=command
   ...
   root      346491       1  346491      -1   10 18:01 4539004 126648 S         00:12 00:00:03   28.8 /usr/lib/x86_64-linux-gnu/libexec/drkonqi-coredump-processor d26d01f4fb8e46cfb67b0ce0acb53f2a 12060-346483-0
   root      346520       2       0      -1    0 18:01      0     0 I           00:09 00:00:00    0.0 [kworker/3:0-events]
   root      346576       1  346576      -1   10 18:01 4542120 127596 S         00:08 00:00:04   52.1 /usr/lib/x86_64-linux-gnu/libexec/drkonqi-coredump-processor d26d01f4fb8e46cfb67b0ce0acb53f2a 12061-346574-0
   achutina  346599  257911  346599      -1    0 18:01  96596 13596 S           00:05 00:00:00    0.7 /usr/bin/pulseaudio --daemonize=no --log-target=journal
   root      346643       1  346643      -1    9 18:01 492400 133204Signal 11 (SEGV) caught by ps (4.0.2).
    R          00:04 00:00:04   99.5 (coredump)
   root      346644       1  346644      -1   10 18:01 4542120 127364 S         00:04 00:00:03   84.6 /usr/lib/x86_64-linux-gnu/libexec/drkonqi-coredump-processor d26d01f4fb8e46cfb67b0ce0acb53f2a 12062-346642-0
   jaadm     346687  344322  344322      -1    0 18:01  94648 22092 S           00:01 00:00:00    0.5 host fshome04ah
   jaadm     346708  344133  346708      -1    0 18:01  97440 15200 S           00:00 00:00:00    5.8 /usr/bin/pulseaudio --daemonize=no --log-target=journal
   achutina  346766  258027  258026      -1    0 18:01  17328  4440 R           00:00 00:00:00    300 ps -w -w -A -o ruser=user -o pid=pid -o ppid=ppid -o pgid=pgrp -o tpgid=tpgid -o nice=nice -o start_time=start -o vsz=size -o rss=rss -o state=state -o etime=etime -o time=time -o %cpu=pctcpu -o command=command
   ps:src/ps/display.c:71: please report this bug

This is on Debian 12 (current stable), which currently has procps 4.0.2.

I cannot reproduce this on demand, but there have been a few instances in my background automation (a distributed testing tool) which have generated core dumps on this machine. It wouldn't surprise me if there is some intermittent machine-specific issue that is causing unexpected data, but regardless of the cause it seems like procps isn't handling this condition.
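
Since the crash is intermittent, one way to try to catch it would be to run the same ps command in a loop until it exits non-zero. This is only a minimal sketch: `run_until_fail` is a hypothetical helper name, not part of procps, and it assumes the crash shows up as a non-zero exit status from ps.

```shell
#!/bin/sh
# Hypothetical helper (not part of procps): run a command repeatedly
# until it fails or the attempt limit is reached.
# Usage: run_until_fail MAX_ATTEMPTS COMMAND [ARGS...]
# Returns 1 if a failing run was seen, 0 otherwise.
run_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        # Discard the command's own output; only its exit status matters here.
        "$@" >/dev/null 2>&1
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "attempt $i failed with status $status"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max attempts"
    return 0
}
```

For example, `run_until_fail 10000 ps -w -w -A -o ruser=user -o pid=pid -o command=command` would stop at the first failing run, if one occurs. Whether this ever triggers the crash depends on whatever transient condition causes it.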

In the two core dumps I've looked at, the stack trace is identical:

(gdb) bt
#0  0x00007fcde54b1267 in __GI_kill () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005565a5c52e3b in signal_handler (signo=11) at src/ps/display.c:76
#2  <signal handler called>
#3  escape_str (dst=dst@entry=0x7fcde4fa1090 "", src=0x1 <error: Cannot access memory at address 0x1>, bufsize=bufsize@entry=131072,
    maxcells=maxcells@entry=0x7ffcfea95ae4) at src/ps/output.c:245
#4  0x00005565a5c593ec in do_pr_name (outbuf=0x7fcde4fa1090 "", name=<optimized out>, u=0) at src/ps/output.c:1206
#5  0x00005565a5c5b2e1 in show_one_proc (p=p@entry=0x7fcde4fc6800, fmt=0x5565ba359700) at src/ps/output.c:2205
#6  0x00005565a5c52904 in simple_spew () at src/ps/display.c:320
#7  main (argc=<optimized out>, argv=<optimized out>) at src/ps/display.c:672

(gdb) fr 5
#5  0x00005565a5c5b2e1 in show_one_proc (p=p@entry=0x7fcde4fc6800, fmt=0x5565ba359700) at src/ps/output.c:2205
2205        if(p && fmt->pr) amount = (*fmt->pr)(outbuf,p);
(gdb) p fmt->pr
$16 = (int (*)(char * const restrict, const struct pids_stack * const restrict)) 0x5565a5c59b60 <pr_ruser>

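Since fmt->pr points to pr_ruser, I believe ps was trying to print the real user name (the ruser column) when it crashed. In frame 3, the src pointer passed to escape_str is 0x1, which is not a valid address.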

Given the u=0 in the do_pr_name args, I would also guess the process it's trying to print is owned by root, but maybe that is a red herring (e.g. maybe both the user id and the user name are just corrupt).

Is there anything specific that I can look at in the core dump to help diagnose this? Unfortunately I can't reproduce this on demand, so I can't debug a running ps.

Also, at first I thought this bug might be https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1036631, which is still present in Debian 12, but the stack trace is completely different from the one in that bug (and I'm not using the -m switch).

Edited by Mike Gulick