pgrep performance improvement (and memory leak fix)

Tommi Rantala requested to merge tt.rantala/procps:pgrep-fixes into master

Import stzncpy() and use it instead of strncpy(). strncpy() pads the rest of the destination buffer with NUL bytes, which is unneeded work for the command line buffers; avoiding that zeroing improves pgrep performance.
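A minimal sketch of the difference, assuming a helper shaped like the coreutils one (the names stzncpy_sketch and stzncpy_demo are hypothetical and this is not the exact code imported by the merge request). It copies at most len bytes, stops at the first NUL, and writes a single terminator, so the untouched tail of the buffer is never written:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch modeled on the coreutils helper: copy at most len
   bytes of src into dest, stop at the first NUL, and always terminate.
   dest must have room for len + 1 bytes. Returns a pointer to the
   terminating NUL written into dest. */
static inline char *
stzncpy_sketch(char *restrict dest, const char *restrict src, size_t len)
{
    size_t i;
    for (i = 0; i < len && *src; i++)
        *dest++ = *src++;
    *dest = '\0';
    return dest;
}

/* Usage sketch: contrast the bytes each routine touches. */
void stzncpy_demo(void)
{
    char a[16], b[16];
    memset(a, 'x', sizeof a);
    memset(b, 'x', sizeof b);

    strncpy(a, "pgrep", sizeof a - 1);        /* copies 5 bytes, then zero-fills up to a[14] */
    stzncpy_sketch(b, "pgrep", sizeof b - 1); /* writes b[0..5] only */

    assert(strcmp(a, "pgrep") == 0 && strcmp(b, "pgrep") == 0);
    assert(a[10] == '\0'); /* padded by strncpy */
    assert(b[10] == 'x');  /* left untouched by the sketch */
}
```

With large command line buffers and short command lines, the padding done by strncpy() can dominate the copy, which is the cost this change is meant to avoid.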

Permission to use stzncpy() from Jim Meyering <jim@meyering.net>:

On Tue, Nov 10, 2020 at 10:39 PM <tommi.t.rantala@nokia.com> wrote:
> Hello Jim,
>
> May I use your stzncpy() routine in procps-ng?

Hi Tommi,

Glad to hear it. Yes, you may.
I suggest you use the copy in coreutils here:
https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/system.h#n735

Comparison with perf stat --repeat=100 pgrep systemd:

Before:

 Performance counter stats for './pgrep systemd' (100 runs):

              9,35 msec task-clock                #    0,972 CPUs utilized            ( +-  0,83% )
                 3      context-switches          #    0,267 K/sec                    ( +-  2,93% )
                 0      cpu-migrations            #    0,000 K/sec
               172      page-faults               #    0,018 M/sec                    ( +-  0,06% )
        28 316 570      cycles                    #    3,028 GHz                      ( +-  0,66% )
        34 904 936      instructions              #    1,23  insn per cycle           ( +-  0,01% )
         6 890 972      branches                  #  736,807 M/sec                    ( +-  0,01% )
            42 027      branch-misses             #    0,61% of all branches          ( +-  0,51% )

         0,0096226 +- 0,0000819 seconds time elapsed  ( +-  0,85% )

After:

 Performance counter stats for './pgrep systemd' (100 runs):

              6,05 msec task-clock                #    0,955 CPUs utilized            ( +-  1,05% )
                 2      context-switches          #    0,274 K/sec                    ( +-  4,30% )
                 0      cpu-migrations            #    0,000 K/sec
               108      page-faults               #    0,018 M/sec                    ( +-  0,08% )
        17 947 927      cycles                    #    2,967 GHz                      ( +-  1,00% )
        28 428 691      instructions              #    1,58  insn per cycle           ( +-  0,01% )
         5 957 492      branches                  #  984,878 M/sec                    ( +-  0,01% )
            40 112      branch-misses             #    0,67% of all branches          ( +-  0,48% )

         0,0063315 +- 0,0000673 seconds time elapsed  ( +-  1,06% )
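To put the two runs side by side, the relative change per counter can be computed as (before - after) / before. The sketch below does that for values copied from the perf output above (with decimal points instead of the locale's decimal commas); the function names are hypothetical. By this measure elapsed time drops by about 34%, cycles by about 37%, and instructions by about 19%.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage by which `after` is lower than `before`. */
static double percent_reduction(double before, double after)
{
    if (before == 0.0)
        return 0.0; /* avoid dividing by zero for empty counters */
    return (before - after) / before * 100.0;
}

/* Print the relative change for a few counters from the perf output. */
void report_reductions(void)
{
    double elapsed = percent_reduction(0.0096226, 0.0063315);
    double cycles = percent_reduction(28316570.0, 17947927.0);
    double insns = percent_reduction(34904936.0, 28428691.0);
    double faults = percent_reduction(172.0, 108.0);

    /* Every counter listed here went down between the two runs. */
    assert(elapsed > 0.0 && cycles > 0.0 && insns > 0.0 && faults > 0.0);

    printf("elapsed:      %.1f%% lower\n", elapsed);
    printf("cycles:       %.1f%% lower\n", cycles);
    printf("instructions: %.1f%% lower\n", insns);
    printf("page-faults:  %.1f%% lower\n", faults);
}
```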