1. 05 Nov, 2018 1 commit
    • Andi Kleen's avatar
      perf record: Support weak groups · cf99ad14
      Andi Kleen authored
      Implement a weak group fallback for 'perf record', similar to the
      existing 'perf stat' support.  This allows to use groups that might be
      longer than the available counters without failing.
      
      Before:
      
        $ perf record  -e '{cycles,cache-misses,cache-references,cpu_clk_unhalted.thread,cycles,cycles,cycles}' -a sleep 1
        Error:
        The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cycles).
        /bin/dmesg | grep -i perf may provide additional information.
      
      After:
      
        $ ./perf record  -e '{cycles,cache-misses,cache-references,cpu_clk_unhalted.thread,cycles,cycles,cycles}:W' -a sleep 1
        WARNING: No sample_id_all support, falling back to unordered processing
        [ perf record: Woken up 3 times to write data ]
        [ perf record: Captured and wrote 8.136 MB perf.data (134069 samples) ]
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20181001195927.14211-2-andi@firstfloor.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cf99ad14
  2. 31 Oct, 2018 1 commit
  3. 30 Oct, 2018 1 commit
    • Arnaldo Carvalho de Melo's avatar
      perf top: Allow disabling the overwrite mode · 4e303fbe
      Arnaldo Carvalho de Melo authored
      In ebebbf08 ("perf top: Switch default mode to overwrite mode") we
      forgot to leave a way to disable that new default, add a --overwrite
      option that can be disabled using --no-overwrite, since the code already
      in such a way that we can readily disable this mode.
      
      This is useful when investigating bugs with this mode like the recent
      report from David Miller where lots of unknown symbols appear due to
      disabling the events while processing them which disables all record
      types, not just PERF_RECORD_SAMPLE, which makes it impossible to resolve
      maps when we lose PERF_RECORD_MMAP records.
      
      This can be easily seen while building a kernel, when there are lots of
      short lived processes.
      Reported-by: default avatarDavid Miller <davem@davemloft.net>
      Acked-by: default avatarKan Liang <kan.liang@intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: ebebbf08 ("perf top: Switch default mode to overwrite mode")
      Link: https://lkml.kernel.org/n/tip-oqgsz2bq4kgrnnajrafcdhie@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4e303fbe
  4. 24 Oct, 2018 4 commits
    • Andi Kleen's avatar
      perf script: Implement --graph-function · 99f753f0
      Andi Kleen authored
      Add a ftrace style --graph-function argument to 'perf script' that
      allows to print itrace function calls only below a given function. This
      makes it easier to find the code of interest in a large trace.
      
      % perf record -e intel_pt//k -a sleep 1
      % perf script --graph-function group_sched_in --call-trace
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])          group_sched_in
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])              event_sched_in.isra.107
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_event_set_state.part.71
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_time
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_disable
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_log_itrace_start
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_userpage
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          calc_timer_values
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              sched_clock_cpu
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          arch_perf_update_userpage
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              __fentry__
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              using_native_sched_clock
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              sched_clock_stable
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_enable
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])          group_sched_in
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])              __x86_indirect_thunk_rax
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])              event_sched_in.isra.107
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  perf_event_set_state.part.71
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                      perf_event_update_time
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  perf_pmu_disable
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  perf_log_itrace_start
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                      perf_event_update_userpage
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                          calc_timer_values
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              sched_clock_cpu
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                          __x86_indirect_thunk_rax
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                          arch_perf_update_userpage
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              __fentry__
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              using_native_sched_clock
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              sched_clock_stable
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: default avatarLeo Yan <leo.yan@linaro.org>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Link: http://lkml.kernel.org/r/20180920180540.14039-5-andi@firstfloor.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      99f753f0
    • Andi Kleen's avatar
      tools script: Add --call-trace and --call-ret-trace · d1b1552e
      Andi Kleen authored
      Add short cut options to print PT call trace and call-ret-trace, for
      calls and call and returns. Roughly corresponds to ftrace function
      tracer and function graph tracer.
      
      Just makes these common use cases nicer to use.
      
      % perf record -a -e intel_pt// sleep 1
      % perf script --call-trace
      	    perf   900 [000] 194167.205652203: ([kernel.kallsyms])          perf_pmu_enable
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])          __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])          event_filter_match
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])          group_sched_in
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])              event_sched_in.isra.107
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_event_set_state.part.71
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_time
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_disable
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_log_itrace_start
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_userpage
      
      % perf script --call-ret-trace
      	    perf   900 [000] 194167.205652203:   tr strt     ([unknown])        pt_config
                  perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            pt_config
                  perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            pt_event_add
                  perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            perf_pmu_enable
                  perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            perf_pmu_nop_void
                  perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            event_sched_in.isra.107
                  perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            perf_pmu_nop_int
                  perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            group_sched_in
                  perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            event_filter_match
                  perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            event_filter_match
                  perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            group_sched_in
                  perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])                __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])                perf_pmu_nop_txn
                  perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])                event_sched_in.isra.107
                  perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])                    perf_event_set_state.part.71
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: default avatarLeo Yan <leo.yan@linaro.org>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Link: http://lkml.kernel.org/r/20180920180540.14039-4-andi@firstfloor.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d1b1552e
    • Andi Kleen's avatar
      perf script: Make itrace script default to all calls · 4eb06815
      Andi Kleen authored
      By default 'perf script' for itrace outputs sampled instructions or
      branches. In my experience this is confusing to users because it's hard
      to correlate with real program behavior. The sampling makes sense for
      tools like 'perf report' that actually sample to reduce the run time,
      but run time is normally not a problem for 'perf script'.  It's better
      to give an accurate representation of the program flow.
      
      Default 'perf script' to output all calls for itrace. That's a much saner
      default. The old behavior can be still requested with 'perf script'
      --itrace=ibxwpe100000
      
      v2: Fix ETM build failure
      v3: Really fix ETM build failure (Kim Phillips)
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Link: http://lkml.kernel.org/r/20180920180540.14039-3-andi@firstfloor.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4eb06815
    • Andi Kleen's avatar
      perf script: Add --insn-trace for instruction decoding · b585ebdb
      Andi Kleen authored
      Add a --insn-trace short hand option for decoding and disassembling
      instruction streams for intel_pt. This automatically pipes the output
      into the xed disassembler to generate disassembled instructions.  This
      just makes this use model much nicer to use.
      
      Before
      
        % perf record -e intel_pt// ...
        % perf script --itrace=i0ns --ns -F +insn,-event,-period | xed -F insn: -A -64
         swapper 0 [000] 17276.429606186: ffffffff81010486 pt_config ([kernel.kallsyms])    nopl  %eax, (%rax,%rax,1)
         swapper 0 [000] 17276.429606186: ffffffff8101048b pt_config ([kernel.kallsyms])    add $0x10, %rsp
         swapper 0 [000] 17276.429606186: ffffffff8101048f pt_config ([kernel.kallsyms])    popq  %rbx
         swapper 0 [000] 17276.429606186: ffffffff81010490 pt_config ([kernel.kallsyms])    popq  %rbp
         swapper 0 [000] 17276.429606186: ffffffff81010491 pt_config ([kernel.kallsyms])    popq  %r12
         swapper 0 [000] 17276.429606186: ffffffff81010493 pt_config ([kernel.kallsyms])    popq  %r13
         swapper 0 [000] 17276.429606186: ffffffff81010495 pt_config ([kernel.kallsyms])    popq  %r14
         swapper 0 [000] 17276.429606186: ffffffff81010497 pt_config ([kernel.kallsyms])    popq  %r15
         swapper 0 [000] 17276.429606186: ffffffff81010499 pt_config ([kernel.kallsyms])    retq
         swapper 0 [000] 17276.429606186: ffffffff8101063e pt_event_add ([kernel.kallsyms])         cmpl  $0x1, 0x1b0(%rbx)
         swapper 0 [000] 17276.429606186: ffffffff81010645 pt_event_add ([kernel.kallsyms])         mov $0xffffffea, %eax
         swapper 0 [000] 17276.429606186: ffffffff8101064a pt_event_add ([kernel.kallsyms])         mov $0x0, %edx
         swapper 0 [000] 17276.429606186: ffffffff8101064f pt_event_add ([kernel.kallsyms])         popq  %rbx
         swapper 0 [000] 17276.429606186: ffffffff81010650 pt_event_add ([kernel.kallsyms])         cmovnz %edx, %eax
         swapper 0 [000] 17276.429606186: ffffffff81010653 pt_event_add ([kernel.kallsyms])         jmp 0xffffffff81010635
         swapper 0 [000] 17276.429606186: ffffffff81010635 pt_event_add ([kernel.kallsyms])         retq
         swapper 0 [000] 17276.429606186: ffffffff8115e687 event_sched_in.isra.107 ([kernel.kallsyms])       test %eax, %eax
      
      Now:
      
        % perf record -e intel_pt// ...
        % perf script --insn-trace --xed
        ... same output ...
      
      XED needs to be installed with:
      
        $ git clone https://github.com/intelxed/mbuild.git mbuild
        $ git clone https://github.com/intelxed/xed
        $ cd xed
        $ ./mfile.py
        $ ./mfile.py examples
        $ sudo ./mfile.py --prefix=/usr/local install
        $ sudo cp obj/examples/xed /usr/local/bin
        $ xed | head -3
        ERROR: required argument(s) were missing
        Copyright (C) 2017, Intel Corporation. All rights reserved.
        XED version: [v10.0-328-g7d62c8c49b7b]
        $
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20180920180540.14039-2-andi@firstfloor.org
      [ Fixed up whitespace damage, added the 'mfile.py examples + cp obj/examples/xed ... ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      b585ebdb
  5. 23 Oct, 2018 1 commit
  6. 22 Oct, 2018 1 commit
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Introduce per-event maximum number of events property · a9c5e6c1
      Arnaldo Carvalho de Melo authored
      Call it 'nr', as in this context it should be expressive enough, i.e.:
      
        # perf trace -e sched:*waking/nr=8,call-graph=fp/
           0.000 :0/0 sched:sched_waking:comm=rcu_sched pid=10 prio=120 target_cpu=001
                                             try_to_wake_up ([kernel.kallsyms])
                                             sched_clock ([kernel.kallsyms])
           3.933 :0/0 sched:sched_waking:comm=rcu_sched pid=10 prio=120 target_cpu=001
                                             try_to_wake_up ([kernel.kallsyms])
                                             sched_clock ([kernel.kallsyms])
           3.970 IPDL Backgroun/3622 sched:sched_waking:comm=Gecko_IOThread pid=3569 prio=120 target_cpu=003
                                             try_to_wake_up ([kernel.kallsyms])
                                             __libc_write (/usr/lib64/libpthread-2.26.so)
          20.069 IPDL Backgroun/3622 sched:sched_waking:comm=Gecko_IOThread pid=3569 prio=120 target_cpu=003
                                             try_to_wake_up ([kernel.kallsyms])
                                             __libc_write (/usr/lib64/libpthread-2.26.so)
          37.170 IPDL Backgroun/3622 sched:sched_waking:comm=Gecko_IOThread pid=3569 prio=120 target_cpu=003
                                             try_to_wake_up ([kernel.kallsyms])
                                             __libc_write (/usr/lib64/libpthread-2.26.so)
          53.267 IPDL Backgroun/3622 sched:sched_waking:comm=Gecko_IOThread pid=3569 prio=120 target_cpu=003
                                             try_to_wake_up ([kernel.kallsyms])
                                             __libc_write (/usr/lib64/libpthread-2.26.so)
          70.365 IPDL Backgroun/3622 sched:sched_waking:comm=Gecko_IOThread pid=3569 prio=120 target_cpu=003
                                             try_to_wake_up ([kernel.kallsyms])
                                             __libc_write (/usr/lib64/libpthread-2.26.so)
          75.781 Web Content/3649 sched:sched_waking:comm=JS Helper pid=3670 prio=120 target_cpu=000
                                             try_to_wake_up ([kernel.kallsyms])
                                             try_to_wake_up ([kernel.kallsyms])
                                             wake_up_q ([kernel.kallsyms])
                                             futex_wake ([kernel.kallsyms])
                                             do_futex ([kernel.kallsyms])
                                             __x64_sys_futex ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
                                             pthread_cond_signal@@GLIBC_2.3.2 (/usr/lib64/libpthread-2.26.so)
        #
      
        # perf trace -e sched:*switch/nr=2/,block:*_plug/nr=4/,block:*_unplug/nr=1/,net:*dev_queue/nr=3,max-stack=16/
           0.000 :0/0 sched:sched_switch:swapper/0:0 [120] S ==> trace:3367 [120]
           0.046 :0/0 sched:sched_switch:swapper/1:0 [120] S ==> kworker/u16:58:2722 [120]
         570.670 irq/50-iwlwifi/680 net:net_dev_queue:dev=wlp3s0 skbaddr=0xffff93498051ef00 len=66
                                             __dev_queue_xmit ([kernel.kallsyms])
        1106.141 jbd2/dm-0-8/476 block:block_plug:[jbd2/dm-0-8]
        1106.175 jbd2/dm-0-8/476 block:block_unplug:[jbd2/dm-0-8] 1
        1618.088 kworker/u16:30/2694 block:block_plug:[kworker/u16:30]
        1810.000 :0/0 net:net_dev_queue:dev=vnet0 skbaddr=0xffff93498051ef00 len=52
                                             __dev_queue_xmit ([kernel.kallsyms])
        3857.974 :0/0 net:net_dev_queue:dev=vnet0 skbaddr=0xffff93498051f900 len=52
                                             __dev_queue_xmit ([kernel.kallsyms])
        4790.277 jbd2/dm-2-8/748 block:block_plug:[jbd2/dm-2-8]
        4790.448 jbd2/dm-2-8/748 block:block_plug:[jbd2/dm-2-8]
        #
      
      The global --max-events has precendence:
      
        # trace --max-events 3 -e sched:*switch/nr=2/,block:*_plug/nr=4/,block:*_unplug/nr=1/,net:*dev_queue/nr=3,max-stack=16/
           0.000 :0/0 sched:sched_switch:swapper/0:0 [120] S ==> qemu-system-x86:2252 [120]
           0.029 qemu-system-x8/2252 sched:sched_switch:qemu-system-x86:2252 [120] D ==> swapper/0:0 [120]
          58.047 DNS Res~er #14/31661 net:net_dev_queue:dev=wlp3s0 skbaddr=0xffff9346966af100 len=84
                                             __dev_queue_xmit ([kernel.kallsyms])
                                             __libc_send (/usr/lib64/libpthread-2.26.so)
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-s4jswltvh660ughvg9nwngah@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      a9c5e6c1
  7. 19 Oct, 2018 1 commit
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Introduce --max-events · 5067a8cd
      Arnaldo Carvalho de Melo authored
      Allow stopping tracing after a number of events take place, considering
      strace-like syscalls formatting as one event per enter/exit pair or when
      in a multi-process tracing session a syscall is interrupted and printed
      ending with '...'.
      
      Examples included in the documentation:
      
      Trace the first 4 open, openat or open_by_handle_at syscalls (in the future more syscalls may match here):
      
        $ perf trace -e open* --max-events 4
        [root@jouet perf]# trace -e open* --max-events 4
        2272.992 ( 0.037 ms): gnome-shell/1370 openat(dfd: CWD, filename: /proc/self/stat) = 31
        2277.481 ( 0.139 ms): gnome-shell/3039 openat(dfd: CWD, filename: /proc/self/stat) = 65
        3026.398 ( 0.076 ms): gnome-shell/3039 openat(dfd: CWD, filename: /proc/self/stat) = 65
        4294.665 ( 0.015 ms): sed/15879 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3
        $
      
      Trace the first minor page fault when running a workload:
      
        # perf trace -F min --max-stack=7 --max-events 1 sleep 1
           0.000 ( 0.000 ms): sleep/18006 minfault [__clear_user+0x1a] => 0x5626efa56080 (?k)
                                             __clear_user ([kernel.kallsyms])
                                             load_elf_binary ([kernel.kallsyms])
                                             search_binary_handler ([kernel.kallsyms])
                                             __do_execve_file.isra.33 ([kernel.kallsyms])
                                             __x64_sys_execve ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             entry_SYSCALL_64 ([kernel.kallsyms])
        #
      
      Trace the next min page page fault to take place on the first CPU:
      
        # perf trace -F min --call-graph=dwarf --max-events 1 --cpu 0
           0.000 ( 0.000 ms): Web Content/17136 minfault [js::gc::Chunk::fetchNextDecommittedArena+0x4b] => 0x7fbe6181b000 (?.)
                                             js::gc::FreeSpan::initAsEmpty (inlined)
                                             js::gc::Arena::setAsNotAllocated (inlined)
                                             js::gc::Chunk::fetchNextDecommittedArena (/usr/lib64/firefox/libxul.so)
                                             js::gc::Chunk::allocateArena (/usr/lib64/firefox/libxul.so)
                                             js::gc::GCRuntime::allocateArena (/usr/lib64/firefox/libxul.so)
                                             js::gc::ArenaLists::allocateFromArena (/usr/lib64/firefox/libxul.so)
                                             js::gc::GCRuntime::tryNewTenuredThing<JSString, (js::AllowGC)1> (inlined)
                                             js::AllocateString<JSString, (js::AllowGC)1> (/usr/lib64/firefox/libxul.so)
                                             js::Allocate<JSThinInlineString, (js::AllowGC)1> (inlined)
                                             JSThinInlineString::new_<(js::AllowGC)1> (inlined)
                                             AllocateInlineString<(js::AllowGC)1, unsigned char> (inlined)
                                             js::ConcatStrings<(js::AllowGC)1> (/usr/lib64/firefox/libxul.so)
                                             [0x18b26e6bc2bd] (/tmp/perf-17136.map)
      
      Tracing the next four ext4 operations on a specific CPU:
      
        # perf trace -e ext4:*/call-graph=fp/ --max-events 4 --cpu 3
           0.000 mutt/3849 ext4:ext4_es_lookup_extent_enter:dev 253,2 ino 57277 lblk 0
                                             ext4_es_lookup_extent ([kernel.kallsyms])
                                             read (/usr/lib64/libc-2.26.so)
           0.097 mutt/3849 ext4:ext4_es_lookup_extent_exit:dev 253,2 ino 57277 found 0 [0/0) 0
                                             ext4_es_lookup_extent ([kernel.kallsyms])
                                             read (/usr/lib64/libc-2.26.so)
           0.141 mutt/3849 ext4:ext4_ext_map_blocks_enter:dev 253,2 ino 57277 lblk 0 len 1 flags
                                             ext4_ext_map_blocks ([kernel.kallsyms])
                                             read (/usr/lib64/libc-2.26.so)
           0.184 mutt/3849 ext4:ext4_ext_load_extent:dev 253,2 ino 57277 lblk 1516511 pblk 18446744071750013657
                                             __read_extent_tree_block ([kernel.kallsyms])
                                             __read_extent_tree_block ([kernel.kallsyms])
                                             ext4_find_extent ([kernel.kallsyms])
                                             ext4_ext_map_blocks ([kernel.kallsyms])
                                             ext4_map_blocks ([kernel.kallsyms])
                                             ext4_mpage_readpages ([kernel.kallsyms])
                                             read_pages ([kernel.kallsyms])
                                             __do_page_cache_readahead ([kernel.kallsyms])
                                             ondemand_readahead ([kernel.kallsyms])
                                             generic_file_read_iter ([kernel.kallsyms])
                                             __vfs_read ([kernel.kallsyms])
                                             vfs_read ([kernel.kallsyms])
                                             ksys_read ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             entry_SYSCALL_64 ([kernel.kallsyms])
                                             read (/usr/lib64/libc-2.26.so)
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Rudá Moura <ruda.moura@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-sweh107bs7ol5bzls0m4tqdz@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      5067a8cd
  8. 18 Sep, 2018 1 commit
  9. 08 Aug, 2018 2 commits
    • Jiri Olsa's avatar
      perf report: Add --percent-type option · e6902d1b
      Jiri Olsa authored
      Set annotation percent type from following choices:
      
        global-period, local-period, global-hits, local-hits
      
      With following report option setup the percent type will be passed to
      annotation browser:
      
        $ perf report --percent-type period-local
      
      The local/global keywords set if the percentage is computed in the scope
      of the function (local) or the whole data (global).  The period/hits
      keywords set the base the percentage is computed on - the samples period
      or the number of samples (hits).
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20180804130521.11408-21-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e6902d1b
    • Jiri Olsa's avatar
      perf annotate: Add --percent-type option · 88c21190
      Jiri Olsa authored
      Add --percent-type option to set annotation percent type from following
      choices:
      
        global-period, local-period, global-hits, local-hits
      
      Examples:
      
        $ perf annotate --percent-type period-local --stdio | head -1
         Percent         |      Source code ... es, percent: local period)
        $ perf annotate --percent-type hits-local --stdio | head -1
         Percent         |      Source code ... es, percent: local hits)
        $ perf annotate --percent-type hits-global --stdio | head -1
         Percent         |      Source code ... es, percent: global hits)
        $ perf annotate --percent-type period-global --stdio | head -1
         Percent         |      Source code ... es, percent: global period)
      
      The local/global keywords set if the percentage is computed in the scope
      of the function (local) or the whole data (global).
      
      The period/hits keywords set the base the percentage is computed on -
      the samples period or the number of samples (hits).
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20180804130521.11408-20-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      88c21190
  10. 24 Jul, 2018 1 commit
  11. 15 Jul, 2018 1 commit
  12. 07 Jun, 2018 1 commit
  13. 06 Jun, 2018 3 commits
    • Jin Yao's avatar
      perf script python: Add dict fields introduction to Documentation · ac56aa45
      Jin Yao authored
      Add a brief introduction about fields to perf-script-python.txt.
      
      It should help python script developers in easily finding what fields
      are supported.
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1527843663-32288-4-git-send-email-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ac56aa45
    • Jiri Olsa's avatar
      perf stat: Display user and system time · 0ce2da14
      Jiri Olsa authored
      Adding the support to read rusage data once the workload is finished and
      display the system/user time values:
      
        $ perf stat --null perf bench sched pipe
        ...
      
         Performance counter stats for 'perf bench sched pipe':
      
             5.342599256 seconds time elapsed
      
             2.544434000 seconds user
             4.549691000 seconds sys
      
      It works only in non -r mode and only for workload target.
      
      So as of now, for workload targets, we display 3 types of timings. The
      time we meassure in perf stat from enable to disable+period:
      
             5.342599256 seconds time elapsed
      
      The time spent in user and system lands, displayed only for workload
      session/target:
      
             2.544434000 seconds user
             4.549691000 seconds sys
      
      Those times are the very same displayed by 'time' tool.  They are
      returned by wait4 call via the getrusage struct interface.
      
      Committer notes:
      
      Had to rename some variables to avoid this on older systems such as
      centos:6:
      
        builtin-stat.c: In function 'print_footer':
        builtin-stat.c:1831: warning: declaration of 'stime' shadows a global declaration
        /usr/include/time.h:297: warning: shadowed declaration is here
      
      Committer testing:
      
        # perf stat --null time perf bench sched pipe
        # Running 'sched/pipe' benchmark:
        # Executed 1000000 pipe operations between two processes
      
             Total time: 5.526 [sec]
      
               5.526534 usecs/op
                 180945 ops/sec
        1.00user 6.25system 0:05.52elapsed 131%CPU (0avgtext+0avgdata 8056maxresident)k
        0inputs+0outputs (0major+606minor)pagefaults 0swaps
      
         Performance counter stats for 'time perf bench sched pipe':
      
               5.530978744 seconds time elapsed
      
               1.004037000 seconds user
               6.259937000 seconds sys
      
        #
      Suggested-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180605121313.31337-1-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      0ce2da14
    • Alexey Budankov's avatar
      perf record: Enable arbitrary event names thru name= modifier · f92da712
      Alexey Budankov authored
      Enable complex event names containing [.:=,] symbols to be encoded into Perf
      trace using name= modifier e.g. like this:
      
        perf record -e cpu/name=\'OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM\',\
      		  period=0x3567e0,event=0x3c,cmask=0x1/Duk ./futex
      
      Below is how it looks like in the report output. Please note explicit escaped
      quoting at cmdline string in the header so that thestring can be directly reused
      for another collection in shell:
      
      perf report --header
      
        # ========
        ...
        # cmdline : /root/abudanko/kernel/tip/tools/perf/perf record -v -e cpu/name=\'OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM\',period=0x3567e0,event=0x3c,cmask=0x1/Duk ./futex
        # event : name = OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM, , type = 4, size = 112, config = 0x100003c, { sample_period, sample_freq } = 3500000, sample_type = IP|TID|TIME, disabled = 1, inh
        ...
        # ========
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 24K of event 'OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM'
        # Event count (approx.): 86492000000
        #
        # Overhead  Command  Shared Object     Symbol
        # ........  .......  ................  ..............................................
        #
            14.75%  futex    [kernel.vmlinux]  [k] __entry_trampoline_start
      ...
      
        perf stat -e cpu/name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\',period=0x3567e0,event=0x3c,cmask=0x1/Duk ./futex
      
        10000000 process context switches in 16678890291ns (1667.9ns/ctxsw)
      
         Performance counter stats for './futex':
      
            88,095,770,571      CPU_CLK_UNHALTED.THREAD:cmask=0x1
      
              16.679542407 seconds time elapsed
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Acked-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/c194b060-761d-0d50-3b21-bb4ed680002d@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f92da712
  14. 30 May, 2018 2 commits
  15. 26 Apr, 2018 5 commits
  16. 23 Apr, 2018 1 commit
  17. 18 Apr, 2018 1 commit
  18. 17 Apr, 2018 1 commit
    • Alexey Budankov's avatar
      perf script: Extend misc field decoding with switch out event type · bf30cc18
      Alexey Budankov authored
      Append 'p' sign to 'S' tag designating the type of context switch out event so
      'Sp' means preemption context switch. Documentation is extended to cover
      new presentation changes.
      
        $ perf script --show-switch-events -F +misc -I -i perf.data:
      
                hdparm 4073 [004] U  762.198265:     380194 cycles:ppp:      7faf727f5a23 strchr (/usr/lib64/ld-2.26.so)
                hdparm 4073 [004] K  762.198366:     441572 cycles:ppp:  ffffffffb9218435 alloc_set_pte (/lib/modules/4.16.0-rc6+/build/vmlinux)
                hdparm 4073 [004] S  762.198391: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid:    0/0
               swapper    0 [004]    762.198392: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid: 4073/4073
               swapper    0 [004] Sp 762.198477: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid: 4073/4073
                hdparm 4073 [004]    762.198478: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:    0/0
               swapper    0 [007] K  762.198514:    2303073 cycles:ppp:  ffffffffb98b0c66 intel_idle (/lib/modules/4.16.0-rc6+/build/vmlinux)
               swapper    0 [007] Sp 762.198561: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid: 1134/1134
        kworker/u16:18 1134 [007]    762.198562: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:    0/0
        kworker/u16:18 1134 [007] S  762.198567: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid:    0/0
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/5fc65ce7-8ca5-53ae-8858-8ddd27290575@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      bf30cc18
  19. 13 Apr, 2018 1 commit
    • Arnaldo Carvalho de Melo's avatar
      perf annotate: Allow setting the offset level in .perfconfig · 43c40231
      Arnaldo Carvalho de Melo authored
      The default is 1 (jump_target):
      
        # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
        Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
        _raw_spin_lock_irqsave() /proc/kcore
          0.26        nop
          4.61        push   %rbx
         19.33        pushfq
          7.97        pop    %rax
          0.32        nop
          0.06        mov    %rax,%rbx
         14.63        cli
          0.06        nop
                      xor    %eax,%eax
                      mov    $0x1,%edx
         49.94        lock   cmpxchg %edx,(%rdi)
          0.16        test   %eax,%eax
                    ↓ jne    2b
          2.66        mov    %rbx,%rax
                      pop    %rbx
                    ← retq
                2b:   mov    %eax,%esi
                    → callq  *ffffffffb30eaed0
                      mov    %rbx,%rax
                      pop    %rbx
                    ← retq
        #
      
      But one can ask for showing offsets for call instructions by setting
      this:
      
        # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
        Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
        _raw_spin_lock_irqsave() /proc/kcore
          0.26        nop
          4.61        push   %rbx
         19.33        pushfq
          7.97        pop    %rax
          0.32        nop
          0.06        mov    %rax,%rbx
         14.63        cli
          0.06        nop
                      xor    %eax,%eax
                      mov    $0x1,%edx
         49.94        lock   cmpxchg %edx,(%rdi)
          0.16        test   %eax,%eax
                    ↓ jne    2b
          2.66        mov    %rbx,%rax
                      pop    %rbx
                    ← retq
                2b:   mov    %eax,%esi
                2d: → callq  *ffffffffb30eaed0
                      mov    %rbx,%rax
                      pop    %rbx
                    ← retq
        #
      
      Or using a big value to ask for all offsets to be shown:
      
        # cat ~/.perfconfig
        [annotate]
      
      	offset_level = 100
      
      	hide_src_code = true
        # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
        Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
        _raw_spin_lock_irqsave() /proc/kcore
          0.26   0:   nop
          4.61   5:   push   %rbx
         19.33   6:   pushfq
          7.97   7:   pop    %rax
          0.32   8:   nop
          0.06   d:   mov    %rax,%rbx
         14.63  10:   cli
          0.06  11:   nop
                17:   xor    %eax,%eax
                19:   mov    $0x1,%edx
         49.94  1e:   lock   cmpxchg %edx,(%rdi)
          0.16  22:   test   %eax,%eax
                24: ↓ jne    2b
          2.66  26:   mov    %rbx,%rax
                29:   pop    %rbx
                2a: ← retq
                2b:   mov    %eax,%esi
                2d: → callq  *ffffffffb30eaed0
                32:   mov    %rbx,%rax
                35:   pop    %rbx
                36: ← retq
         #
      
      This also affects the TUI, i.e. the default 'perf annotate' and 'perf
      top/report' -> A hotkey -> annotate interfaces, when slang-devel is present
      in the build, i.e.:
      
        # perf version --build-options | grep slang
                    libslang: [ on  ]  # HAVE_SLANG_SUPPORT
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-venm6x5zrt40eu8hxdsmqxz6@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      43c40231
  20. 12 Apr, 2018 2 commits
  21. 02 Apr, 2018 3 commits
    • Jin Yao's avatar
      perf version: Add man page · 70984672
      Jin Yao authored
      Since a new option '--build-options' is created for 'perf version', so
      we need to document it.
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1522402036-22915-7-git-send-email-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      70984672
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Show only failing syscalls · 0a6545bd
      Arnaldo Carvalho de Melo authored
      For instance:
      
        # perf probe "vfs_getname=getname_flags:72 pathname=result->name:string"
        Added new event:
          probe:vfs_getname    (on getname_flags:72 with pathname=result->name:string)
      
        You can now use it in all perf tools, such as:
      
      	  perf record -e probe:vfs_getname -aR sleep 1
      
        # perf trace --failure sleep 1
           0.043 ( 0.010 ms): sleep/10978 access(filename: /etc/ld.so.preload, mode: R) = -1 ENOENT No such file or directory
      
      For reference, here are all the syscalls in this case:
      
        # perf trace sleep 1
               ? (         ): sleep/10976  ... [continued]: execve()) = 0
             0.027 ( 0.001 ms): sleep/10976 brk() = 0x55bdc2d04000
             0.044 ( 0.010 ms): sleep/10976 access(filename: /etc/ld.so.preload, mode: R) = -1 ENOENT No such file or directory
             0.057 ( 0.006 ms): sleep/10976 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3
             0.064 ( 0.002 ms): sleep/10976 fstat(fd: 3, statbuf: 0x7fffac22b370) = 0
             0.067 ( 0.003 ms): sleep/10976 mmap(len: 111457, prot: READ, flags: PRIVATE, fd: 3) = 0x7feec8615000
             0.071 ( 0.001 ms): sleep/10976 close(fd: 3) = 0
             0.080 ( 0.007 ms): sleep/10976 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) = 3
             0.088 ( 0.002 ms): sleep/10976 read(fd: 3, buf: 0x7fffac22b538, count: 832) = 832
             0.092 ( 0.001 ms): sleep/10976 fstat(fd: 3, statbuf: 0x7fffac22b3d0) = 0
             0.094 ( 0.002 ms): sleep/10976 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) = 0x7feec8613000
             0.099 ( 0.004 ms): sleep/10976 mmap(len: 3889792, prot: EXEC|READ, flags: PRIVATE|DENYWRITE, fd: 3) = 0x7feec8057000
             0.104 ( 0.007 ms): sleep/10976 mprotect(start: 0x7feec8203000, len: 2097152) = 0
             0.112 ( 0.005 ms): sleep/10976 mmap(addr: 0x7feec8403000, len: 24576, prot: READ|WRITE, flags: PRIVATE|DENYWRITE|FIXED, fd: 3, off: 1753088) = 0x7feec8403000
             0.120 ( 0.003 ms): sleep/10976 mmap(addr: 0x7feec8409000, len: 14976, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS|FIXED) = 0x7feec8409000
             0.128 ( 0.001 ms): sleep/10976 close(fd: 3) = 0
             0.139 ( 0.001 ms): sleep/10976 arch_prctl(option: 4098, arg2: 140663540761856) = 0
             0.186 ( 0.004 ms): sleep/10976 mprotect(start: 0x7feec8403000, len: 16384, prot: READ) = 0
             0.204 ( 0.003 ms): sleep/10976 mprotect(start: 0x55bdc0ec3000, len: 4096, prot: READ) = 0
             0.209 ( 0.004 ms): sleep/10976 mprotect(start: 0x7feec8631000, len: 4096, prot: READ) = 0
             0.214 ( 0.010 ms): sleep/10976 munmap(addr: 0x7feec8615000, len: 111457) = 0
             0.269 ( 0.001 ms): sleep/10976 brk() = 0x55bdc2d04000
             0.271 ( 0.002 ms): sleep/10976 brk(brk: 0x55bdc2d25000) = 0x55bdc2d25000
             0.274 ( 0.001 ms): sleep/10976 brk() = 0x55bdc2d25000
             0.278 ( 0.007 ms): sleep/10976 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
             0.288 ( 0.001 ms): sleep/10976 fstat(fd: 3</usr/lib/locale/locale-archive>, statbuf: 0x7feec8408aa0) = 0
             0.290 ( 0.003 ms): sleep/10976 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7feec1488000
             0.297 ( 0.001 ms): sleep/10976 close(fd: 3</usr/lib/locale/locale-archive>) = 0
             0.325 (1000.193 ms): sleep/10976 nanosleep(rqtp: 0x7fffac22c0b0) = 0
          1000.560 ( 0.006 ms): sleep/10976 close(fd: 1) = 0
          1000.573 ( 0.005 ms): sleep/10976 close(fd: 2) = 0
          1000.596 (         ): sleep/10976 exit_group()
        #
      
      And can be done systemwide, etc, with backtraces:
      
        # perf trace --max-stack=16 --failure sleep 1
           0.048 ( 0.015 ms): sleep/11092 access(filename: /etc/ld.so.preload, mode: R) = -1 ENOENT No such file or directory
                                             __access (inlined)
                                             dl_main (/usr/lib64/ld-2.26.so)
        #
      
      Or for some specific syscalls:
      
        # perf trace --max-stack=16 -e openat --failure cat /tmp/rien
        cat: /tmp/rien: No such file or directory
             0.251 ( 0.012 ms): cat/11106 openat(dfd: CWD, filename: /tmp/rien) = -1 ENOENT No such file or directory
                                               __libc_open64 (inlined)
                                               main (/usr/bin/cat)
                                               __libc_start_main (/usr/lib64/libc-2.26.so)
                                               _start (/usr/bin/cat)
        #
      
      Look for inotify* syscalls that fail, system wide, for 2 seconds, with backtraces:
      
        # perf trace -a --max-stack=16 --failure -e inotify* sleep 2
         819.165 ( 0.058 ms): gmain/1724 inotify_add_watch(fd: 8<anon_inode:inotify>, pathname: /home/acme/~, mask: 16789454) = -1 ENOENT No such file or directory
                                             __GI_inotify_add_watch (inlined)
                                             _ik_watch (/usr/lib64/libgio-2.0.so.0.5400.3)
                                             _ip_start_watching (/usr/lib64/libgio-2.0.so.0.5400.3)
                                             im_scan_missing (/usr/lib64/libgio-2.0.so.0.5400.3)
                                             g_timeout_dispatch (/usr/lib64/libglib-2.0.so.0.5400.3)
                                             g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0.5400.3)
                                             g_main_context_iterate.isra.23 (/usr/lib64/libglib-2.0.so.0.5400.3)
                                             g_main_context_iteration (/usr/lib64/libglib-2.0.so.0.5400.3)
                                             glib_worker_main (/usr/lib64/libglib-2.0.so.0.5400.3)
                                             g_thread_proxy (/usr/lib64/libglib-2.0.so.0.5400.3)
                                             start_thread (/usr/lib64/libpthread-2.26.so)
                                             __GI___clone (inlined)
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-8f7d3mngaxvi7tlzloz3n7cs@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      0a6545bd
    • Kim Phillips's avatar
      perf tools: Add a "dso_size" sort order · b74d12d5
      Kim Phillips authored
      Add DSO size to perf report/top sort output list.
      
      This includes adding a map__size fn to map.h, which is
      approximately equal to the DSO data file_size:
      
        DSO				file size	map (end-start)	file / (end-start)
        libwebkit2gtk-4.0.so.37.24.9	43260072	41295872	95%
        libglib-2.0.so.0.5400.1		 1125680	 1118208	99%
        libc-2.26.so			 1960656 	 1925120	101%
        libdbus-1.so.3.14.13		  309456 	  303104	102%
      
      Sample output:
      
        $ ./perf report -s dso_size,dso
        Samples: 2K of event 'cycles:uppp', Event count (approx.): 128373340
        Overhead  DSO size  Shared Object
          90.62%   unknown  [unknown]
           2.87%   1118208  libglib-2.0.so.0.5400.1
           1.92%    303104  libdbus-1.so.3.14.13
           1.42%   1925120  libc-2.26.so
           0.77%  41295872  libwebkit2gtk-4.0.so.37.24.9
           0.61%    335872  libgobject-2.0.so.0.5400.1
           0.41%   1052672  libgdk-3.so.0.2200.25
           0.36%    106496  libpthread-2.26.so
           0.29%    221184  dbus-daemon
           0.17%    159744  ld-2.26.so
           0.13%     49152  libwayland-client.so.0.3.0
           0.12%   1642496  libgio-2.0.so.0.5400.1
           0.09%   73277443  libgtk-3.so.0.2200.25
           0.09%  12324864  libmozjs-52.so.0.0.0
           0.05%   4796416  perf
           0.04%    843776  libgjs.so.0.0.0
           0.03%   1409024  libmutter-clutter-1.so
      
      Committer testing:
      
      To sort by DSO size, use:
      
        # perf report -F dso_size,dso,overhead -s dso_size
        <SNIP>
           3465216  libdns-export.so.174.0.1   0.00%
           3522560  libgc.so.1.0.3             0.00%
           3538944  libbfd-2.29-13.fc27.so     0.59%
           3670016  libunistring.so.2.1.0      0.00%
           3723264  libguile-2.0.so.22.8.1     0.00%
           3776512  libgio-2.0.so.0.5400.3     0.00%
           3891200  libc-2.26.so               0.96%
           3944448  libmozjs-17.0.so           0.00%
           4218880  libperl.so.5.26.1          0.18%
           4452352  libpython2.7.so.1.0        0.02%
           4472832  perf                       0.02%
           4603904  git                        0.01%
           4751360  libcrypto.so.1.1.0g        0.00%
           5005312  libslang.so.2.3.1          0.00%
           7315456  libgtk-3.so.0.2200.26      0.09%
           8818688  i965_dri.so                2.46%
           8818688  i965_dri.so (deleted)      1.26%
          12414976  libmozjs-52.so.0.0.0       0.03%
          23642112  cc1                        2.02%
          27889664  [kernel.kallsyms]         25.41%
          80834560  libxul.so (deleted)       15.68%
          98078720  chrome                    32.03%
        1056964608  [kernel.kallsyms]          1.59%
        #
      Signed-off-by: default avatarKim Phillips <kim.phillips@arm.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180327060956.1c01ebe67a2a941bb4468c6f@arm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      b74d12d5
  22. 21 Mar, 2018 3 commits
    • Arnaldo Carvalho de Melo's avatar
      perf report: Introduce --ignore-vmlinux command line option · 91340c51
      Arnaldo Carvalho de Melo authored
      We've had this in 'perf top' for quite a while, useful if one wishes
      to force using /proc/kcore to do annotation using the patched kernel
      instead of the ELF image it started from, aka vmlinux.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-ircpvox4wzsv7gasrpb28fw9@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      91340c51
    • Arnaldo Carvalho de Melo's avatar
      perf annotate: Introduce --ignore-vmlinux command line option · be316409
      Arnaldo Carvalho de Melo authored
      This is already present in 'perf top', albeit undocumented (will fix),
      and is useful to use /proc/kcore instead of vmlinux and then get what is
      really in place, not what the kernel starts with, before alternatives,
      ftrace .text patching, etc, see the differences:
      
        # perf annotate --stdio2 _raw_spin_lock_irqsave
        _raw_spin_lock_irqsave() /lib/modules/4.16.0-rc4/build/vmlinux
        Event: anon group { cycles, instructions }
      
          0.00   3.17      → callq  __fentry__
          0.00   7.94        push   %rbx
          7.69  36.51      → callq  __page_file_index
                             mov    %rax,%rbx
          7.69   3.17      → callq  *ffffffff82225cd0
                             xor    %eax,%eax
                             mov    $0x1,%edx
         80.77  49.21        lock   cmpxchg %edx,(%rdi)
                             test   %eax,%eax
                           ↓ jne    2b
          3.85   0.00        mov    %rbx,%rax
                             pop    %rbx
                           ← retq
                       2b:   mov    %eax,%esi
                           → callq  queued_spin_lock_slowpath
                             mov    %rbx,%rax
                             pop    %rbx
                           ← retq
        [root@jouet ~]# perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
        _raw_spin_lock_irqsave() /proc/kcore
        Event: anon group { cycles, instructions }
      
          0.00   3.17        nop
          0.00   7.94        push   %rbx
          0.00  23.81        pushfq
          7.69  12.70        pop    %rax
                             nop
                             mov    %rax,%rbx
          7.69   3.17        cli
                             nop
                             xor    %eax,%eax
                             mov    $0x1,%edx
         80.77  49.21        lock   cmpxchg %edx,(%rdi)
                             test   %eax,%eax
                           ↓ jne    2b
          3.85   0.00        mov    %rbx,%rax
                             pop    %rbx
                           ← retq
                       2b:   mov    %eax,%esi
                           → callq  *ffffffff820e96b0
                             mov    %rbx,%rax
                             pop    %rbx
                           ← retq
        #
      
      Diff of the output of those commands:
      
        # perf annotate --stdio2 _raw_spin_lock_irqsave > /tmp/vmlinux
        # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave > /tmp/kcore
        # diff -y /tmp/vmlinux /tmp/kcore
        _raw_spin_lock_irqsave() vmlinux             | _raw_spin_lock_irqsave() /proc/kcore
        Event: anon group { cycles, instructions }     Event: anon group { cycles, instructions }
      
         0.00  3.17  → callq __fentry__              |  0.00  3.17     nop
         0.00  7.94    push  %rbx                       0.00  7.94     push  %rbx
         7.69 36.51  → callq __page_file_index       |  0.00 23.81     pushfq
                                                     >  7.69 12.70     pop   %rax
                                                     >                 nop
                       mov   %rax,%rbx                                 mov   %rax,%rbx
         7.69  3.17  → callq *ffffffff82225cd0       |  7.69  3.17     cli
                                                     >                 nop
                       xor   %eax,%eax                                 xor   %eax,%eax
                       mov   $0x1,%edx                                 mov   $0x1,%edx
        80.77 49.21    lock  cmpxchg %edx,(%rdi)       80.77 49.21     lock  cmpxchg %edx,(%rdi)
                       test  %eax,%eax                                 test  %eax,%eax
                     ↓ jne   2b                                      ↓ jne   2b
         3.85  0.00    mov   %rbx,%rax                  3.85  0.00     mov   %rbx,%rax
                       pop   %rbx                                      pop   %rbx
                     ← retq                                          ← retq
                  2b:  mov   %eax,%esi                            2b:  mov   %eax,%esi
                     → callq queued_spin_lock_slowpath|              → callq *ffffffff820e96b0
                       mov   %rbx,%rax                                 mov   %rbx,%rax
                       pop   %rbx                                      pop   %rbx
                     ← retq                                          ← retq
        #
      
      This should be further streamlined by doing both annotations and
      allowing the TUI to toggle initial/current, and show the patched
      instructions in a slightly different color.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-wz8d269hxkcwaczr0r4rhyjg@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      be316409
    • Arnaldo Carvalho de Melo's avatar
      perf annotate: Introduce the --stdio2 output mode · befd2a38
      Arnaldo Carvalho de Melo authored
      This uses the TUI augmented formatting routines, modulo interactivity.
      
        # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
        _raw_spin_lock_irqsave() /proc/kcore
        Event: cycles:ppp
      
        Percent
      
                    Disassembly of section load0:
      
                    ffffffff9a8734b0 <load0>:
                      nop
                      push   %rbx
         50.00        pushfq
                      pop    %rax
                      nop
                      mov    %rax,%rbx
                      cli
                      nop
                      xor    %eax,%eax
                      mov    $0x1,%edx
         50.00        lock   cmpxchg %edx,(%rdi)
                      test   %eax,%eax
                    ↓ jne    2b
                      mov    %rbx,%rax
                      pop    %rbx
                    ← retq
                2b:   mov    %eax,%esi
                    → callq  queued_spin_lock_slowpath
                      mov    %rbx,%rax
                      pop    %rbx
                    ← retq
      Tested-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-6cte5o8z84mbivbvqlg14uh1@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      befd2a38
  23. 19 Mar, 2018 1 commit
  24. 16 Mar, 2018 1 commit