    perf/x86/intel: Add a separate Arch Perfmon v4 PMI handler · af3bdb99
    Andi Kleen authored
    Implement counter freezing for Arch Perfmon v4 (Skylake and
    newer). This speeds up the PMI handler by avoiding
    unnecessary MSR writes and makes it more accurate.
    The Arch Perfmon v4 PMI handler is substantially different from
    the older PMI handler.
    Differences from the old handler:
    - It relies on counter freezing, which eliminates several MSR
      writes from the PMI handler and lowers the overhead significantly.
      It makes the PMI handler more accurate, as all counters get
      frozen atomically as soon as any counter overflows. So there is
      much less counting of the PMI handler itself.
      With the freezing we don't need to disable or enable counters or
      PEBS. Only BTS, which does not support auto-freezing, still needs to
      be explicitly managed.
    - The PMU acking is done at the end, not the beginning.
      This makes it possible to avoid manual enabling/disabling
      of the PMU; instead we just rely on the freezing/acking.
    - The APIC is acked before re-enabling the PMU, which avoids
      problems with LBRs occasionally not getting unfrozen on Skylake.
    - Looping is only needed to work around a corner case in which several
      PMIs arrive very close to each other. In the common case, the counters
      are frozen during the PMI handler, so no re-check is needed.
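    The ordering described in the list above can be illustrated with a
    small toy model. This is only a sketch: the struct and the toy_*
    functions are hypothetical names invented for illustration and are
    not the kernel's code or MSR interface. It assumes that an overflow
    freezes all counters at once and that acking the status bits is what
    unfreezes them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_pmu {
	uint64_t status;      /* one bit per overflowed counter */
	bool frozen;          /* all counters frozen after an overflow */
	bool apic_acked;      /* APIC was acked during the handler */
	unsigned int handled; /* number of overflows processed */
};

/* "Hardware" side: an overflow sets a status bit and freezes everything. */
void toy_overflow(struct toy_pmu *pmu, unsigned int idx)
{
	assert(idx < 64);
	pmu->status |= UINT64_C(1) << idx;
	pmu->frozen = true;
}

/*
 * "Handler" side: process the pending bits, ack the APIC, and only then
 * ack the PMU status. In this model the late ack is what unfreezes the
 * counters, so no explicit disable/enable step is needed.
 * Returns how many passes over the status word were required.
 */
unsigned int toy_pmi_v4(struct toy_pmu *pmu)
{
	unsigned int loops = 0;

	while (pmu->status) {
		uint64_t pending = pmu->status;

		loops++;
		for (unsigned int idx = 0; idx < 64; idx++) {
			if (pending & (UINT64_C(1) << idx))
				pmu->handled++;
		}
		pmu->apic_acked = true;  /* APIC ack comes first */
		pmu->status &= ~pending; /* late PMU ack clears the bits */
		pmu->frozen = false;     /* and unfreezes the counters */
	}
	return loops;
}
```

    In this model a single pass is enough when no new overflow arrives
    while the handler runs, which mirrors the common case described
    above. A second pass would only happen if more status bits were set
    between reading and acking them.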
    This patch:
    - Adds code to enable v4 counter freezing.
    - Forks the <=v3 and >=v4 PMI handlers into separate functions.
    - Adds a kernel parameter to disable counter freezing. It took some time to
      debug counter freezing, so in case there are new problems we added an
      option to turn it off. We do not expect it to be needed unless new
      bugs appear.
    - Covers big cores only. The patch for small cores will be posted later.
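    The handler split and the opt-out can be pictured as a simple
    selection rule. The sketch below is an assumption-laden illustration,
    not the patch itself: the enum, the function name, and the three
    inputs are hypothetical, and the actual patch may structure the check
    differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two handler variants. */
enum toy_handler { TOY_HANDLER_V3, TOY_HANDLER_V4 };

/*
 * Pick the v4 handler only when all three conditions from the commit
 * message hold: Arch Perfmon version >= 4, a big core, and counter
 * freezing not turned off by the user. Everything else falls back to
 * the older handler.
 */
enum toy_handler toy_select_handler(int perfmon_version, bool big_core,
				    bool freezing_disabled)
{
	assert(perfmon_version >= 0);

	if (perfmon_version >= 4 && big_core && !freezing_disabled)
		return TOY_HANDLER_V4;
	return TOY_HANDLER_V3;
}
```

    Falling back to the older handler whenever any condition fails keeps
    the opt-out safe: disabling counter freezing simply restores the
    previous behavior.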
    Results from profiling a kernel build on Kabylake with different perf
    options, measuring the duration of all NMI handlers using the NMI handler
    trace point:
    V3 is without counter freezing.
    V4 is with counter freezing.
    The value is the average cost of the PMI handler.
    (lower is better)
    perf options                    V3(ns)  V4(ns)  delta
    -c 100000                       1088    894     -18%
    -g -c 100000                    1862    1646    -12%
    --call-graph lbr -c 100000      3649    3367    -8%
    --call-graph dwarf -c 100000    2248    1982    -12%
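    The delta column appears to be the relative change from V3 to V4,
    rounded to the nearest percent. A minimal sketch of that arithmetic
    follows; the function name is made up for illustration and the
    rounding rule is an assumption that happens to reproduce the table.

```c
#include <assert.h>

/*
 * Relative change of the V4 handler cost versus V3, in percent,
 * rounded to the nearest integer (halves round away from zero).
 * Negative values mean V4 is cheaper.
 */
long pmi_delta_percent(long v3_ns, long v4_ns)
{
	long num, half;

	assert(v3_ns > 0);
	num = 100 * (v4_ns - v3_ns);
	half = v3_ns / 2;
	return (num >= 0 ? num + half : num - half) / v3_ns;
}
```

    For the first row, pmi_delta_percent(1088, 894) evaluates
    100 * (894 - 1088) / 1088, which is about -17.8 and rounds to -18.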
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Cc: acme@kernel.org
    Link: http://lkml.kernel.org/r/1533712328-2834-2-git-send-email-kan.liang@linux.intel.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>