1. 22 May, 2018 1 commit
  2. 21 May, 2018 1 commit
  3. 06 Mar, 2018 2 commits
  4. 02 Jan, 2018 1 commit
      perf: ARM DynamIQ Shared Unit PMU support · 7520fa99
      Suzuki K Poulose authored
      Add support for the Cluster PMU part of the ARM DynamIQ Shared Unit (DSU).
      The DSU integrates one or more cores with an L3 memory system, control
      logic, and external interfaces to form a multicore cluster. The PMU
      can count various events related to the L3 memory system, the Snoop
      Control Unit (SCU), etc., and also provides a cycle counter.
      
      The PMU is accessed via system registers, which are common to all
      cores in the same cluster. The PMU registers mostly follow the
      semantics of the ARMv8 PMU, except that the counters record
      cluster-wide events.
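      Because the counters follow ARMv8 PMU semantics, a driver that reads
      them periodically must compute each delta across a possible counter
      wraparound. A minimal sketch, assuming 32-bit event counters and a
      64-bit cycle counter as in the ARMv8 PMU; counter_delta() is a
      hypothetical helper, not a function from this driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the driver): number of events counted
 * between two reads of a counter that is `width` bits wide, assuming at
 * most one wraparound happened between the reads. */
static uint64_t counter_delta(uint64_t prev, uint64_t now, unsigned int width)
{
    uint64_t mask;

    assert(width > 0 && width <= 64);
    mask = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
    /* Unsigned subtraction wraps modulo 2^64; the mask reduces the
     * result modulo 2^width, which handles a wrapped narrow counter. */
    return (now - prev) & mask;
}
```

      For example, counter_delta(0xFFFFFFF0, 0x10, 32) yields 0x20 for a
      32-bit event counter that wrapped between reads.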
      
      This driver is mostly based on the ARMv8 and CCI PMU drivers.
      The driver only supports ARM64 at the moment. It can be extended
      to support ARM32 by providing register accessors like we do in
      arch/arm64/include/arm_dsu_pmu.h.
      
      Cc: Mark Rutland <[email protected]>
      Cc: Will Deacon <[email protected]>
      Reviewed-by: Jonathan Cameron <[email protected]>
      Reviewed-by: Mark Rutland <[email protected]>
      Signed-off-by: Suzuki K Poulose <[email protected]>
      Signed-off-by: Will Deacon <[email protected]>
  5. 19 Oct, 2017 1 commit
  6. 18 Oct, 2017 1 commit
      drivers/perf: Add support for ARMv8.2 Statistical Profiling Extension · d5d9696b
      Will Deacon authored
      The ARMv8.2 architecture introduces the optional Statistical Profiling
      Extension (SPE).
      
      SPE can be used to profile a population of operations in the CPU pipeline
      after instruction decode. These are either architected instructions (i.e.
      a dynamic instruction trace) or CPU-specific uops; the choice is fixed
      statically in the hardware and advertised to userspace via caps/. Sampling
      is controlled using a sampling interval, similar to a regular PMU counter,
      but also with an optional random perturbation to avoid falling into patterns
      where you continuously profile the same instruction in a hot loop.
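      The interval-based selection described here can be modelled in a few
      lines. A minimal sketch, not the hardware's behaviour: the exact
      perturbation scheme (adding a random value in [0, 2^jitter_bits) to
      the base interval on each reload) is an assumption, and
      select_ops(), reload() and the xorshift32 PRNG are hypothetical,
      chosen only to make the model deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift32 PRNG: used only to make this model deterministic. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Reload value: base interval plus an optional random perturbation in
 * [0, 2^jitter_bits). The perturbation scheme is an assumption. */
static uint32_t reload(uint32_t interval, unsigned int jitter_bits, uint32_t *rng)
{
    uint32_t jitter = 0;

    if (jitter_bits)
        jitter = xorshift32(rng) & ((UINT32_C(1) << jitter_bits) - 1);
    return interval + jitter;
}

/* Walk `nops` decoded operations, decrementing the interval counter after
 * each one. When it reaches zero, record that operation's index as the
 * sampled one and reload the counter. Returns the number of indices stored. */
static size_t select_ops(uint32_t interval, unsigned int jitter_bits, uint32_t seed,
                         uint32_t nops, uint32_t *picks, size_t max_picks)
{
    uint32_t rng = seed ? seed : 1; /* xorshift32 must not start at zero */
    uint32_t counter;
    size_t n = 0;

    assert(interval > 0 && jitter_bits < 16);
    counter = reload(interval, jitter_bits, &rng);
    for (uint32_t op = 0; op < nops; op++) {
        if (--counter == 0) {
            if (n < max_picks)
                picks[n++] = op;
            counter = reload(interval, jitter_bits, &rng);
        }
    }
    return n;
}
```

      With jitter_bits set to 0, every interval-th operation is chosen,
      which is exactly the pattern that can lock onto one instruction in a
      hot loop; a non-zero jitter spreads the chosen operations out.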
      
      After each operation is decoded, the interval counter is decremented. When
      it hits zero, an operation is chosen for profiling and tracked within the
      pipeline until it retires. Along the way, information such as TLB lookups,
      cache misses, time spent to issue, etc., is captured in the form of a sample.
      The sample is then filtered according to certain criteria (e.g. load
      latency) that can be specified in the event config (described under
      format/) and, if the sample satisfies the filter, it is written out to
      memory as a record; otherwise, it is discarded. Only one operation can
      be sampled at a time.
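      The filtering step can be pictured as a predicate applied to each
      completed sample. A minimal sketch under heavy simplification: the
      sample and filter structures, their fields (is_load, latency,
      loads_only, min_latency) and both functions are hypothetical; the
      real filter options are whatever the driver advertises under format/.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified sample; real SPE records carry far more. */
struct sample {
    bool is_load;
    uint16_t latency; /* illustrative latency in cycles */
};

/* Hypothetical filter settings standing in for event config fields. */
struct filter {
    bool loads_only;
    uint16_t min_latency;
};

static bool sample_passes(const struct filter *f, const struct sample *s)
{
    if (f->loads_only && !s->is_load)
        return false;
    return s->latency >= f->min_latency;
}

/* Copy samples that pass the filter to `out` (standing in for memory);
 * all others are discarded. Returns the number of records written. */
static size_t write_records(const struct filter *f, const struct sample *in,
                            size_t n, struct sample *out)
{
    size_t written = 0;

    assert(f != NULL && (n == 0 || (in != NULL && out != NULL)));
    for (size_t i = 0; i < n; i++)
        if (sample_passes(f, &in[i]))
            out[written++] = in[i];
    return written;
}
```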
      
      The in-memory buffer is linear and virtually addressed, raising an
      interrupt when it fills up. The PMU driver handles these interrupts to
      give the appearance of a ring buffer, as expected by the AUX code.
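      One way to picture how a linear buffer plus an interrupt can emulate a
      ring buffer: the hardware is given a window that ends either at the
      physical end of the buffer or at the consumer's position, whichever
      comes first, and on each "buffer full" interrupt the filled bytes are
      published and a new window is computed, wrapping to the start when
      needed. A minimal sketch only; struct ring, struct hw_window,
      next_window() and on_buffer_full() are hypothetical and do not model
      the actual driver or the perf AUX API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ring state; head and tail count bytes monotonically. */
struct ring {
    size_t size; /* total buffer size in bytes */
    size_t head; /* bytes produced so far */
    size_t tail; /* bytes consumed so far */
};

/* Hypothetical linear region the hardware may write: [base, limit). */
struct hw_window {
    size_t base;
    size_t limit; /* reaching limit raises the "buffer full" interrupt */
};

/* Next linear window: start at the head's offset and stop at the end of
 * the buffer or at the free-space boundary, whichever comes first. */
static struct hw_window next_window(const struct ring *r)
{
    size_t used, space, off, to_end;
    struct hw_window w;

    assert(r->size > 0 && r->head - r->tail <= r->size);
    used = r->head - r->tail;
    space = r->size - used;
    off = r->head % r->size;
    to_end = r->size - off;
    w.base = off;
    w.limit = off + (space < to_end ? space : to_end);
    return w;
}

/* Model of the interrupt handler: publish the filled window, then
 * compute where the hardware should continue writing. */
static struct hw_window on_buffer_full(struct ring *r, struct hw_window filled)
{
    r->head += filled.limit - filled.base;
    return next_window(r);
}
```

      An empty window (base == limit) means the consumer has not freed any
      space yet, so in this model profiling would have to pause.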
      
      The in-memory trace-like format is self-describing (though not parseable
      in reverse) and written as a series of records, with each record
      corresponding to a sample and consisting of a sequence of packets. These
      packets are defined by the architecture, although some have CPU-specific
      fields for recording information specific to the microarchitecture.
      
      As a simple example, a record generated for a branch instruction may
      consist of the following packets:
      
        0 (Address) : Virtual PC of the branch instruction
        1 (Type)    : Conditional direct branch
        2 (Counter) : Number of cycles taken from Dispatch to Issue
        3 (Address) : Virtual branch target + condition flags
        4 (Counter) : Number of cycles taken from Dispatch to Complete
        5 (Events)  : Mispredicted as not-taken
        6 (END)     : End of record
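      Since records are self-describing but can only be parsed forwards, a
      reader finds the extent of a record by scanning from its first packet
      to the END packet. A minimal sketch over already-decoded packets: the
      enum, struct packet and record_length() are hypothetical, and the
      architected byte encoding of packet headers and payloads is not
      modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet kinds mirroring the example record above. */
enum pkt_type { PKT_ADDRESS, PKT_TYPE, PKT_COUNTER, PKT_EVENTS, PKT_END };

/* Hypothetical decoded packet; real packets are variable-length bytes. */
struct packet {
    enum pkt_type type;
    uint64_t payload;
};

/* Scan forwards from the start of a record to its END packet. Returns the
 * number of packets in the record including END, or 0 if no END packet
 * appears within the first n packets. */
static size_t record_length(const struct packet *pkts, size_t n)
{
    assert(pkts != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (pkts[i].type == PKT_END)
            return i + 1;
    return 0;
}
```

      For the seven-packet branch record listed above, record_length()
      returns 7.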
      
      It is also possible to toggle properties such as timestamp packets in
      each record.
      
      This patch adds support for SPE in the form of a new perf driver.
      
      Cc: Alexander Shishkin <[email protected]>
      Reviewed-by: Mark Rutland <[email protected]>
      Signed-off-by: Will Deacon <[email protected]>
  7. 15 Jun, 2017 1 commit
  8. 11 Apr, 2017 1 commit
  9. 03 Apr, 2017 1 commit
      perf: qcom: Add L3 cache PMU driver · 3071f13d
      Agustin Vega-Frias authored
      This adds a new dynamic PMU to the Perf Events framework to program
      and control the L3 cache PMUs in some Qualcomm Technologies SoCs.
      
      The driver supports a distributed cache architecture where the overall
      cache for a socket is composed of multiple slices, each with its own PMU.
      Access to each individual PMU is provided even though all CPUs share all
      the slices. User space needs to aggregate the individual counts to
      provide a global picture.
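      The aggregation left to user space is a plain sum of the per-slice
      counts for the same event. A minimal sketch, assuming the per-slice
      values have already been read, for example one per slice PMU such as
      l3cache_0_0; aggregate_slices() is a hypothetical helper and the
      slice count is illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum one event's per-slice counts into a
 * socket-wide total. */
static uint64_t aggregate_slices(const uint64_t *slice_counts, size_t nslices)
{
    uint64_t total = 0;

    assert(slice_counts != NULL || nslices == 0);
    for (size_t i = 0; i < nslices; i++)
        total += slice_counts[i];
    return total;
}
```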
      
      The driver exports format and event information to sysfs so that the
      PMUs can be used by the perf user space tools with the following syntax:
         perf stat -a -e l3cache_0_0/read-miss/
         perf stat -a -e l3cache_0_0/event=0x21/

      Acked-by: Mark Rutland <[email protected]>
      Signed-off-by: Agustin Vega-Frias <[email protected]>
      [will: fixed sparse issues]
      Signed-off-by: Will Deacon <[email protected]>
  10. 08 Feb, 2017 1 commit
  11. 15 Sep, 2016 1 commit
  12. 07 Oct, 2015 1 commit
  13. 31 Jul, 2015 1 commit