
[wperf,wperf-driver] WPERF-1271: add multi-core sampling basic functionality

Introduction

Add minimal support for multi-core (software) sampling where we:

  • enable n cores
  • to sample at the same time
  • for the same events and sampling frequencies (specified with -e).

Until now, users calling wperf sample could sample on only one core, selected with -c <N>. Users can now specify a list or range of cores on which to collect software samples.

Example commands users can now issue for many cores (see -c command line option):

>wperf sample -c 1
>wperf sample -c 0,10,20,30,40
>wperf sample -c 0,60-65

Note: if you omit -c from the wperf sample command, WindowsPerf samples on all cores.
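For illustration, the core specifications accepted by -c in the examples above (single core, comma-separated list, and ranges) could be expanded as in the sketch below. The function name parse_core_list is hypothetical and this is not wperf's actual parser, which is written in C++; it only shows the intended meaning of a spec such as 0,60-65.

```python
def parse_core_list(spec: str) -> list[int]:
    """Expand a core spec such as "0,60-65" into a sorted list of core indices.

    Hypothetical helper: mirrors the -c examples in this description,
    not the real wperf command-line parser.
    """
    cores: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty element in core list: {spec!r}")
        if "-" in part:
            # A range "A-B" is inclusive on both ends.
            first, last = part.split("-", 1)
            lo, hi = int(first), int(last)
            if lo > hi:
                raise ValueError(f"descending range in core list: {part!r}")
            cores.update(range(lo, hi + 1))
        else:
            cores.add(int(part))
    return sorted(cores)


if __name__ == "__main__":
    for example in ("1", "0,10,20,30,40", "0,60-65"):
        print(example, "->", parse_core_list(example))
```

Returning a sorted, de-duplicated list is a design choice of this sketch; whether wperf itself rejects or merges duplicate cores is not stated here.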

I've also updated how we detect the image name, so now when users specify:

--pe_file cpython\PCbuild\arm64\python_d.exe

wperf deduces the image name python_d.exe from this path, because we assume the image name is the last element of the path given as the PE file name. Note that, for now, we split the path only on \. If it is unclear which image name was taken, users should run with -v to see the assumed image name, or override it with --image_name <name>.
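The deduction rule described above (take the last element of the path, splitting only on \) can be sketched as follows. The Python function name is hypothetical and merely echoes the ExtractFileNameFromPath helper named in the commit list; the real helper lives in the wperf C++ sources and may handle more cases.

```python
def extract_file_name_from_path(path: str) -> str:
    """Return the last backslash-separated element of `path`.

    Hypothetical sketch: only "\\" is treated as a separator, as stated
    in the description, so forward slashes are left untouched.
    """
    return path.rsplit("\\", 1)[-1]


if __name__ == "__main__":
    print(extract_file_name_from_path(r"cpython\PCbuild\arm64\python_d.exe"))
```

A path without any backslash is returned unchanged, which matches the case where --pe_file is given a bare file name.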

In this patch:

  • [wperf]: README nits
  • [wperf]: update multi-core sampling README entry
  • [wperf]: update usage and related docs
  • [wperf]: update --image_name usage
  • [wperf,wperf-driver]: add multicore support to PMU_CTL_SAMPLE_START
  • [wperf] typo
  • [wperf]: add image deduction based on last path element
  • [wperf-test]: add unit tests for ExtractFileNameFromPath
  • [wperf]: add simple function to extract file name from path
  • [wperf]: check for only one core specified with 'record' command
  • [wperf-driver]: fix incorrect sample generated and sample dropped values
  • [wperf]: add verbose pid number for sampling
  • [wperf]: remove restrivtion for 1 sampling core
  • [wperf,wperf-driver]: add multicore support to PMU_CTL_SAMPLE_STOP
  • wperf: fetch sampling data for all cores
  • wperf: receive samples from all cores
  • wperf-driver: add multicore support for PMU_CTL_SAMPLE_SET_SRC ioctrl
  • wperf: add mutliple cores to sampling PMU_CTL_SAMPLE_SET_SRC ioctrl

Testing Documentation

Changes to README are rendered here:

Testing

  1. Execute the CPython executable cpython\PCbuild\arm64\python_d.exe -c 10**10**100 without pinning it to any core.
>wperf sample -e ld_spec:10000 --pe_file cpython\PCbuild\arm64\python_d.exe
base address of 'python_d.exe': 0x7ff7fc5d1288, runtime delta: 0x7ff6bc5d0000
sampling ............................Ctrl-C received, quit counting... done!
Analyzing 256,000 sample(s)................... of which 30 are resolved samples
======================== sample source: ld_spec, top 50 hot functions ========================
        overhead  count  symbol
        ========  =====  ======
           84.57  10745  unknown
           13.02   1654  x_mul:python313_d.dll
            0.79    101  v_isub:python313_d.dll
            0.61     78  PyErr_CheckSignals:python313_d.dll
            0.38     48  v_iadd:python313_d.dll
            0.13     17  x_add:python313_d.dll
            0.08     10  _PyMem_DebugCheckAddress:python313_d.dll
            0.07      9  read_size_t:python313_d.dll
            0.06      8  _PyErr_CheckSignalsTstate:python313_d.dll
            0.05      6  _Py_ThreadCanHandleSignals:python313_d.dll
            0.03      4  _PyLong_DigitCount:python313_d.dll
            0.02      3  PyThread_get_thread_ident_ex:python313_d.dll
            0.02      2  _PyMem_DebugRawFree:python313_d.dll
            0.02      2  PyInterpreterState_Get:python313_d.dll
            0.02      2  _Py_DECREF_SPECIALIZED:python313_d.dll
            0.02      2  PyThread_tss_get:python313_d.dll
            0.02      2  fill_mem_debug:python313_d.dll
            0.01      1  arena_map_is_used:python313_d.dll
            0.01      1  new_reference:python313_d.dll
            0.01      1  _PyInterpreterState_GET:python313_d.dll
            0.01      1  kmul_split:python313_d.dll
            0.01      1  PyThread_tss_is_created:python313_d.dll
            0.01      1  arena_map_get:python313_d.dll
            0.01      1  _Py_IncRefTotal:python313_d.dll
            0.01      1  PyObject_Malloc:python313_d.dll
            0.01      1  long_normalize:python313_d.dll
            0.01      1  k_mul:python313_d.dll
            0.01      1  _PyMem_DebugRawAlloc:python313_d.dll
            0.01      1  write_size_t:python313_d.dll
            0.01      1  pymalloc_free:python313_d.dll
          100.00% 12706  top 30 in total

              27.934 seconds time elapsed
  2. Execute the CPython executable cpython\PCbuild\arm64\python_d.exe -c 10**10**100 on core 65, and sample on all cores or on a range of cores. See the example sampling runs below.

All cores

>wperf sample -e ld_spec:10000 --pe_file cpython\PCbuild\arm64\python_d.exe
base address of 'python_d.exe': 0x7ff7fc5d1288, runtime delta: 0x7ff6bc5d0000
sampling ......Ctrl-C received, quit counting... done!
Analyzing 30,720 sample(s)..... of which 11 are resolved samples
======================== sample source: ld_spec, top 50 hot functions ========================
        overhead  count  symbol
        ========  =====  ======
           95.62   5480  unknown
            3.07    176  x_mul:python313_d.dll
            0.52     30  v_isub:python313_d.dll
            0.45     26  x_add:python313_d.dll
            0.19     11  PyErr_CheckSignals:python313_d.dll
            0.05      3  v_iadd:python313_d.dll
            0.02      1  _PyMem_DebugRawAlloc:python313_d.dll
            0.02      1  read_size_t:python313_d.dll
            0.02      1  _PyMem_DebugMalloc:python313_d.dll
            0.02      1  _Py_ThreadCanHandleSignals:python313_d.dll
            0.02      1  _PyErr_CheckSignalsTstate:python313_d.dll
          100.00%  5731  top 11 in total

                3.76 seconds time elapsed

On range of cores

>wperf sample -c 60-65 -e ld_spec:10000 --pe_file cpython\PCbuild\arm64\python_d.exe
base address of 'python_d.exe': 0x7ff7fc5d1288, runtime delta: 0x7ff6bc5d0000
sampling .....e....e..e.e....Ctrl-C received, quit counting... done!
Analyzing 13,056 sample(s).... of which 24 are resolved samples
======================== sample source: ld_spec, top 50 hot functions ========================
        overhead  count  symbol
        ========  =====  ======
           69.32   1157  x_mul:python313_d.dll
           17.02    284  unknown
            4.19     70  v_isub:python313_d.dll
            3.71     62  PyErr_CheckSignals:python313_d.dll
            2.40     40  v_iadd:python313_d.dll
            0.72     12  x_add:python313_d.dll
            0.42      7  _PyErr_CheckSignalsTstate:python313_d.dll
            0.36      6  _PyMem_DebugCheckAddress:python313_d.dll
            0.24      4  _PyMem_DebugRawAlloc:python313_d.dll
            0.18      3  write_size_t:python313_d.dll
            0.18      3  PyGILState_Check:python313_d.dll
            0.18      3  _Py_ThreadCanHandleSignals:python313_d.dll
            0.12      2  read_size_t:python313_d.dll
            0.12      2  long_normalize:python313_d.dll
            0.12      2  fill_mem_debug:python313_d.dll
            0.12      2  PyThread_get_thread_ident_ex:python313_d.dll
            0.12      2  _Py_DECREF_SPECIALIZED:python313_d.dll
            0.12      2  _PyLong_DigitCount:python313_d.dll
            0.06      1  _Py_DECREF_INT:python313_d.dll
            0.06      1  arena_map_is_used:python313_d.dll
            0.06      1  _Py_IncRefTotal:python313_d.dll
            0.06      1  PyInterpreterState_Get:python313_d.dll
            0.06      1  k_mul:python313_d.dll
            0.06      1  tstate_tss_get:python313_d.dll
          100.00%  1669  top 24 in total

              18.461 seconds time elapsed
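As a sanity check on the tables above, the overhead column appears to be each symbol's count as a percentage of the table's total count. This relationship is inferred from the printed numbers, not taken from the wperf sources, and the function name below is hypothetical. The sketch recomputes the first rows of the -c 60-65 run (total 1669):

```python
def overhead_percent(count: int, total: int) -> str:
    """Format count/total as a percentage with two decimals.

    Assumption: this matches how the 'overhead' column is derived;
    it is inferred from the sample output, not from wperf code.
    """
    return f"{100.0 * count / total:.2f}"


if __name__ == "__main__":
    total = 1669  # "top 24 in total" row of the -c 60-65 run
    for symbol, count in (("x_mul", 1157), ("unknown", 284), ("v_isub", 70)):
        print(f"{overhead_percent(count, total):>8}  {count:>5}  {symbol}")
```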

On cores other than the one running CPython

>wperf sample -c 10-20 -e ld_spec:10000 --pe_file cpython\PCbuild\arm64\python_d.exe
base address of 'python_d.exe': 0x7ff7fc5d1288, runtime delta: 0x7ff6bc5d0000
sampling ........Ctrl-C received, quit counting... done!
Analyzing 7,040 sample(s).... of which 1 are resolved samples
======================== sample source: ld_spec, top 50 hot functions ========================
        overhead  count  symbol
        ========  =====  ======
          100.00   1096  unknown
          100.00%  1096  top 1 in total

               6.063 seconds time elapsed

Testing (regression)

>pytest
===================================================== test session starts ======================================================
platform win32 -- Python 3.12.3, pytest-8.2.0, pluggy-1.5.0
configfile: pytest.ini
collected 1343 items / 9 skipped

wperf_cli_common_test.py .............                                                                                    [  0%]
wperf_cli_config_test.py .....ssssss.ss..                                                                                 [  2%]
wperf_cli_cpython_bench_test.py .s                                                                                        [  2%]
wperf_cli_cpython_dep_record_spe_cli_test.py ............................................................................ [  7%]
.......................                                                                                                   [  9%]
wperf_cli_cpython_dep_record_spe_test.py .........................                                                        [ 11%]
wperf_cli_cpython_dep_record_test.py ..................                                                                   [ 12%]
wperf_cli_cpython_dep_sample_test.py .                                                                                    [ 12%]
wperf_cli_custom_delim_test.py .............................                                                              [ 15%]
wperf_cli_dmc_test.py .                                                                                                   [ 15%]
wperf_cli_dmc_value_test.py .                                                                                             [ 15%]
wperf_cli_extra_events_test.py ....                                                                                       [ 15%]
wperf_cli_hammer_core_test.py ..................                                                                          [ 16%]
wperf_cli_help_test.py ..                                                                                                 [ 17%]
wperf_cli_info_str_test.py .                                                                                              [ 17%]
wperf_cli_json_validator_test.py .................                                                                        [ 18%]
wperf_cli_list_test.py .........                                                                                          [ 19%]
wperf_cli_lock_test.py ..                                                                                                 [ 19%]
wperf_cli_man_test.py ..........................................................................................ss        [ 26%]
wperf_cli_man_ts_test.py ................................................................................................ [ 33%]
..............................................................                                                            [ 37%]
wperf_cli_metrics_test.py .............                                                                                   [ 38%]
wperf_cli_metrics_ts_test.py .............................................................................                [ 44%]
wperf_cli_padding_test.py ............................................................................................... [ 51%]
......................................................................................................................... [ 60%]
......................................................................................................................... [ 69%]
.......                                                                                                                   [ 70%]
wperf_cli_prettytable_test.py .....                                                                                       [ 70%]
wperf_cli_record_test.py ................s                                                                                [ 71%]
wperf_cli_sample_test.py ..........                                                                                       [ 72%]
wperf_cli_stat_multicore_test.py ....                                                                                     [ 72%]
wperf_cli_stat_test.py .........................................................................                          [ 78%]
wperf_cli_stat_value_test.py ............................................................................................ [ 85%]
........................................................................................                                  [ 91%]
wperf_cli_test_test.py ...........                                                                                        [ 92%]
wperf_cli_timeline_test.py ..............................................................                                 [ 97%]
wperf_cli_ustress_bench_test.py ......                                                                                    [ 97%]
wperf_cli_ustress_dep_wperf_lib_timeline_test.py .                                                                        [ 97%]
wperf_cli_ustress_dep_wperf_test.py ...........                                                                           [ 98%]
wperf_cli_ustress_timeline_test.py ..................                                                                     [ 99%]
wperf_cli_xperf_test.py s                                                                                                 [ 99%]
wperf_lib_app_test.py .                                                                                                   [ 99%]
wperf_lib_c_compat_test.py .                                                                                              [100%]
================================================ WindowsPerf Test Configuration ================================================
OS: Windows-11-10.0.26100-SP0, ARM64
CPU: 80 x ARMv8 (64-bit) Family 8 Model D0C Revision 301, Ampere(R)
Python: 3.12.3 (tags/v3.12.3:f6650f9, Apr  9 2024, 14:18:48) [MSC v.1938 64 bit (ARM64)]
Time: 23/09/2025, 16:23:50
wperf: 5.4.0.53e51d5e+etw-app+spe+cmn
wperf-driver: 5.4.0.53e51d5e+trace+spe+cmn
Configuration: --use=None
Edited by Przemyslaw Wirkus
