[wperf,wperf-driver] WPERF-1271: add multi-core sampling basic functionality
Introduction
Add minimal support for multi-core (software) sampling where we:
- enable n cores to sample at the same time
- for the same events and sampling frequencies (specified with -e).
Until now, users calling wperf with sample were able to sample on only one core with -c <N>. Now users can specify a range of cores on which to collect software samples for a process.
Example commands users can now issue for many cores (see the -c command line option):
>wperf sample -c 1
>wperf sample -c 0,10,20,30,40
>wperf sample -c 0,60-65
Note: if you omit -c from the wperf sample command, WindowsPerf will sample on all cores.
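To make the -c syntax concrete, here is a minimal Python sketch of how a core specification such as 0,60-65 could be expanded into individual core indices. The function name parse_core_list is hypothetical and this is not wperf's actual parser. It assumes ranges are inclusive at both ends, which matches the test run below where -c 60-65 picks up a process pinned to core 65.

```python
def parse_core_list(spec: str) -> list[int]:
    """Expand a core specification such as "0,60-65" into sorted core indices.

    Hypothetical helper for illustration only; not wperf's actual parser.
    Ranges are assumed to be inclusive at both ends.
    """
    cores: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty element in core list: {spec!r}")
        if "-" in part:
            # "60-65" -> 60, 61, ..., 65
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"descending range: {part!r}")
            cores.update(range(first, last + 1))
        else:
            cores.add(int(part))
    return sorted(cores)


# The three example specifications from the commands above.
for example in ("1", "0,10,20,30,40", "0,60-65"):
    print(example, "->", parse_core_list(example))
```

Collecting the indices in a set means that overlapping entries (for example 60,60-65) yield each core only once.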
Also, I've updated how we detect the image name, so now when users specify:
--pe_file cpython\PCbuild\arm64\python_d.exe
wperf will deduce the image name python_d.exe from this path, as we assume that the image name is the last element of the path given as the PE file name. Please also note that we only split the path on \ at this time. If it is unclear which image name is taken, users should use -v to see the assumed image name, or override the image name with --image_name <name>.
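The deduction rule described above can be sketched in a few lines of Python. This only illustrates the stated behaviour (take the last element after splitting on \ only). The real ExtractFileNameFromPath lives in wperf, and its exact signature is not shown in this description, so the Python name below is an assumption.

```python
def extract_file_name_from_path(path: str) -> str:
    """Return the last backslash-separated element of a path.

    Illustrative sketch only; the name mirrors wperf's ExtractFileNameFromPath,
    but the signature and edge-case handling here are assumptions.
    """
    # Split on backslash only; forward slashes are deliberately left untouched,
    # matching the "we only split on \" note above.
    return path.rsplit("\\", 1)[-1]


print(extract_file_name_from_path(r"cpython\PCbuild\arm64\python_d.exe"))
```

With this rule, a path written with forward slashes is returned unchanged, which is a case where -v or --image_name <name> would be useful.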
In this patch:
- [wperf]: README nits
- [wperf]: update multi-core sampling README entry
- [wperf]: update usage and related docs
- [wperf]: update --image_name usage
- [wperf,wperf-driver]: add multicore support to PMU_CTL_SAMPLE_START
- [wperf] typo
- [wperf]: add image deduction based on last path element
- [wperf-test]: add unit tests for ExtractFileNameFromPath
- [wperf]: add simple function to extract file name from path
- [wperf]: check for only one core specified with 'record' command
- [wperf-driver]: fix incorrect sample generated and sample dropped values
- [wperf]: add verbose pid number for sampling
- [wperf]: remove restriction for 1 sampling core
- [wperf,wperf-driver]: add multicore support to PMU_CTL_SAMPLE_STOP
- wperf: fetch sampling data for all cores
- wperf: receive samples from all cores
- wperf-driver: add multicore support for PMU_CTL_SAMPLE_SET_SRC ioctl
- wperf: add multiple cores to sampling PMU_CTL_SAMPLE_SET_SRC ioctl
Testing Documentation
Changes to README are rendered here:
- https://gitlab.com/PrzemekWirkus/windowsperf/-/blob/devel_multicore_sampling_part_1/wperf/README.md
- render of example-3-multi-core-sampling-of-cpython
Testing
- Execute the CPython executable cpython\PCbuild\arm64\python_d.exe -c 10**10**100 without pinning it to any of the cores.
>wperf sample -e ld_spec:10000 --pe_file cpython\PCbuild\arm64\python_d.exe
base address of 'python_d.exe': 0x7ff7fc5d1288, runtime delta: 0x7ff6bc5d0000
sampling ............................Ctrl-C received, quit counting... done!
Analyzing 256,000 sample(s)................... of which 30 are resolved samples
======================== sample source: ld_spec, top 50 hot functions ========================
overhead count symbol
======== ===== ======
84.57 10745 unknown
13.02 1654 x_mul:python313_d.dll
0.79 101 v_isub:python313_d.dll
0.61 78 PyErr_CheckSignals:python313_d.dll
0.38 48 v_iadd:python313_d.dll
0.13 17 x_add:python313_d.dll
0.08 10 _PyMem_DebugCheckAddress:python313_d.dll
0.07 9 read_size_t:python313_d.dll
0.06 8 _PyErr_CheckSignalsTstate:python313_d.dll
0.05 6 _Py_ThreadCanHandleSignals:python313_d.dll
0.03 4 _PyLong_DigitCount:python313_d.dll
0.02 3 PyThread_get_thread_ident_ex:python313_d.dll
0.02 2 _PyMem_DebugRawFree:python313_d.dll
0.02 2 PyInterpreterState_Get:python313_d.dll
0.02 2 _Py_DECREF_SPECIALIZED:python313_d.dll
0.02 2 PyThread_tss_get:python313_d.dll
0.02 2 fill_mem_debug:python313_d.dll
0.01 1 arena_map_is_used:python313_d.dll
0.01 1 new_reference:python313_d.dll
0.01 1 _PyInterpreterState_GET:python313_d.dll
0.01 1 kmul_split:python313_d.dll
0.01 1 PyThread_tss_is_created:python313_d.dll
0.01 1 arena_map_get:python313_d.dll
0.01 1 _Py_IncRefTotal:python313_d.dll
0.01 1 PyObject_Malloc:python313_d.dll
0.01 1 long_normalize:python313_d.dll
0.01 1 k_mul:python313_d.dll
0.01 1 _PyMem_DebugRawAlloc:python313_d.dll
0.01 1 write_size_t:python313_d.dll
0.01 1 pymalloc_free:python313_d.dll
100.00% 12706 top 30 in total
27.934 seconds time elapsed
- Execute the CPython executable cpython\PCbuild\arm64\python_d.exe -c 10**10**100 on core 65 and sample on a range of cores or on all cores. See the sampling examples below.
All cores
>wperf sample -e ld_spec:10000 --pe_file cpython\PCbuild\arm64\python_d.exe
base address of 'python_d.exe': 0x7ff7fc5d1288, runtime delta: 0x7ff6bc5d0000
sampling ......Ctrl-C received, quit counting... done!
Analyzing 30,720 sample(s)..... of which 11 are resolved samples
======================== sample source: ld_spec, top 50 hot functions ========================
overhead count symbol
======== ===== ======
95.62 5480 unknown
3.07 176 x_mul:python313_d.dll
0.52 30 v_isub:python313_d.dll
0.45 26 x_add:python313_d.dll
0.19 11 PyErr_CheckSignals:python313_d.dll
0.05 3 v_iadd:python313_d.dll
0.02 1 _PyMem_DebugRawAlloc:python313_d.dll
0.02 1 read_size_t:python313_d.dll
0.02 1 _PyMem_DebugMalloc:python313_d.dll
0.02 1 _Py_ThreadCanHandleSignals:python313_d.dll
0.02 1 _PyErr_CheckSignalsTstate:python313_d.dll
100.00% 5731 top 11 in total
3.76 seconds time elapsed
On a range of cores
>wperf sample -c 60-65 -e ld_spec:10000 --pe_file cpython\PCbuild\arm64\python_d.exe
base address of 'python_d.exe': 0x7ff7fc5d1288, runtime delta: 0x7ff6bc5d0000
sampling .....e....e..e.e....Ctrl-C received, quit counting... done!
Analyzing 13,056 sample(s).... of which 24 are resolved samples
======================== sample source: ld_spec, top 50 hot functions ========================
overhead count symbol
======== ===== ======
69.32 1157 x_mul:python313_d.dll
17.02 284 unknown
4.19 70 v_isub:python313_d.dll
3.71 62 PyErr_CheckSignals:python313_d.dll
2.40 40 v_iadd:python313_d.dll
0.72 12 x_add:python313_d.dll
0.42 7 _PyErr_CheckSignalsTstate:python313_d.dll
0.36 6 _PyMem_DebugCheckAddress:python313_d.dll
0.24 4 _PyMem_DebugRawAlloc:python313_d.dll
0.18 3 write_size_t:python313_d.dll
0.18 3 PyGILState_Check:python313_d.dll
0.18 3 _Py_ThreadCanHandleSignals:python313_d.dll
0.12 2 read_size_t:python313_d.dll
0.12 2 long_normalize:python313_d.dll
0.12 2 fill_mem_debug:python313_d.dll
0.12 2 PyThread_get_thread_ident_ex:python313_d.dll
0.12 2 _Py_DECREF_SPECIALIZED:python313_d.dll
0.12 2 _PyLong_DigitCount:python313_d.dll
0.06 1 _Py_DECREF_INT:python313_d.dll
0.06 1 arena_map_is_used:python313_d.dll
0.06 1 _Py_IncRefTotal:python313_d.dll
0.06 1 PyInterpreterState_Get:python313_d.dll
0.06 1 k_mul:python313_d.dll
0.06 1 tstate_tss_get:python313_d.dll
100.00% 1669 top 24 in total
18.461 seconds time elapsed
Cores that do not include the core running CPython
>wperf sample -c 10-20 -e ld_spec:10000 --pe_file cpython\PCbuild\arm64\python_d.exe
base address of 'python_d.exe': 0x7ff7fc5d1288, runtime delta: 0x7ff6bc5d0000
sampling ........Ctrl-C received, quit counting... done!
Analyzing 7,040 sample(s).... of which 1 are resolved samples
======================== sample source: ld_spec, top 50 hot functions ========================
overhead count symbol
======== ===== ======
100.00 1096 unknown
100.00% 1096 top 1 in total
6.063 seconds time elapsed
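As a reading aid for the reports above, the overhead column appears to be each symbol's count divided by the total in the summary row. This is inferred from the printed numbers, not from the wperf source. The short Python sketch below recomputes the x_mul share for each run, using counts copied from the outputs above; the helper name overhead_percent is made up for this example.

```python
def overhead_percent(count: int, total: int) -> float:
    """Share of `count` in `total`, as a percentage rounded to two decimals."""
    return round(100.0 * count / total, 2)


# (x_mul count, total count) copied from the four sample reports above.
runs = {
    "all cores, process not pinned": (1654, 12706),
    "all cores, process on core 65": (176, 5731),
    "cores 60-65, process on core 65": (1157, 1669),
    "cores 10-20, process on core 65": (0, 1096),
}

for label, (x_mul, total) in runs.items():
    print(f"{label}: x_mul overhead {overhead_percent(x_mul, total):.2f}%")
```

Narrowing -c to a range that includes the core running CPython raises the share of resolved x_mul samples, while a range that excludes it yields only unknown samples.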
Testing (regression)
>pytest
===================================================== test session starts ======================================================
platform win32 -- Python 3.12.3, pytest-8.2.0, pluggy-1.5.0
configfile: pytest.ini
collected 1343 items / 9 skipped
wperf_cli_common_test.py ............. [ 0%]
wperf_cli_config_test.py .....ssssss.ss.. [ 2%]
wperf_cli_cpython_bench_test.py .s [ 2%]
wperf_cli_cpython_dep_record_spe_cli_test.py ............................................................................ [ 7%]
....................... [ 9%]
wperf_cli_cpython_dep_record_spe_test.py ......................... [ 11%]
wperf_cli_cpython_dep_record_test.py .................. [ 12%]
wperf_cli_cpython_dep_sample_test.py . [ 12%]
wperf_cli_custom_delim_test.py ............................. [ 15%]
wperf_cli_dmc_test.py . [ 15%]
wperf_cli_dmc_value_test.py . [ 15%]
wperf_cli_extra_events_test.py .... [ 15%]
wperf_cli_hammer_core_test.py .................. [ 16%]
wperf_cli_help_test.py .. [ 17%]
wperf_cli_info_str_test.py . [ 17%]
wperf_cli_json_validator_test.py ................. [ 18%]
wperf_cli_list_test.py ......... [ 19%]
wperf_cli_lock_test.py .. [ 19%]
wperf_cli_man_test.py ..........................................................................................ss [ 26%]
wperf_cli_man_ts_test.py ................................................................................................ [ 33%]
.............................................................. [ 37%]
wperf_cli_metrics_test.py ............. [ 38%]
wperf_cli_metrics_ts_test.py ............................................................................. [ 44%]
wperf_cli_padding_test.py ............................................................................................... [ 51%]
......................................................................................................................... [ 60%]
......................................................................................................................... [ 69%]
....... [ 70%]
wperf_cli_prettytable_test.py ..... [ 70%]
wperf_cli_record_test.py ................s [ 71%]
wperf_cli_sample_test.py .......... [ 72%]
wperf_cli_stat_multicore_test.py .... [ 72%]
wperf_cli_stat_test.py ......................................................................... [ 78%]
wperf_cli_stat_value_test.py ............................................................................................ [ 85%]
........................................................................................ [ 91%]
wperf_cli_test_test.py ........... [ 92%]
wperf_cli_timeline_test.py .............................................................. [ 97%]
wperf_cli_ustress_bench_test.py ...... [ 97%]
wperf_cli_ustress_dep_wperf_lib_timeline_test.py . [ 97%]
wperf_cli_ustress_dep_wperf_test.py ........... [ 98%]
wperf_cli_ustress_timeline_test.py .................. [ 99%]
wperf_cli_xperf_test.py s [ 99%]
wperf_lib_app_test.py . [ 99%]
wperf_lib_c_compat_test.py . [100%]
================================================ WindowsPerf Test Configuration ================================================
OS: Windows-11-10.0.26100-SP0, ARM64
CPU: 80 x ARMv8 (64-bit) Family 8 Model D0C Revision 301, Ampere(R)
Python: 3.12.3 (tags/v3.12.3:f6650f9, Apr 9 2024, 14:18:48) [MSC v.1938 64 bit (ARM64)]
Time: 23/09/2025, 16:23:50
wperf: 5.4.0.53e51d5e+etw-app+spe+cmn
wperf-driver: 5.4.0.53e51d5e+trace+spe+cmn
Configuration: --use=None
Edited by Przemyslaw Wirkus