Annotate for sampling model
This is more of a place to discuss the functionality than a final merge request, as I believe parts of the output format and other details should be discussed a bit further. This first step is a reasonably simple change: it queries the line numbers of each function, stores them in a struct, and compares them against the PCs received from sampling, showing the result to the user.
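To make the PC-to-line matching concrete, here is a minimal sketch of the idea rather than the code in this MR. The struct `LineRange`, its fields, and the function `count_line_hits` are hypothetical names chosen for illustration. The sketch assumes the line information has already been read from the PDB into address ranges sorted by start address, and that the sampled PCs have already been adjusted by the runtime delta.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one source line of a function
// (illustrative names, not the ones used in this MR).
struct LineRange {
    std::string source_file;
    uint32_t    line;    // source line number
    uint64_t    start;   // first address covered by this line
    uint64_t    length;  // number of bytes covered by this line
};

using LineKey  = std::pair<std::string, uint32_t>;  // (source file, line number)
using LineHits = std::map<LineKey, uint64_t>;       // hits per source line

// Counts how many sampled PCs fall into each line range.
// `lines` must be sorted by start address and must not overlap.
LineHits count_line_hits(const std::vector<LineRange>& lines,
                         const std::vector<uint64_t>& pcs)
{
    assert(std::is_sorted(lines.begin(), lines.end(),
        [](const LineRange& a, const LineRange& b) { return a.start < b.start; }));

    LineHits hits;
    for (uint64_t pc : pcs) {
        // Find the first range starting after pc; the candidate is the one before it.
        auto it = std::upper_bound(lines.begin(), lines.end(), pc,
            [](uint64_t value, const LineRange& r) { return value < r.start; });
        if (it == lines.begin())
            continue;  // pc lies below every known range
        --it;
        if (pc - it->start < it->length)
            ++hits[{it->source_file, it->line}];
    }
    return hits;
}
```

PCs that match no range are skipped here. In the real output they would presumably end up in something like the "unknown" bucket.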
Here is sample output for main.exe:
PS C:\Users\everton\source\repos\windowsperf\wperf-scripts> wperf sample -e vfp_spec:10000 -pe_file main.exe -pdb_file main.pdb -image_name main.exe -c 1 -annotate
base address of 'main.exe': 0x7ff6d2b85e78, runtime delta: 0x7ff592b80000
sampling ....eeeeeeeeeeeeeeeeee.eeeeeeeeeeeeeeeeeee.eeeeCtrl-C received, quit counting...e done!
======================== sample source: vfp_spec, top 50 hot functions ========================
simd_hot
Source file Line number Hits
=========== =========== ====
C:\Users\przemek\Desktop\wperf\merge-retquest\94\lib.c 23 383
df_hot
Source file Line number Hits
=========== =========== ====
C:\Users\przemek\Desktop\wperf\merge-retquest\94\lib.c 8 1
overhead count symbol
======== ===== ======
99.74 383 simd_hot
0.26 1 df_hot
100.00% 384 top 2 in total
And here is the output for the googolplex calculation with python_d.exe:
PS C:\Users\everton\source\repos\windowsperf\wperf-scripts> wperf sample -e ld_spec:100000 -pe_file arm64\python_d.exe -pdb_file arm64\python_d.pdb -image_name python_d.exe -c 1 -annotate
base address of 'python_d.exe': 0x7ff664ff1270, runtime delta: 0x7ff524ff0000
sampling ....eeeeee.eeCtrl-C received, quit counting...e done!
======================== sample source: ld_spec, top 50 hot functions ========================
x_mul:python313_d.dll
Source file Line number Hits
=========== =========== ====
C:\Users\evert\source\repos\cpython\Objects\longobject.c 3559 98
C:\Users\evert\source\repos\cpython\Objects\longobject.c 3560 48
C:\Users\evert\source\repos\cpython\Objects\longobject.c 3562 22
C:\Users\evert\source\repos\cpython\Objects\longobject.c 3558 17
C:\Users\evert\source\repos\cpython\Objects\longobject.c 3561 15
C:\Users\evert\source\repos\cpython\Objects\longobject.c 3563 6
C:\Users\evert\source\repos\cpython\Objects\longobject.c 3542 2
C:\Users\evert\source\repos\cpython\Objects\longobject.c 3540 1
C:\Users\evert\source\repos\cpython\Objects\longobject.c 3557 1
C:\Users\evert\source\repos\cpython\Objects\longobject.c 3571 1
v_isub:python313_d.dll
Source file Line number Hits
=========== =========== ====
C:\Users\evert\source\repos\cpython\Objects\longobject.c 1546 8
C:\Users\evert\source\repos\cpython\Objects\longobject.c 1545 2
C:\Users\evert\source\repos\cpython\Objects\longobject.c 1547 1
C:\Users\evert\source\repos\cpython\Objects\longobject.c 1548 1
C:\Users\evert\source\repos\cpython\Objects\longobject.c 1549 1
C:\Users\evert\source\repos\cpython\Objects\longobject.c 1550 1
v_iadd:python313_d.dll
Source file Line number Hits
=========== =========== ====
C:\Users\evert\source\repos\cpython\Objects\longobject.c 1520 3
C:\Users\evert\source\repos\cpython\Objects\longobject.c 1524 2
C:\Users\evert\source\repos\cpython\Objects\longobject.c 1523 1
PyErr_CheckSignals:python313_d.dll
Source file Line number Hits
=========== =========== ====
C:\Users\evert\source\repos\cpython\Modules\signalmodule.c 1760 2
C:\Users\evert\source\repos\cpython\Modules\signalmodule.c 1778 2
C:\Users\evert\source\repos\cpython\Modules\signalmodule.c 1759 1
C:\Users\evert\source\repos\cpython\Modules\signalmodule.c 1769 1
_Py_ThreadCanHandleSignals:python313_d.dll
Source file Line number Hits
=========== =========== ====
C:\Users\evert\source\repos\cpython\Include\internal\pycore_pystate.h 59 3
_Py_atomic_load_32bit_impl:python313_d.dll
Source file Line number Hits
=========== =========== ====
C:\Users\evert\source\repos\cpython\Include\internal\pycore_atomic.h 470 1
C:\Users\evert\source\repos\cpython\Include\internal\pycore_atomic.h 486 1
C:\Users\evert\source\repos\cpython\Include\internal\pycore_atomic.h 492 1
x_add:python313_d.dll
Source file Line number Hits
=========== =========== ====
C:\Users\evert\source\repos\cpython\Objects\longobject.c 3370 1
PyThread_tss_is_created:python313_d.dll
Source file Line number Hits
=========== =========== ====
C:\Users\evert\source\repos\cpython\Python\thread.c 104 1
long_normalize:python313_d.dll
Source file Line number Hits
=========== =========== ====
C:\Users\evert\source\repos\cpython\Objects\longobject.c 117 1
_PyMem_DebugFree:python313_d.dll
Source file Line number Hits
=========== =========== ====
C:\Users\evert\source\repos\cpython\Objects\obmalloc.c 2169 1
PyGILState_Check:python313_d.dll
Source file Line number Hits
=========== =========== ====
C:\Users\evert\source\repos\cpython\Python\pystate.c 2181 1
_PyMem_DebugRawMalloc:python313_d.dll
Source file Line number Hits
=========== =========== ====
C:\Users\evert\source\repos\cpython\Objects\obmalloc.c 2004 1
PyThread_get_thread_ident:python313_d.dll
Source file Line number Hits
=========== =========== ====
C:\Users\evert\source\repos\cpython\Python\thread_nt.h 227 1
overhead count symbol
======== ===== ======
82.42 211 x_mul:python313_d.dll
5.47 14 v_isub:python313_d.dll
2.34 6 unknown
2.34 6 v_iadd:python313_d.dll
2.34 6 PyErr_CheckSignals:python313_d.dll
1.17 3 _Py_ThreadCanHandleSignals:python313_d.dll
1.17 3 _Py_atomic_load_32bit_impl:python313_d.dll
0.39 1 x_add:python313_d.dll
0.39 1 PyThread_tss_is_created:python313_d.dll
0.39 1 long_normalize:python313_d.dll
0.39 1 _PyMem_DebugFree:python313_d.dll
0.39 1 PyGILState_Check:python313_d.dll
0.39 1 _PyMem_DebugRawMalloc:python313_d.dll
0.39 1 PyThread_get_thread_ident:python313_d.dll
100.00% 256 top 14 in total
I created another command-line option called -annotate.
I am not a big fan of this solution. The main reason is that it is starting to complicate the output and execution of sampling too much; even the JSON will start to look a bit messy as this feature progresses. Since we don't know in advance which PCs will show up in the samples, we have to either keep the DIA session open or store the information we might need beforehand. This would be greatly simplified if we broke this into two steps.
My proposed solution is a bit more complex, which is why I am posting this MR as a compromise alternative.
I believe the proper steps would be to run the sample command, which would store the results (PCs, memory entry points, and so forth) in a binary file. The user could then, if they wish, run an analyze command or something like it to extract more information, as this MR does: exploring source code, disassembly, and so on. This can be done without forcing users to rerun their workloads, as the results would already be stored.
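As a starting point for that discussion, here is a minimal sketch of what the store-then-analyze split could look like. Everything in it is an assumption: the record layout `SampleRecord`, its fields, and the functions `write_samples` and `read_samples` are hypothetical, and the format is a raw native-endian dump with no header, versioning, or validation of the record count. A real format would need those.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical on-disk record; the real format is still to be discussed.
struct SampleRecord {
    uint64_t pc;           // sampled program counter
    uint32_t core;         // core the sample was taken on
    uint32_t event_index;  // index of the event that triggered the sample
};
static_assert(sizeof(SampleRecord) == 16, "record is expected to have no padding");

// What a `sample` step could do: dump the raw samples to a binary file.
bool write_samples(const std::string& path, const std::vector<SampleRecord>& samples)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        return false;
    const uint64_t count = samples.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));
    out.write(reinterpret_cast<const char*>(samples.data()),
              static_cast<std::streamsize>(count * sizeof(SampleRecord)));
    return static_cast<bool>(out);
}

// What an `analyze` step could do: load the samples back without rerunning the workload.
bool read_samples(const std::string& path, std::vector<SampleRecord>& samples)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;
    uint64_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof(count)))
        return false;
    samples.resize(count);
    in.read(reinterpret_cast<char*>(samples.data()),
            static_cast<std::streamsize>(count * sizeof(SampleRecord)));
    return static_cast<bool>(in);
}
```

With the samples on disk, annotation, disassembly, and other analyses could each open the PDB on their own, so the sampling path would no longer need to keep the DIA session open.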