Add NVTX/ROCTX/ITT/XPTI profiling annotation to wallcycle counters
Summary
Add the ability to build GROMACS with profiler instrumentation that annotates different parts of the code.
I suggest adding the calls to the existing wall-cycle counter functions, so that the code does not have to be duplicated.
This is a low-priority task, so there is no commitment to any milestone, but I am considering working on it in the near term.
Use cases
- Improve the interpretability of profiler statistics and visual traces.
- Make it easier to correlate GROMACS's internal performance report with the reports of external tools.
Impact
By default, GROMACS is built without instrumentation, so regular users will not be affected.
For developers, it will make profiling and performance optimization easier. This is especially relevant for application-level profiling and for profiling the underlying framework (hipSYCL and DPC++, which have non-trivial internal scheduling logic).
Detailed description
At configure time, the user should be able to select the instrumentation framework to use. This might require extra CMake scripting to detect the location of the required headers and libraries.
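One way the configure-time choice could reach the code is through one preprocessor symbol per backend, set by CMake. The sketch below assumes hypothetical macro names (GMX_INSTR_NVTX, GMX_INSTR_ROCTX, GMX_INSTR_ITT, GMX_INSTR_XPTI) and a hypothetical CMake option such as -DGMX_INSTRUMENTATION=ROCTX. Neither exists in GROMACS today. The sketch rejects configurations that enable more than one backend and defaults to no instrumentation.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical macros: a configure-time CMake option (name assumed) would
// define at most one of these.
#if (defined(GMX_INSTR_NVTX) + defined(GMX_INSTR_ROCTX) + defined(GMX_INSTR_ITT) \
     + defined(GMX_INSTR_XPTI)) > 1
#    error "Select at most one instrumentation backend"
#endif

enum class InstrumentationBackend
{
    None,
    Nvtx,
    Roctx,
    Itt,
    Xpti
};

// Resolve the configured backend once, at compile time.
#if defined(GMX_INSTR_NVTX)
constexpr InstrumentationBackend kInstrumentationBackend = InstrumentationBackend::Nvtx;
#elif defined(GMX_INSTR_ROCTX)
constexpr InstrumentationBackend kInstrumentationBackend = InstrumentationBackend::Roctx;
#elif defined(GMX_INSTR_ITT)
constexpr InstrumentationBackend kInstrumentationBackend = InstrumentationBackend::Itt;
#elif defined(GMX_INSTR_XPTI)
constexpr InstrumentationBackend kInstrumentationBackend = InstrumentationBackend::Xpti;
#else
// Default build: no instrumentation, so regular users are unaffected.
constexpr InstrumentationBackend kInstrumentationBackend = InstrumentationBackend::None;
#endif

constexpr bool instrumentationEnabled()
{
    return kInstrumentationBackend != InstrumentationBackend::None;
}

// Human-readable name, e.g. for reporting which backend was compiled in.
inline const char* instrumentationBackendName(InstrumentationBackend b)
{
    switch (b)
    {
        case InstrumentationBackend::Nvtx: return "NVTX";
        case InstrumentationBackend::Roctx: return "ROCTX";
        case InstrumentationBackend::Itt: return "ITT";
        case InstrumentationBackend::Xpti: return "XPTI";
        case InstrumentationBackend::None: break;
    }
    return "none";
}
```

Because the choice is a compile-time constant, branches on instrumentationEnabled() in an uninstrumented build fold away entirely.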
Add calls to the instrumentation functions in wallcycle_start, wallcycle_stop, wallcycle_increment_event_count, wallcycle_sub_start, and wallcycle_sub_stop. The *_start_nocount functions call *_start internally, so they don't require any changes.
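The hook placement described here can be sketched on a heavily simplified model. WallCycleSketch, Counter, and the sketchWallcycle* functions are hypothetical names, not GROMACS's actual wallcycle API. RecordingTracer stands in for an NVTX/ROCTX/ITT backend so that the emitted events can be inspected. The sketch mirrors the point above: the no-count variant calls the start function and only undoes the count, so it inherits the instrumentation without changes. Emitting a point marker from the event-count hook is an assumption. Sub-counters would follow the same push/pop pattern.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

// Stand-in for a profiler backend: it only records calls for inspection.
struct RecordingTracer
{
    std::vector<std::string> events;
    void push(const char* name) { events.push_back(std::string("push:") + name); }
    void pop() { events.push_back("pop"); }
    void mark(const char* name) { events.push_back(std::string("mark:") + name); }
};

// Hypothetical, heavily simplified counters; not the real GROMACS enumeration.
enum class Counter : int
{
    Run,
    Force,
    Count
};
constexpr int kNumCounters = static_cast<int>(Counter::Count);
const std::array<const char*, kNumCounters> kCounterNames = { "Run", "Force" };

struct WallCycleSketch
{
    std::array<std::chrono::steady_clock::time_point, kNumCounters> startTime{};
    std::array<std::chrono::nanoseconds, kNumCounters> elapsed{};
    std::array<long long, kNumCounters> count{};
    RecordingTracer tracer;
};

inline int idx(Counter c) { return static_cast<int>(c); }

// Start timing a counter and open a named profiler range.
void sketchWallcycleStart(WallCycleSketch* wc, Counter c)
{
    if (wc == nullptr) { return; } // cycle counting disabled
    wc->startTime[idx(c)] = std::chrono::steady_clock::now();
    wc->count[idx(c)]++;
    wc->tracer.push(kCounterNames[idx(c)]); // new instrumentation hook
}

// Reuses the start function and only undoes the count, so it inherits the hook.
void sketchWallcycleStartNoCount(WallCycleSketch* wc, Counter c)
{
    if (wc == nullptr) { return; }
    sketchWallcycleStart(wc, c);
    wc->count[idx(c)]--;
}

// Stop timing and close the range opened by the matching start.
void sketchWallcycleStop(WallCycleSketch* wc, Counter c)
{
    if (wc == nullptr) { return; }
    wc->elapsed[idx(c)] += std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - wc->startTime[idx(c)]);
    wc->tracer.pop(); // new instrumentation hook
}

// Count an event without timing it; emitted as an instantaneous marker.
void sketchWallcycleIncrementEventCount(WallCycleSketch* wc, Counter c)
{
    if (wc == nullptr) { return; }
    wc->count[idx(c)]++;
    wc->tracer.mark(kCounterNames[idx(c)]); // new instrumentation hook
}
```

Nesting a Force interval inside a Run interval produces matching nested ranges in the trace, which is what makes profiler timelines line up with the internal performance report.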
At the moment, ROCTX seems to be the most important. However, a brief examination suggests that all the frameworks follow similar logic (for example, NVTX and ROCTX both expose push/pop of named ranges), so a thin abstraction layer could likely be created to support any of them.
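Here is a minimal sketch of such a layer. It assumes compile-time selection through hypothetical GMX_INSTR_* macros. The names rangePush, rangePop, mark, and ScopedRange are illustrative, not an existing GROMACS API. The backend calls are the libraries' public C functions: nvtxRangePushA/nvtxRangePop/nvtxMarkA for NVTX, roctxRangePushA/roctxRangePop/roctxMarkA for ROCTX, and __itt_task_begin/__itt_task_end for ITT. Header paths may differ between toolkit versions. XPTI, whose trace-point model differs, is not covered, and ITT markers are left as no-ops. Without a backend macro, release builds compile the wrappers to nothing.

```cpp
#include <cassert>

// Hypothetical backend macros; header paths may vary by toolkit version.
#if defined(GMX_INSTR_NVTX)
#    include <nvtx3/nvToolsExt.h>
#elif defined(GMX_INSTR_ROCTX)
#    include <roctracer/roctx.h>
#elif defined(GMX_INSTR_ITT)
#    include <ittnotify.h>
#endif

namespace instr
{
namespace detail
{
// Per-thread nesting depth (debug builds only), used to catch unbalanced pairs.
inline int& depth()
{
    thread_local int d = 0;
    return d;
}

#if defined(GMX_INSTR_ITT)
// ITT tasks belong to a domain; create it once and reuse it.
inline __itt_domain* domain()
{
    static __itt_domain* d = __itt_domain_create("GROMACS");
    return d;
}
#endif
} // namespace detail

// Open a named range (NVTX/ROCTX range, ITT task).
inline void rangePush(const char* name)
{
#if defined(GMX_INSTR_NVTX)
    nvtxRangePushA(name);
#elif defined(GMX_INSTR_ROCTX)
    roctxRangePushA(name);
#elif defined(GMX_INSTR_ITT)
    __itt_task_begin(detail::domain(), __itt_null, __itt_null, __itt_string_handle_create(name));
#else
    (void)name; // no backend selected
#endif
#ifndef NDEBUG
    ++detail::depth();
#endif
}

// Close the innermost open range on this thread.
inline void rangePop()
{
#ifndef NDEBUG
    assert(detail::depth() > 0 && "rangePop() without a matching rangePush()");
    --detail::depth();
#endif
#if defined(GMX_INSTR_NVTX)
    nvtxRangePop();
#elif defined(GMX_INSTR_ROCTX)
    roctxRangePop();
#elif defined(GMX_INSTR_ITT)
    __itt_task_end(detail::domain());
#endif
}

// Instantaneous marker, e.g. for event counts; a no-op for ITT here.
inline void mark(const char* name)
{
#if defined(GMX_INSTR_NVTX)
    nvtxMarkA(name);
#elif defined(GMX_INSTR_ROCTX)
    roctxMarkA(name);
#else
    (void)name;
#endif
}

// RAII helper for annotating scopes outside the wallcycle counters.
class ScopedRange
{
public:
    explicit ScopedRange(const char* name) { rangePush(name); }
    ~ScopedRange() { rangePop(); }
    ScopedRange(const ScopedRange&)            = delete;
    ScopedRange& operator=(const ScopedRange&) = delete;
};
} // namespace instr
```

The wallcycle start, stop, and event-count functions would then call instr::rangePush, instr::rangePop, and instr::mark with the counter name, keeping all backend-specific code in one header.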