CUDA new features [not ready to merge]
Created by: Jokeren
This branch will contain a set of new activities and enhancements based on features introduced in CUDA 11.2. I aim to complete all of them over the summer and verify their functionality after returning to Rice. Hopefully, we can merge this branch into master sometime this fall.
Features
- Trace and profile (simple) unified memory activities. https://github.com/HPCToolkit/hpctoolkit/issues/396
- Trace and profile memory allocation, free, and set activities. https://github.com/HPCToolkit/hpctoolkit/issues/393
- New PC sampling, which is currently in another branch.
- Use the CUPTI built-in CRC instead of calculating MD5.
Improvements
- Reduce runtime overhead by specifying GPU APIs before execution instead of using switch/case to determine the API kind.
- Invoke nvdisasm using multiple CPU threads.
- gpu=nvidia,pc only profiles PC sampling activities. https://github.com/HPCToolkit/hpctoolkit/issues/381
- Add macros to separate activities that are only supported in newer versions of CUDA. https://github.com/HPCToolkit/hpctoolkit/issues/394
- Output control knob info. https://github.com/HPCToolkit/hpctoolkit/issues/395
- hpcrun should report an error if one attempts to use CUPTI while CUPTI is disabled. https://github.com/HPCToolkit/hpctoolkit/issues/342
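
To make the first improvement item more concrete, here is a minimal sketch of resolving handlers once before execution so that the hot path is a table lookup rather than a switch/case. It is only an illustration of the idea, not the code in this branch: every name in it (`api_kind_t`, `api_handler_t`, `dispatch_init`, `dispatch`, and the toy handlers) is hypothetical, and a real version would presumably key the table on CUPTI callback ids.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical API kinds; a real implementation would likely use
 * CUPTI callback ids instead of this toy enum. */
typedef enum {
  API_KIND_KERNEL = 0,
  API_KIND_MEMCPY,
  API_KIND_MEMSET,
  API_KIND_COUNT
} api_kind_t;

/* Hypothetical handler signature: takes a payload, returns a result. */
typedef int (*api_handler_t)(int payload);

/* Toy handlers that just tag the payload so the effect is observable. */
static int handle_kernel(int payload) { return payload + 1; }
static int handle_memcpy(int payload) { return payload + 2; }
static int handle_memset(int payload) { return payload + 3; }
/* Handler installed for every API kind that was not enabled. */
static int handle_ignore(int payload) { (void)payload; return 0; }

/* All known handlers, in the same order as api_kind_t. */
static const api_handler_t all_handlers[API_KIND_COUNT] = {
  handle_kernel, handle_memcpy, handle_memset
};

/* Table filled once before execution and only read afterwards. */
static api_handler_t dispatch_table[API_KIND_COUNT];

/* Decide up front which API kinds are monitored. */
void dispatch_init(const api_kind_t *enabled, size_t n_enabled) {
  for (size_t i = 0; i < (size_t)API_KIND_COUNT; ++i) {
    dispatch_table[i] = handle_ignore;
  }
  for (size_t i = 0; i < n_enabled; ++i) {
    assert(enabled[i] < API_KIND_COUNT);
    dispatch_table[enabled[i]] = all_handlers[enabled[i]];
  }
}

/* Hot path: one indexed call, no switch/case on the API kind. */
int dispatch(api_kind_t kind, int payload) {
  assert(kind < API_KIND_COUNT);
  return dispatch_table[kind](payload);
}
```

In this sketch, `dispatch_init` would be called once at startup with the API kinds selected for the run, and each callback would then go through `dispatch`. Kinds that were not enabled fall through to `handle_ignore`.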
Edited by Jonathon Anderson