[New CUPTI] Main (!519) · Merge requests · HPCToolkit / HPCToolkit

Jonathon Anderson requested to merge new-cupti-rebase into develop Feb 10, 2022

Created by: Jokeren

Test

Pass -ck HPCRUN_CUDA_NEW_CUPTI=TRUE to hpcrun to enable fast unwinding and range profiling.
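A minimal dry-run sketch of how the control knob combines with the event list and any further knobs. The helper name hpcrun_cmd and the application path ./app are hypothetical placeholders, not part of this merge request. The script only prints command lines and does not invoke hpcrun.

```shell
#!/bin/sh
# Dry-run helper: prints the hpcrun command line for one configuration.
# APP is a hypothetical placeholder for the application under test.
APP="./app"

# hpcrun_cmd EVENTS [KNOB=VALUE ...]
# Always adds HPCRUN_CUDA_NEW_CUPTI=TRUE, then one -ck per extra knob.
hpcrun_cmd() {
    events="$1"
    shift
    cmd="hpcrun -e $events -ck HPCRUN_CUDA_NEW_CUPTI=TRUE"
    for knob in "$@"; do
        cmd="$cmd -ck $knob"
    done
    printf '%s %s\n' "$cmd" "$APP"
}

# Coarse-grained and fine-grained profiling with the new CUPTI path.
hpcrun_cmd gpu=nvidia
hpcrun_cmd gpu=nvidia,pc
```

Additional knobs are appended as extra arguments, for example hpcrun_cmd gpu=nvidia HPCRUN_CUDA_FAST_UNWIND=TRUE.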

Regression

PeleC/PMF stop_time=100, max_time=200, 6.3s

hpcrun -e gpu=nvidia 9.0s
hpcrun -e gpu=nvidia,pc 104s

New CUPTI

PeleC

The following commands should generate databases of the same quality as those produced without HPCRUN_CUDA_NEW_CUPTI=TRUE, except that coarse-grained metrics (e.g., GXCOPY) are not available in the fine-grained profiling mode (i.e., gpu=nvidia,pc).

hpcrun -e gpu=nvidia -ck HPCRUN_CUDA_NEW_CUPTI=TRUE 8.5s
hpcrun -e gpu=nvidia,pc -ck HPCRUN_CUDA_NEW_CUPTI=TRUE 86s

The following commands use range profiling or fast unwinding to reduce overhead.

hpcrun -e gpu=nvidia -ck HPCRUN_CUDA_NEW_CUPTI=TRUE -ck HPCRUN_CUDA_FAST_UNWIND=TRUE 7.5s
hpcrun -e gpu=nvidia,pc -ck HPCRUN_CUDA_NEW_CUPTI=TRUE -ck HPCRUN_CUDA_RANGE_MODE=CONTEXT_SENSITIVE 44s
hpcrun -e gpu=nvidia,pc -ck HPCRUN_CUDA_NEW_CUPTI=TRUE -ck HPCRUN_CUDA_RANGE_MODE=TRIE 44s
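To put the timings side by side, the sketch below computes how much faster each new configuration runs than the corresponding measurement in the Regression section. It assumes that all timings come from the same PeleC/PMF workload and that the Regression numbers were taken without HPCRUN_CUDA_NEW_CUPTI=TRUE. The helper name speedup is hypothetical.

```shell
#!/bin/sh
# speedup OLD NEW: prints OLD / NEW rounded to two decimal places.
speedup() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.2f\n", old / new }'
}

# gpu=nvidia,pc: Regression run vs. new CUPTI
speedup 104 86
# gpu=nvidia,pc: Regression run vs. new CUPTI with range profiling
speedup 104 44
# gpu=nvidia: Regression run vs. new CUPTI with fast unwinding
speedup 9.0 7.5
```

Under these assumptions, range profiling makes the fine-grained (gpu=nvidia,pc) run about 2.4x faster than the Regression measurement, and fast unwinding makes the coarse-grained run about 1.2x faster.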

For unknown reasons, -ck CUDA_SYNC_YIELD=TRUE has no effect on PeleC. We had to modify the PeleC source code to change the synchronization policy.

Limitations

Coarse-grained and fine-grained metrics are profiled separately. Because kernel metrics such as register usage are no longer available in the fine-grained profiling mode, SM efficiency and occupancy metrics cannot be derived in that mode.
Profiling an application that launches no GPU kernels hangs when range profiling is enabled.
[CUPTI Bug] CUPTI PC Sampling API slows down cudaDeviceSynchronize().
[CUPTI Bug] CUPTI PC Sampling records overflow after profiling a range with many big kernels.
[CUPTI Bug] CUPTI incurs high profiling overhead (i.e., 1.5x) for Laghos.
[CUPTI Limitation] CUPTI PC Sampling API does not support profiling multiple contexts using the continuous (i.e., range) mode.

Edited Oct 19, 2023 by Jonathon Anderson
