[New CUPTI] Main

Jonathon Anderson requested to merge new-cupti-rebase into develop

Created by: Jokeren

Test

Pass -ck HPCRUN_CUDA_NEW_CUPTI=TRUE to hpcrun to enable fast unwinding and range profiling.

Regression

PeleC/PMF stop_time=100, max_time=200, 6.3s

  • hpcrun -e gpu=nvidia 9.0s
  • hpcrun -e gpu=nvidia,pc 104s

New CUPTI

PeleC

The following commands should generate databases of the same quality as those produced without HPCRUN_CUDA_NEW_CUPTI=TRUE, except that coarse-grained metrics (e.g., GXCOPY) are not available in the fine-grained profiling mode (i.e., gpu=nvidia,pc).

  • hpcrun -e gpu=nvidia -ck HPCRUN_CUDA_NEW_CUPTI=TRUE 8.5s
  • hpcrun -e gpu=nvidia,pc -ck HPCRUN_CUDA_NEW_CUPTI=TRUE 86s

The following commands use range profiling or fast unwinding to reduce overhead.

  • hpcrun -e gpu=nvidia -ck HPCRUN_CUDA_NEW_CUPTI=TRUE -ck HPCRUN_CUDA_FAST_UNWIND=TRUE 7.5s
  • hpcrun -e gpu=nvidia,pc -ck HPCRUN_CUDA_NEW_CUPTI=TRUE -ck HPCRUN_CUDA_RANGE_MODE=CONTEXT_SENSITIVE 44s
  • hpcrun -e gpu=nvidia,pc -ck HPCRUN_CUDA_NEW_CUPTI=TRUE -ck HPCRUN_CUDA_RANGE_MODE=TRIE 44s
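
To compare the timings above more easily, here is a minimal sketch that converts each one into a slowdown factor. It assumes the 6.3s figure under Regression is the PeleC/PMF run time without hpcrun, and that each number is a single wall-clock measurement; neither is stated explicitly in this description. The "old"/"new" labels and the names BASELINE_S, RUNS, and slowdown are hypothetical, chosen only for this illustration.

```python
# Wall-clock times in seconds, copied from this merge request description.
# Assumption: 6.3 s is the PeleC/PMF run time without hpcrun attached.
BASELINE_S = 6.3

# "old" = without HPCRUN_CUDA_NEW_CUPTI, "new" = with it (labels are ours).
RUNS = {
    "old gpu=nvidia": 9.0,
    "old gpu=nvidia,pc": 104.0,
    "new gpu=nvidia": 8.5,
    "new gpu=nvidia,pc": 86.0,
    "new gpu=nvidia + FAST_UNWIND": 7.5,
    "new gpu=nvidia,pc + RANGE_MODE=CONTEXT_SENSITIVE": 44.0,
    "new gpu=nvidia,pc + RANGE_MODE=TRIE": 44.0,
}


def slowdown(seconds, baseline=BASELINE_S):
    """Return the profiled run time divided by the baseline run time."""
    return seconds / baseline


# Print one row per configuration: label, time, slowdown factor.
for label, seconds in RUNS.items():
    print(f"{label:50s} {seconds:6.1f} s  {slowdown(seconds):5.2f}x")
```

Under those assumptions, range profiling brings the gpu=nvidia,pc slowdown from roughly 16.5x down to roughly 7x.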

For unknown reasons, -ck CUDA_SYNC_YIELD=TRUE has no effect on PeleC. We had to modify the PeleC source code to change the synchronization policy.

Limitations

  • Coarse-grained metrics and fine-grained metrics are profiled separately. Since kernel metrics such as register usage are no longer available in the fine-grained profiling mode, we cannot derive SM efficiency and occupancy metrics.
  • Profiling an application that launches no GPU kernels hangs when the range profiling mode is used.
  • [CUPTI Bug] CUPTI PC Sampling API slows down cudaDeviceSynchronize().
  • [CUPTI Bug] CUPTI PC Sampling records overflow after profiling a range with many big kernels.
  • [CUPTI Bug] CUPTI incurs high profiling overhead (1.5x) for Laghos.
  • [CUPTI Limitation] CUPTI PC Sampling API does not support profiling multiple contexts using the continuous (i.e., range) mode.
Edited by Jonathon Anderson