hpcrun: CUPTI_ERROR_NOT_SUPPORTED with CUDA 12.3
With CUDA 12.3 or later, monitoring even a simple application with hpcrun -e gpu=nvidia crashes at startup with the error CUPTI_ERROR_NOT_SUPPORTED.
Setup
- CUDA 12.3
$ hpcrun --version
To Reproduce
Run a simple application under:
$ hpcrun -e gpu=nvidia ./vecadd
Logs
CUPTI result error, function cuptiActivitySetAttribute failed with error CUPTI_ERROR_NOT_SUPPORTED
Occurs every time.
Stack trace in GDB:
Breakpoint 1, 0x00007fe9f4217f54 in cuptiActivitySetAttribute () from /usr/local/cuda-12.3//lib64/libcupti.so
(gdb) where
#0 0x00007fe9f4217f54 in cuptiActivitySetAttribute () from /usr/local/cuda-12.3//lib64/libcupti.so
#1 0x00007fe9fab53dc3 in cupti_device_buffer_config (buf_size=8388608, sem_size=65536)
at ../src/tool/hpcrun/gpu/nvidia/cupti-api.c:1234
#2 0x00007fe9fab4f583 in process_event_list (self=0x7fe9fabc22a0 <_nvidia_gpu_obj>, lush_metrics=0)
at ../src/tool/hpcrun/sample-sources/nvidia.c:430
#3 0x00007fe9fab33cb9 in hpcrun_all_sources_process_event_list (lush_metrics=0) at ../src/tool/hpcrun/sample_sources_all.c:262
#4 0x00007fe9fab25d52 in hpcrun_init_internal (is_child=false) at ../src/tool/hpcrun/main.c:592
#5 0x00007fe9fab269ef in hpcrun_prepare_measurement_subsystem (is_child=false) at ../src/tool/hpcrun/main.c:1123
#6 0x00007fe9fab265de in foilbase_monitor_start_main_init () at ../src/tool/hpcrun/main.c:966
#7 0x00007fe9fab492c0 in monitor_start_main_init () at ../src/tool/hpcrun/foil/monitor-preload.c:109
#8 0x00007fe9fb4a427d in __libc_start_main (main=0x404c40 <main(int, signed char**, signed char**)>, argc=1, argv=0x7fff3945e438,
init=0x47f570 <__libc_csu_init>, fini=0x47f5e0 <__libc_csu_fini>, rtld_fini=0x7fe9fb3c6ad0 <_dl_fini>,
stack_end=0x7fff3945e428) at main.c:574
#9 0x0000000000404b3e in _start ()
Root Cause
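CUDA 12.3 dropped support for configuring the semaphore pool size. We still set it in cupti_device_buffer_config (frame #1 in the trace above) and treat any failure of that cuptiActivitySetAttribute call as fatal, so hpcrun aborts during initialization.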
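
One possible way to handle this is to treat "not supported" as non-fatal for the semaphore pool size only, while keeping every other failure fatal. The sketch below illustrates that pattern in plain C so it compiles without the CUDA toolkit. Everything prefixed with demo_ or DEMO_ is a hypothetical stand-in, not hpcrun or CUPTI code: demo_set_attribute plays the role of cuptiActivitySetAttribute, and the enums mimic the result codes and attributes that would come from the CUPTI headers. It is not necessarily the fix that was adopted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for CUPTI result codes (assumption, not the real enum). */
typedef enum {
  DEMO_SUCCESS = 0,
  DEMO_ERROR_INVALID_PARAMETER = 1,
  DEMO_ERROR_NOT_SUPPORTED = 2
} demo_result_t;

/* Hypothetical stand-ins for the two attributes configured at startup. */
typedef enum {
  DEMO_ATTR_DEVICE_BUFFER_SIZE,
  DEMO_ATTR_SEMAPHORE_POOL_SIZE
} demo_attr_t;

/* Toggle that simulates the runtime: false behaves like CUDA 12.3+,
   where the semaphore pool size can no longer be configured. */
bool demo_semaphore_pool_supported = false;

/* Mock of the attribute setter; stands in for cuptiActivitySetAttribute. */
demo_result_t demo_set_attribute(demo_attr_t attr, size_t *value_size, void *value)
{
  if (value_size == NULL || value == NULL)
    return DEMO_ERROR_INVALID_PARAMETER;
  if (attr == DEMO_ATTR_SEMAPHORE_POOL_SIZE && !demo_semaphore_pool_supported)
    return DEMO_ERROR_NOT_SUPPORTED;
  return DEMO_SUCCESS;
}

/* Configure the device buffer; returns true when measurement can proceed.
   The buffer size is mandatory. The semaphore pool size is optional:
   "not supported" is logged and ignored, any other error still fails. */
bool demo_device_buffer_config(size_t buf_size, size_t sem_size)
{
  size_t value_size = sizeof(size_t);

  demo_result_t rc =
    demo_set_attribute(DEMO_ATTR_DEVICE_BUFFER_SIZE, &value_size, &buf_size);
  if (rc != DEMO_SUCCESS) {
    fprintf(stderr, "error: cannot set device buffer size (rc=%d)\n", (int)rc);
    return false;
  }

  rc = demo_set_attribute(DEMO_ATTR_SEMAPHORE_POOL_SIZE, &value_size, &sem_size);
  if (rc == DEMO_ERROR_NOT_SUPPORTED) {
    fprintf(stderr, "note: semaphore pool size not configurable, skipping\n");
    return true;
  }
  if (rc != DEMO_SUCCESS) {
    fprintf(stderr, "error: cannot set semaphore pool size (rc=%d)\n", (int)rc);
    return false;
  }
  return true;
}
```

Calling demo_device_buffer_config(8388608, 65536), the values from the trace, returns true whether or not the mock reports the semaphore pool attribute as supported. An alternative would be to skip the call based on the CUDA version detected at build or run time. Checking the result code avoids hard-coding a version cutoff.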
Edited by Wileam Y. Phan