Fix GPU Profiling

Mike Bauer requested to merge fixgpuprof into master

This branch changes how the profiler renders GPU tasks and kernels so that it accurately reflects the execution that occurred. For every GPU task there will now be two boxes: one that shows the duration of the CPU execution for the task, and one that shows the execution time of any kernels on the GPU itself. The relationship between these two boxes is established through the existing "caller" interface that we have for functions; in this case it shows which box is the caller of the GPU kernels. This change also ensures that only the second box counts towards the utilization percentage of the GPU, so that we are representing how busy the GPU (the important resource) is and not how busy the CPU thread used for launching work on the GPU is.
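As a rough illustration of the utilization rule described above, here is a minimal sketch in Python. It is not the profiler's actual code: the `Box` record, its `on_gpu` and `caller` fields, and the `gpu_utilization` helper are all hypothetical names chosen for this example. It assumes each GPU task produces one CPU-side launch box and one kernel box, and that only kernel boxes count towards GPU busy time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """Hypothetical record for one rendered box in the profile."""
    name: str
    start: float
    stop: float
    on_gpu: bool                  # True for the kernel-execution box
    caller: Optional[str] = None  # name of the CPU-side box that launched it


def gpu_utilization(boxes: List[Box], window_start: float, window_stop: float) -> float:
    """Fraction of the window during which at least one kernel box is active.

    CPU-side launch boxes (on_gpu == False) are ignored on purpose.
    """
    # Clip kernel boxes to the window and sort them by start time.
    intervals = sorted(
        (max(b.start, window_start), min(b.stop, window_stop))
        for b in boxes
        if b.on_gpu
    )
    busy = 0.0
    covered_until = window_start
    for start, stop in intervals:
        # Skip the part already counted so overlapping kernels are not double counted.
        start = max(start, covered_until)
        if stop > start:
            busy += stop - start
            covered_until = stop
    return busy / (window_stop - window_start)


boxes = [
    Box("task_launch", 0.0, 10.0, on_gpu=False),
    Box("task_kernels", 2.0, 6.0, on_gpu=True, caller="task_launch"),
]
utilization = gpu_utilization(boxes, 0.0, 10.0)
```

In this example the CPU-side box spans the whole window, but only the four time units covered by the kernel box contribute, so the GPU is reported as busy for 4 out of 10 units rather than fully busy.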

I did not update the Python profiler to reflect these changes because there are no tests in the CI (either long or short versions) that test the profiler on the GPU, so we should observe no issues when diffing the profiles on CPU-only runs. I don't think it's worth the additional effort to update the Python profiler to have this functionality.