
Better cuda error message

Wei Wu requested to merge cuda_errmsg into master

Currently, when a GPU task, copy, or fill crashes, Realm reports only the error message from cuEventQuery, for example "CUDA error reported on GPU 0: an illegal memory access was encountered (CUDA_ERROR_ILLEGAL_ADDRESS)". When multiple GPU tasks are running concurrently, it is hard to figure out which task triggered the error. This PR adds a second error message that identifies the task or copy that triggered the error. For example:

[0 - 7fe00c193800]    4.971976 {6}{gpu}: CUDA error reported on GPU 0: an illegal memory access was encountered (CUDA_ERROR_ILLEGAL_ADDRESS)
[0 - 7fe00c193800]    4.972020 {6}{gpu}: GPU task ID=5, event ID=8000000000200005 failed.
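
To illustrate the shape of the second log line, here is a minimal C++ sketch of how such a message could be assembled from a record of the in-flight GPU work. It is not Realm's actual implementation: the `GpuWorkInfo` struct and the `describe_failed_work` function are hypothetical names introduced only for this example, and printing the task ID in decimal and the event ID in hexadecimal is an assumption based on the sample output above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical record of the GPU work that was in flight when the
// CUDA error was detected. These names are illustrative, not Realm's.
struct GpuWorkInfo {
  const char *kind;             // e.g. "task" or "copy"
  unsigned long long id;        // ID of the task or copy
  unsigned long long event_id;  // finish event associated with the work
};

// Build the follow-up message that identifies the failed work.
// Assumption: task ID is decimal, event ID is hexadecimal.
std::string describe_failed_work(const GpuWorkInfo &work) {
  char buf[128];
  std::snprintf(buf, sizeof(buf), "GPU %s ID=%llu, event ID=%llx failed.",
                work.kind, work.id, work.event_id);
  return std::string(buf);
}
```

Calling `describe_failed_work` with kind `"task"`, ID `5`, and event ID `0x8000000000200005` yields a line matching the second message in the example log. Keeping the identifying line separate from the CUDA error line means the existing message format is unchanged for anything that already parses it.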
