Better CUDA error message
Currently, when a GPU task/copy/fill crashes, Realm only reports the error message from cuEventQuery, such as "CUDA error reported on GPU 0: an illegal memory access was encountered (CUDA_ERROR_ILLEGAL_ADDRESS)".
However, when multiple GPU tasks are running concurrently, it is hard to figure out which task triggered the error. With this PR, we add a second error message that identifies which task/copy triggered the error. For example:
[0 - 7fe00c193800] 4.971976 {6}{gpu}: CUDA error reported on GPU 0: an illegal memory access was encountered (CUDA_ERROR_ILLEGAL_ADDRESS)
[0 - 7fe00c193800] 4.972020 {6}{gpu}: GPU task ID=5, event ID=8000000000200005 failed.
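
For illustration only, here is a minimal sketch of how the second line could be assembled. It is not the code in this PR: `format_gpu_failure` and its parameters are hypothetical names, and the "task"/"copy" labels are assumed. The only thing taken from the log above is the shape of the message, with the operation ID in decimal and the event ID in hex.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper (not Realm's actual API): builds the follow-up log line
// that names the GPU operation whose completion event reported a CUDA error.
std::string format_gpu_failure(const std::string &kind,  // assumed labels: "task", "copy"
                               unsigned long long op_id, // printed in decimal
                               std::uint64_t event_id)   // printed in hex, no "0x" prefix
{
  assert(!kind.empty()); // a label is required to make the message useful
  std::ostringstream oss;
  oss << "GPU " << kind << " ID=" << std::dec << op_id
      << ", event ID=" << std::hex << event_id << " failed.";
  return oss.str();
}
```

Calling `format_gpu_failure("task", 5, 0x8000000000200005ULL)` yields the text of the second log line shown above, without the logger prefix.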