Maxwell tests are failing on GPUs

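Running the Maxwell tests under cuda-memcheck shows why they fail on GPUs: the get_selected_points kernel writes out of bounds, and the following device-to-host copy then fails with CUDA_ERROR_LAUNCH_FAILED. The output is below: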

========= Invalid __global__ write of size 8
=========     at 0x000001e0 in get_selected_points
=========     by thread (7,0,0) in block (0,0,0)
=========     Address 0x7f19edffa038 is out of bounds
=========     Device Frame:get_selected_points (get_selected_points : 0x1e0)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/lib/x86_64-linux-gnu/libcuda.so.1 [0x27037a]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x40b575]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x2b592]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x874b88]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x24c1e7]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0xb15a5]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0xa60fb]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x489d2b]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x17b7e]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x1621a]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x15c8d]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xea) [0x23d0a]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x15eda]
=========
========= Invalid __global__ write of size 8
=========     at 0x000001e0 in get_selected_points
=========     by thread (6,0,0) in block (0,0,0)
=========     Address 0x7f19edffa030 is out of bounds
=========     Device Frame:get_selected_points (get_selected_points : 0x1e0)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/lib/x86_64-linux-gnu/libcuda.so.1 [0x27037a]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x40b575]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x2b592]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x874b88]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x24c1e7]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0xb15a5]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0xa60fb]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x489d2b]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x17b7e]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x1621a]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x15c8d]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xea) [0x23d0a]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x15eda]
=========
========= Program hit CUDA_ERROR_LAUNCH_FAILED (error 719) due to "unspecified launch failure" on CUDA API call to cuMemcpyDtoHAsync_v2.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/lib/x86_64-linux-gnu/libcuda.so.1 [0x2747fa]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x40ac51]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x21f62]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x2275d]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x22ee8]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x874c3c]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x24c1e7]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0xb15a5]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0xa60fb]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x489d2b]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x17b7e]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x1621a]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x15c8d]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xea) [0x23d0a]
=========     Host Frame:/home/tancognn/octopus-code/bin/octopus [0x15eda]
=========

error: cuMemcpyDtoHAsync(data, **cuda_ptr + *offset, *size, phStream[current_stream]) failed with error CUDA_ERROR_LAUNCH_FAILED

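Solution: add the bounds check

if (ist >= dyy) return;

to the get_selected_points kernel in share/opencl/get_points.cl, so that threads with ist outside the valid range return before they write.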
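To illustrate why such a guard is needed, here is a minimal host-side sketch in plain C. It is not the Octopus kernel: it only emulates a launch whose global size is rounded up to a whole number of blocks, which leaves surplus threads with indices past the end of the buffer. The names `guarded_kernel_body`, `padded_global_size`, `emulate_launch` and `n_valid` are hypothetical, and the sizes suggested in the note below (6 valid elements, block size 8) are assumptions chosen only to mirror threads 6 and 7 in the report above; the actual buffer and launch sizes are not stated in this issue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel body, called once per emulated thread.
   `ist` is the thread's global index; `n_valid` is the number of elements
   the output buffer really holds (illustrative name, not from Octopus). */
static void guarded_kernel_body(size_t ist, size_t n_valid, double *out)
{
    if (ist >= n_valid) return;   /* same shape as the guard in the fix */
    out[ist] = (double)ist;       /* only reached for in-range indices */
}

/* Round the launch size up to a whole number of blocks. */
static size_t padded_global_size(size_t n, size_t block_size)
{
    assert(block_size > 0);
    return (n + block_size - 1) / block_size * block_size;
}

/* Emulate the launch serially. Returns how many threads had an index
   outside the buffer, i.e. how many would have written out of bounds
   if the guard were missing. */
static size_t emulate_launch(size_t n_valid, size_t block_size, double *out)
{
    size_t global = padded_global_size(n_valid, block_size);
    size_t surplus = 0;
    for (size_t ist = 0; ist < global; ++ist) {
        if (ist >= n_valid) ++surplus;
        guarded_kernel_body(ist, n_valid, out);
    }
    return surplus;
}
```

Calling `emulate_launch(6, 8, buf)` on an 8-element scratch array runs 8 emulated threads and reports 2 surplus ones (indices 6 and 7); with the guard in place, `buf[6]` and `buf[7]` are left untouched.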

Edited by Franco Bonafé