SYCL: Avoid performance regression with ROCm 5.5 on MI250X (part 2)

Andrey Alekseenko requested to merge aa-4874-p2 into release-2023

AMD CDNA2 devices, such as the MI250, reach their full advertised FP32 throughput only when operating on packed FP32 values, i.e., in a 2-wide SIMD fashion.
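
For illustration, here is a minimal sketch of what "packed" evaluation means, written in plain C++ rather than taken from the actual kernels. `Float2`, `fma2`, `polyScalar`, and `polyPacked` are hypothetical names, and the quadratic polynomial is only a stand-in, not the Force Switch expression. In SYCL code, a two-element vector type such as `sycl::float2` would take the place of `Float2`. Whether packed instructions are actually emitted depends on the compiler and target.

```cpp
#include <cmath>

// Hypothetical 2-wide FP32 value (assumption: stands in for a type like sycl::float2).
struct Float2
{
    float x;
    float y;
};

// Element-wise fused multiply-add on both lanes: a * b + c.
inline Float2 fma2(Float2 a, Float2 b, Float2 c)
{
    return { std::fma(a.x, b.x, c.x), std::fma(a.y, b.y, c.y) };
}

// Scalar form: Horner evaluation of c0 + c1*t + c2*t^2 for one input.
inline float polyScalar(float t, float c0, float c1, float c2)
{
    return std::fma(std::fma(c2, t, c1), t, c0);
}

// Packed form: the same polynomial for two inputs at once.
// Each fma2 call maps naturally onto a single 2-wide FP32 operation.
inline Float2 polyPacked(Float2 t, float c0, float c1, float c2)
{
    const Float2 acc = fma2({ c2, c2 }, t, { c1, c1 });
    return fma2(acc, t, { c0, c0 });
}
```

Calling `polyPacked({ t0, t1 }, c0, c1, c2)` gives the same two results as two calls to `polyScalar`, but expresses the work as 2-wide operations explicitly instead of relying on the compiler to pair them.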

As noted in #4874, explicit use of packing is necessary with ROCm 5.5+ to mitigate the performance regression of the NBNXM LJ Force Switch kernels relative to ROCm 5.3 and earlier.

This is commit 56e7168c (!3838 (merged)), cherry-picked from main to release-2023, with release notes added.

Refs #4854 (closed)

Fixes #4874

Edited by Mark Abraham