SYCL: Avoid performance regression with ROCm 5.5 on MI250X
Buffer splitting introduced in fb3e0b96 (!3104) causes a significant slowdown with ROCm 5.5+ on MI250X (#4874).
Here we use templates to un-split the buffer on AMD devices, while keeping the existing split code path for all other devices.
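
As a rough illustration of the approach, the sketch below selects between a split and an un-split storage layout with a boolean template parameter. It is plain C++ rather than SYCL, and every name in it (`Vendor`, `PairStorage`, `fillAndSum`, `numAllocationsFor`) is hypothetical; it does not reproduce the data structures or kernels touched by this change.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vendor tag, used only for this illustration.
enum class Vendor { Amd, Other };

// Primary template: "split" layout, two separate allocations.
template <bool isSplit>
class PairStorage {
public:
    explicit PairStorage(std::size_t n) : first_(n), second_(n) {}
    float& a(std::size_t i) { assert(i < first_.size()); return first_[i]; }
    float& b(std::size_t i) { assert(i < second_.size()); return second_[i]; }
    static constexpr int numAllocations() { return 2; }

private:
    std::vector<float> first_;
    std::vector<float> second_;
};

// Specialization: "un-split" layout, one contiguous allocation
// holding both parts back to back.
template <>
class PairStorage<false> {
public:
    explicit PairStorage(std::size_t n) : n_(n), data_(2 * n) {}
    float& a(std::size_t i) { assert(i < n_); return data_[i]; }
    float& b(std::size_t i) { assert(i < n_); return data_[n_ + i]; }
    static constexpr int numAllocations() { return 1; }

private:
    std::size_t n_;
    std::vector<float> data_;
};

// The "kernel" is written once; the layout is fixed at compile time.
template <bool isSplit>
float fillAndSumImpl(std::size_t n) {
    PairStorage<isSplit> storage(n);
    for (std::size_t i = 0; i < n; ++i) {
        storage.a(i) = static_cast<float>(i);
        storage.b(i) = 2.0F * static_cast<float>(i);
    }
    float sum = 0.0F;
    for (std::size_t i = 0; i < n; ++i) {
        sum += storage.a(i) + storage.b(i);
    }
    return sum;
}

// Runtime dispatch: AMD gets the un-split variant, others keep the split one.
float fillAndSum(Vendor vendor, std::size_t n) {
    return vendor == Vendor::Amd ? fillAndSumImpl<false>(n) : fillAndSumImpl<true>(n);
}

int numAllocationsFor(Vendor vendor) {
    return vendor == Vendor::Amd ? PairStorage<false>::numAllocations()
                                 : PairStorage<true>::numAllocations();
}
```

Both variants share the same accessor interface, so the code using the storage is written once and only the memory layout differs between instantiations.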
This is commit c828d428 (!3736), cherry-picked from main to 2023, with release notes added.
Refs #4593, #4854