Improve AMD GPU performance on Linux/ROCm
Current AMD GPU support is limited to Vega 56/64 running on Linux (with ROCm). While it runs faster than the CPU, it is significantly slower than an equivalent NVIDIA GPU.
It should be easy to enable Polaris support, and maybe Navi, by recompiling against ROCm 4.0 (the current client is linked against ROCm 3.8).
But it also seems like we're missing some optimization opportunities and not using AMD GPUs to their full potential.