performance regression with ROCm 5.5/5.6 on MI250
Summary
Up to 10% performance regression measured on MI250 with ROCm 5.5.1 and 5.6.1 compared to 5.4.3 (and 5.3.3). Benchmarks used:
- GROMACS 2023.2 and main
- ADH and EAG1 inputs
- hipSYCL 0.9.4 using ROCm clang.
Impact
Users of AMD hardware.
Detailed description
- The nonbonded force-only kernel has regressed by up to 20%; an increase in VGPR use (from 76 to 80) could be the main reason.
- PME spread regressed by ~8%.
ADH inputs, logs, and rocprof output: adh_cubic_bench.tar.gz
Here is some more detailed data comparing a wider range of nbnxm kernel flavors (benchmark system is a 384k water box):
| | | F, rocm 5.4 | F, rocm 5.6 | F, rel perf. | FV, rocm 5.4 | FV, rocm 5.6 | FV, rel perf. |
|---|---|---|---|---|---|---|---|
| ew-ana | fsw | 1691885 | 2078273 | 1.23 | 2749135 | 2760895 | 1.00 |
| | ljpme-geom | 2032967 | 2422091 | 1.19 | 2925697 | 2929056 | 1.00 |
| | psh | 1392691 | 1429430 | 1.03 | 2212733 | 2213132 | 1.00 |
| | psw | 1825736 | 1949873 | 1.07 | 2778256 | 2761695 | 0.99 |
| ew-tab | fsw | 1596204 | 1827641 | 1.14 | 2494254 | 2529213 | 1.01 |
| | ljpme-geom | 1944850 | 2288760 | 1.18 | 2702735 | 2708815 | 1.00 |
| | psh | 1338316 | 1598744 | 1.19 | 1989211 | 1999291 | 1.01 |
| | psw | 2013485 | 2122755 | 1.05 | 2568415 | 2544413 | 0.99 |
| rf | | 1059485 | 1037376 | 0.98 | 1307607 | 1278566 | 0.98 |
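
For reference, the "rel perf." column appears to be the ROCm 5.6 value divided by the ROCm 5.4 value, so a ratio above 1 means a larger measured value with 5.6. A minimal sketch that recomputes it for the F kernels, using numbers copied from the table (the dictionary layout and the `rel_perf` helper are illustrative only and not part of GROMACS; the unit of the raw values is not stated in this report):

```python
# F-kernel values copied from the table above: (rocm 5.4, rocm 5.6).
# Keys are (first column, second column); the rf row has no second label.
f_kernel = {
    ("ew-ana", "fsw"): (1691885, 2078273),
    ("ew-ana", "ljpme-geom"): (2032967, 2422091),
    ("ew-ana", "psh"): (1392691, 1429430),
    ("ew-ana", "psw"): (1825736, 1949873),
    ("ew-tab", "fsw"): (1596204, 1827641),
    ("ew-tab", "ljpme-geom"): (1944850, 2288760),
    ("ew-tab", "psh"): (1338316, 1598744),
    ("ew-tab", "psw"): (2013485, 2122755),
    ("rf", ""): (1059485, 1037376),
}


def rel_perf(rocm54: int, rocm56: int) -> float:
    """Hypothetical helper: ratio of the ROCm 5.6 value to the ROCm 5.4 value."""
    return rocm56 / rocm54


# Round to two decimals, matching the precision used in the table.
ratios = {key: round(rel_perf(*vals), 2) for key, vals in f_kernel.items()}

# Flavor with the largest ratio, i.e. the biggest change between versions.
worst = max(ratios, key=ratios.get)

for (first, second), ratio in ratios.items():
    print(f"{first:7s} {second:11s} {ratio:.2f}")
print("largest ratio:", worst, ratios[worst])
```

The same calculation applied to the FV columns gives ratios between 0.98 and 1.01, consistent with the table.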
Edited by Szilárd Páll