inconsistent pinning behavior due to missing SMT info on AMD Zen - Redmine #2388
The mdrun native hardware detection does recognize the hardware thread order
on AMD Zen uarch processors, but it does not detect SMT, so hardware
threads are not correctly assigned to cores. As a result (besides the
reporting being incorrect), at most half of the cores are used when
a run is launched with \#threads<\#hwthreads/2. On Intel with HT, in
such cases the default stride is switched to 2 to spread threads across
cores when the total thread count is <=\#cores.
- Native detection:
Hardware topology: Basic
Sockets, cores, and logical processors:
Socket 0: [ 0] [ 16] [ 1] [ 17] [ 2] [ 18] [ 3] [ 19] [ 4] [ 20] [ 5] [ 21] [ 6] [ 22] [ 7] [ 23] [ 8] [ 24] [ 9] [ 25] [ 10] [ 26] [ 11] [ 27] [ 12] [ 28] [ 13] [ 29] [ 14] [ 30] [ 15] [ 31]
<!-- -->
- Detection with hwloc:
Hardware topology: Full, with devices
Sockets, cores, and logical processors:
Socket 0: [ 0 16] [ 1 17] [ 2 18] [ 3 19] [ 4 20] [ 5 21] [ 6 22] [ 7 23] [ 8 24] [ 9 25] [ 10 26] [ 11 27] [ 12 28] [ 13 29] [ 14 30] [ 15 31]
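
To make the effect of the two topologies above concrete, here is a minimal sketch of the stride logic described in this report. It is not the mdrun implementation: the function names (`choose_stride`, `pinned_hw_threads`, `physical_cores_used`) are hypothetical, and the sketch assumes that thread i is pinned to the (i * stride)-th logical processor in detected order and that SMT siblings are `n` and `n + 16`, as in the hwloc listing.

```python
def choose_stride(n_threads, cores):
    """Pick a pinning stride from the detected topology.

    cores is a list of lists of hardware thread ids, one inner list per
    detected core (hypothetical representation of the listings above).
    """
    threads_per_core = max(len(core) for core in cores)
    # Spread over cores only if SMT was detected and the threads fit on the cores.
    if threads_per_core > 1 and n_threads <= len(cores):
        return threads_per_core
    return 1


def pinned_hw_threads(n_threads, cores):
    """Return the hardware thread each of n_threads would be pinned to."""
    order = [hw for core in cores for hw in core]
    stride = choose_stride(n_threads, cores)
    return [order[(i * stride) % len(order)] for i in range(n_threads)]


def physical_cores_used(hw_threads, n_physical=16):
    """Count distinct physical cores, assuming siblings are n and n + n_physical."""
    return len({hw % n_physical for hw in hw_threads})


# Native detection: 32 "cores" with one hardware thread each, order 0, 16, 1, 17, ...
native = [[hw] for i in range(16) for hw in (i, i + 16)]
# hwloc detection: 16 cores with two hardware threads each.
hwloc = [[i, i + 16] for i in range(16)]

print(physical_cores_used(pinned_hw_threads(16, native)))
print(physical_cores_used(pinned_hw_threads(16, hwloc)))
```

Under these assumptions, a 16-thread run lands on 8 physical cores with the native topology (stride 1 over the order 0, 16, 1, 17, ...) and on all 16 with the hwloc topology (stride 2).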
*(from redmine: issue id 2388, created on 2018-01-19 by pszilard, closed on 2018-03-02)*
* Changesets:
* Revision b9c04931de7709424b2931720b5bb6952c005780 by Berk Hess on 2018-03-01T21:34:14Z:
```
Detect AMD SMT topology
On AMD Zen the cpuinfo code detected hyperthreading but put all
threads on different cores in the topology. Now the correct
topology is detected using extended APIC.
Also disabled topology detection for non-AMD, non-x2APIC x86.
Fixes #2388
Change-Id: I194f3e09e669c20d1d62355a36be062e6cce264e
```
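
For illustration only: on AMD family 17h, CPUID leaf 0x8000001E reports the extended APIC ID in EAX, the core id in EBX[7:0], and the threads per core minus one in EBX[15:8]. The sketch below decodes such register values and groups APIC IDs by core. It is an assumption that this corresponds to what the changeset reads. The names `decode_leaf_8000001e`, `smt_bits`, and `core_of` are hypothetical, and the register values in the example are made up.

```python
from typing import NamedTuple


class ZenTopologyLeaf(NamedTuple):
    extended_apic_id: int
    core_id: int
    threads_per_core: int
    node_id: int


def decode_leaf_8000001e(eax, ebx, ecx):
    """Decode the EAX/EBX/ECX values returned by CPUID leaf 0x8000001E."""
    return ZenTopologyLeaf(
        extended_apic_id=eax & 0xFFFFFFFF,
        core_id=ebx & 0xFF,                        # EBX[7:0]
        threads_per_core=((ebx >> 8) & 0xFF) + 1,  # EBX[15:8] stores count - 1
        node_id=ecx & 0xFF,                        # ECX[7:0]
    )


def smt_bits(threads_per_core):
    """Number of low APIC-ID bits needed to index the threads within a core."""
    return (threads_per_core - 1).bit_length()


def core_of(apic_id, threads_per_core):
    """Drop the SMT bits so that sibling threads map to the same core index."""
    return apic_id >> smt_bits(threads_per_core)


# Made-up register values: APIC ID 5, core id 2, two threads per core.
leaf = decode_leaf_8000001e(eax=0x5, ebx=0x0102, ecx=0x0)
print(leaf)
print(core_of(leaf.extended_apic_id, leaf.threads_per_core))
```

With two threads per core, APIC IDs 4 and 5 both map to core index 2, which is the grouping the native detection was missing.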
* Uploads:
* [_test-native_1x16_thrp01.log](/uploads/bac2009b64dedec21c7d3d55cea79bc8/_test-native_1x16_thrp01.log) native detection
* [_test-hwloc_1x16_thrp01.log](/uploads/5c67289cb6ca4777801652cf0fc6d687/_test-hwloc_1x16_thrp01.log) hwloc detection
* [_test-native_1x16_thrp01_gmx16.log](/uploads/cda0e1d1420ee1ff2a47b55ac807f83e/_test-native_1x16_thrp01_gmx16.log) native detection with r2016
* [amd-cpuinfo-fix.tgz](/uploads/3feaf6b9d5c899105fc4d1a41bac21e5/amd-cpuinfo-fix.tgz)