inconsistent pinning behavior due to missing SMT info on AMD Zen - Redmine #2388
The mdrun native hardware detection recognizes the hardware thread order on AMD Zen uarch processors, but it does not detect SMT, so it cannot correctly assign the hardware threads to cores. As a result (besides the reporting being incorrect), at most half of the cores are used when a run is launched with `#threads < #hwthreads/2`. On Intel with HT, in such cases the default stride is switched to 2 to spread threads across cores when the total thread count is `<= #cores`.

- Native detection:

      Hardware topology: Basic
          Sockets, cores, and logical processors:
              Socket 0: [ 0] [ 16] [ 1] [ 17] [ 2] [ 18] [ 3] [ 19] [ 4] [ 20] [ 5] [ 21] [ 6] [ 22] [ 7] [ 23] [ 8] [ 24] [ 9] [ 25] [ 10] [ 26] [ 11] [ 27] [ 12] [ 28] [ 13] [ 29] [ 14] [ 30] [ 15] [ 31]

- Detection with hwloc:

      Hardware topology: Full, with devices
          Sockets, cores, and logical processors:
              Socket 0: [ 0 16] [ 1 17] [ 2 18] [ 3 19] [ 4 20] [ 5 21] [ 6 22] [ 7 23] [ 8 24] [ 9 25] [ 10 26] [ 11 27] [ 12 28] [ 13 29] [ 14 30] [ 15 31]

*(from redmine: issue id 2388, created on 2018-01-19 by pszilard, closed on 2018-03-02)*

* Changesets:
  * Revision b9c04931de7709424b2931720b5bb6952c005780 by Berk Hess on 2018-03-01T21:34:14Z:

```
Detect AMD SMT topology

On AMD Zen the cpuinfo code detected hyperthreading but put
all threads on different cores in the topology. Now the correct
topology is detected using extended APIC.
Also disabled topology detection for non-AMD, non-x2APIC x86.

Fixes #2388

Change-Id: I194f3e09e669c20d1d62355a36be062e6cce264e
```

* Uploads:
  * [_test-native_1x16_thrp01.log](/uploads/bac2009b64dedec21c7d3d55cea79bc8/_test-native_1x16_thrp01.log) native detection
  * [_test-hwloc_1x16_thrp01.log](/uploads/5c67289cb6ca4777801652cf0fc6d687/_test-hwloc_1x16_thrp01.log) hwloc detection
  * [_test-native_1x16_thrp01_gmx16.log](/uploads/cda0e1d1420ee1ff2a47b55ac807f83e/_test-native_1x16_thrp01_gmx16.log) native detection with r2016
  * [amd-cpuinfo-fix.tgz](/uploads/3feaf6b9d5c899105fc4d1a41bac21e5/amd-cpuinfo-fix.tgz)
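The pinning effect described in this report can be illustrated with a minimal sketch. This is not mdrun's actual code: `default_stride`, `pin_targets`, and `physical_cores_used` are hypothetical helpers, and the stride rule is a simplification of the behavior described for Intel with HT. It assumes, as the hwloc output suggests, that logical processors `i` and `i + 16` are SMT siblings on the same core.

```python
def default_stride(n_threads, n_cores, threads_per_core):
    """Hypothetical rule: skip SMT siblings when all threads fit on distinct cores."""
    if threads_per_core > 1 and n_threads <= n_cores:
        return threads_per_core
    return 1


def pin_targets(hw_thread_order, n_threads, stride):
    """Pin thread i to the (i * stride)-th entry of the detected hardware thread order."""
    return [hw_thread_order[i * stride] for i in range(n_threads)]


def physical_cores_used(targets, n_cores):
    """Count distinct physical cores, assuming processors p and p + n_cores share a core."""
    return len({t % n_cores for t in targets})


# Hardware thread order from the report: 0, 16, 1, 17, ..., 15, 31.
order = [p for core in range(16) for p in (core, core + 16)]

# Native detection before the fix: SMT not detected, so 32 "cores" of 1 thread each.
native = pin_targets(order, 8, default_stride(8, 32, 1))

# hwloc detection: 16 cores of 2 threads each.
hwloc = pin_targets(order, 8, default_stride(8, 16, 2))

print(native, physical_cores_used(native, 16))
print(hwloc, physical_cores_used(hwloc, 16))
```

With 8 threads, the topology without SMT information keeps a stride of 1 and packs the threads onto sibling hardware threads, while the SMT-aware topology switches to a stride of 2 and places one thread per core.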