inconsistent pinning behavior due to missing SMT info on AMD Zen - Redmine #2388
The mdrun native hardware detection does recognize the hardware thread order on AMD Zen microarchitecture processors, but it does not detect SMT, so it cannot correctly assign hardware threads to cores. As a result (besides the reporting being incorrect), at most half of the cores are used when a run is launched with #threads < #hwthreads/2. On Intel with HT, in such cases the default pinning stride is switched to 2 to spread threads across cores when the total thread count is <= #cores.
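A minimal sketch of why the missing SMT information halves core usage. It assumes a simplified pinning model in which threads are placed by stepping through the detected logical-processor order with a stride, and the stride is raised to the threads-per-core count when the threads fit on the cores (the Intel HT behavior described above). The functions `default_stride`, `pin_threads` and `cores_used` are hypothetical illustrations, not mdrun's actual code.

```python
# Hypothetical sketch of stride-based pinning; not mdrun's actual code.


def default_stride(n_threads, n_cores, n_hwthreads):
    """Use one hardware thread per core when the threads fit on the cores."""
    threads_per_core = n_hwthreads // n_cores
    if threads_per_core > 1 and n_threads <= n_cores:
        return threads_per_core
    return 1


def pin_threads(hw_order, n_threads, stride, offset=0):
    """Pick hardware threads by stepping through the detected order."""
    last = offset + (n_threads - 1) * stride
    if last >= len(hw_order):
        raise ValueError("not enough hardware threads for this stride")
    return [hw_order[offset + i * stride] for i in range(n_threads)]


def cores_used(picks, core_of):
    """Count distinct physical cores among the picked hardware threads."""
    return len({core_of[hw] for hw in picks})


# Logical processor order from the logs: 0, 16, 1, 17, ..., 15, 31.
hw_order = [t for c in range(16) for t in (c, c + 16)]
# Physical core of each hardware thread (c and c + 16 share core c).
core_of = {t: t % 16 for t in range(32)}

n_threads = 8
native = default_stride(n_threads, n_cores=32, n_hwthreads=32)  # no SMT seen
fixed = default_stride(n_threads, n_cores=16, n_hwthreads=32)   # SMT seen
print(native, cores_used(pin_threads(hw_order, n_threads, native), core_of))  # 1 4
print(fixed, cores_used(pin_threads(hw_order, n_threads, fixed), core_of))    # 2 8
```

Because native detection reports every hardware thread as its own core, the stride stays at 1 and consecutive threads land on SMT siblings; once the 2-threads-per-core grouping is known, stride 2 puts each thread on a separate core.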
- Native detection:
Hardware topology: Basic
Sockets, cores, and logical processors:
Socket 0: [ 0] [ 16] [ 1] [ 17] [ 2] [ 18] [ 3] [ 19] [ 4] [ 20] [ 5] [ 21] [ 6] [ 22] [ 7] [ 23] [ 8] [ 24] [ 9] [ 25] [ 10] [ 26] [ 11] [ 27] [ 12] [ 28] [ 13] [ 29] [ 14] [ 30] [ 15] [ 31]
- Detection with hwloc:
Hardware topology: Full, with devices
Sockets, cores, and logical processors:
Socket 0: [ 0 16] [ 1 17] [ 2 18] [ 3 19] [ 4 20] [ 5 21] [ 6 22] [ 7 23] [ 8 24] [ 9 25] [ 10 26] [ 11 27] [ 12 28] [ 13 29] [ 14 30] [ 15 31]
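Both listings contain the same logical processors in the same order; only the grouping into cores differs. The following sketch reproduces that difference by chunking the flat order, assuming (as in these logs) that SMT siblings are adjacent in the detected order. `group_into_cores` is a hypothetical helper, not a GROMACS function.

```python
def group_into_cores(flat_order, threads_per_core):
    """Split a flat logical-processor order into cores, assuming SMT
    siblings are adjacent in that order (as in the logs above)."""
    if threads_per_core < 1 or len(flat_order) % threads_per_core:
        raise ValueError("order length must be a multiple of threads_per_core")
    return [flat_order[i:i + threads_per_core]
            for i in range(0, len(flat_order), threads_per_core)]


native_order = [t for c in range(16) for t in (c, c + 16)]

basic = group_into_cores(native_order, 1)   # native: 32 "cores"
full = group_into_cores(native_order, 2)    # hwloc: 16 cores x 2 threads
print(len(basic), len(full))   # 32 16
print(full[0], full[15])       # [0, 16] [15, 31]
```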
(from redmine: issue id 2388, created on 2018-01-19 by pszilard, closed on 2018-03-02)
- Changesets:
- Revision b9c04931 by Berk Hess on 2018-03-01T21:34:14Z:
Detect AMD SMT topology
On AMD Zen the cpuinfo code detected hyperthreading but put all
threads on different cores in the topology. Now the correct
topology is detected using extended APIC.
Also disabled topology detection for non-AMD, non-x2APIC x86.
Fixes #2388
Change-Id: I194f3e09e669c20d1d62355a36be062e6cce264e
- Uploads:
- _test-native_1x16_thrp01.log native detection
- _test-hwloc_1x16_thrp01.log hwloc detection
- _test-native_1x16_thrp01_gmx16.log native detection with r2016
- amd-cpuinfo-fix.tgz
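The changeset above makes native detection derive the topology from the extended APIC ID. A minimal sketch of the general technique, decoding an APIC ID into package, core and SMT fields by bit masking and then grouping logical processors by core, is shown here. It does not issue CPUID: the APIC IDs and the field widths are inputs, and the contiguous layout (SMT field in the lowest bits, core field above it) and the example IDs are assumptions for illustration, not the layout the actual fix uses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ApicFields:
    package: int
    core: int
    smt: int


def bits_needed(count):
    """Bits needed to encode IDs 0..count-1 (0 bits when count == 1)."""
    return (count - 1).bit_length()


def decode_apic_id(apic_id, threads_per_core, cores_per_package):
    """Split an APIC ID into package/core/SMT fields, assuming the SMT
    field occupies the lowest bits and the core field the next ones."""
    smt_bits = bits_needed(threads_per_core)
    core_bits = bits_needed(cores_per_package)
    smt = apic_id & ((1 << smt_bits) - 1)
    core = (apic_id >> smt_bits) & ((1 << core_bits) - 1)
    package = apic_id >> (smt_bits + core_bits)
    return ApicFields(package, core, smt)


def build_topology(apic_ids, threads_per_core, cores_per_package):
    """Group logical processors by (package, core) using their APIC IDs."""
    cores = defaultdict(list)
    for cpu, apic_id in apic_ids.items():
        f = decode_apic_id(apic_id, threads_per_core, cores_per_package)
        cores[(f.package, f.core)].append(cpu)
    return {key: sorted(cpus) for key, cpus in sorted(cores.items())}


# Hypothetical APIC IDs for the 1x16-core machine: logical CPUs c and c + 16
# are SMT siblings; the contiguous numbering is an assumption.
apic_ids = {c: 2 * c for c in range(16)}
apic_ids.update({c + 16: 2 * c + 1 for c in range(16)})

topology = build_topology(apic_ids, threads_per_core=2, cores_per_package=16)
print(len(topology))       # 16
print(topology[(0, 0)])    # [0, 16]
```

Decoding the same IDs with `threads_per_core=1, cores_per_package=32` yields 32 single-thread cores, which mirrors the incorrect "Basic" topology in the native detection log.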