When including vanilla GROMACS 2024.1 in EESSI we are seeing consistent test suite failures for Neoverse V1 with GCC 13.2.0 and OpenMPI 4.1.6 (see test_log.out.gz for full test suite output).
15/87 Test #15: GmxlibTests ..................................Subprocess aborted***Exception: 1.11 sec
[==========] Running 78 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 72 tests from NBInteraction/NonbondedFepTest
[ RUN ] NBInteraction/NonbondedFepTest.testKernel/0
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/testutils/refdata.cpp:957: Failure
In item: /Forces/[1]/X
Actual: -450.75637494131161
Reference: -94.898901947422004
Difference: 355.857 (10259072709535721 double-prec. ULPs, rel. 3.75)
Tolerance: abs. 1e-08, rel. 1e-11, 100 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/testutils/refdata.cpp:957: Failure
In item: /Forces/[2]/X
Actual: 189.79780389484378
Reference: 94.898901947421891
Difference: 94.8989 (4503599627370496 double-prec. ULPs, rel. 1)
Tolerance: abs. 1e-08, rel. 1e-11, 100 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/testutils/refdata.cpp:957: Failure
In item: /Forces/[3]/X
Actual: 521.91714209293536
Reference: 260.95857104646768
Difference: 260.959 (4503599627370496 double-prec. ULPs, rel. 1)
Tolerance: abs. 1e-08, rel. 1e-11, 100 ULPs
[ FAILED ] NBInteraction/NonbondedFepTest.testKernel/0, where GetParam() = (4-byte object <00-00 00-00>, 936-byte object <25-00 00-00 BD-37 86-35 3A-8C 30-E2 8E-79 45-3E 03-00 00-00 15-BD 63-BF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 10-98 6D-2D 00-00 00-00 D0-9E 6D-2D 00-00 00-00 ... 33-33 33-33 33-33 EB-3F 33-33 33-33 33-33 D3-3F 33-33 33-33 33-33 D3-3F F3-21 A8-1A BD-DA 02-00 00-00 00-00 00-00 00-00 00-00 00-00 F1-30 62-BF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 F0-3F>, { 0x2d6b4900, 0x2d6b4918, 0x2d6b4930, 0x2d6b4948 }, 0, 0, true) (0 ms)
[ RUN ] NBInteraction/NonbondedFepTest.testKernel/1
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/testutils/refdata.cpp:957: Failure
In item: /Forces/[1]/X
Actual: -450.75637494131161
Reference: -94.898901947422004
Difference: 355.857 (10259072709535721 double-prec. ULPs, rel. 3.75)
Tolerance: abs. 1e-08, rel. 1e-11, 100 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/testutils/refdata.cpp:957: Failure
In item: /Forces/[2]/X
Actual: 189.79780389484378
Reference: 94.898901947421891
Difference: 94.8989 (4503599627370496 double-prec. ULPs, rel. 1)
Tolerance: abs. 1e-08, rel. 1e-11, 100 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/testutils/refdata.cpp:957: Failure
In item: /Forces/[3]/X
Actual: 521.91714209293536
Reference: 260.95857104646768
Difference: 260.959 (4503599627370496 double-prec. ULPs, rel. 1)
Tolerance: abs. 1e-08, rel. 1e-11, 100 ULPs
free(): invalid next size (fast)
Start 16: MdlibUnitTest
...
[ RUN ] SimdFloatingpointUtilTest.transposeScatterIncrU3
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:353: Failure
Value of: mem0_[j]
Actual: 1004.0000000000223
Expected: refmem[j]
Which is: 1003.0000000000223
Difference: 1 (8796093022208 double-prec. ULPs, rel. 0.000997)
Tolerance: abs. 8.88178e-16, 4 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:353: Failure
Value of: mem0_[j]
Actual: 1014.0000000000225
Expected: refmem[j]
Which is: 1012.0000000000225
Difference: 2 (17592186044416 double-prec. ULPs, rel. 0.00198)
Tolerance: abs. 8.88178e-16, 4 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:353: Failure
Value of: mem0_[j]
Actual: 1024.0000000000227
Expected: refmem[j]
Which is: 1021.0000000000226
Difference: 3 (26388279066525 double-prec. ULPs, rel. 0.00294)
Tolerance: abs. 8.88178e-16, 4 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:353: Failure
Value of: mem0_[j]
Actual: 2030.000000000045
Expected: refmem[j]
Which is: 1030.000000000023
Difference: 1000 (4398046511104097 double-prec. ULPs, rel. 0.971)
Tolerance: abs. 8.88178e-16, 4 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:353: Failure
Value of: mem0_[j]
Actual: 1004.0000000000223
Expected: refmem[j]
Which is: 1003.0000000000223
Difference: 1 (8796093022208 double-prec. ULPs, rel. 0.000997)
Tolerance: abs. 8.88178e-16, 4 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:353: Failure
Value of: mem0_[j]
Actual: 1017.0000000000225
Expected: refmem[j]
Which is: 1015.0000000000225
Difference: 2 (17592186044416 double-prec. ULPs, rel. 0.00197)
Tolerance: abs. 8.88178e-16, 4 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:353: Failure
Value of: mem0_[j]
Actual: 1030.0000000000227
Expected: refmem[j]
Which is: 1027.0000000000227
Difference: 3 (13194139533312 double-prec. ULPs, rel. 0.00292)
Tolerance: abs. 8.88178e-16, 4 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:353: Failure
Value of: mem0_[j]
Actual: 2039.000000000045
Expected: refmem[j]
Which is: 1039.000000000023
Difference: 1000 (4398046511104097 double-prec. ULPs, rel. 0.962)
Tolerance: abs. 8.88178e-16, 4 ULPs
[ FAILED ] SimdFloatingpointUtilTest.transposeScatterIncrU3 (0 ms)
[ RUN ] SimdFloatingpointUtilTest.transposeScatterIncrU3Overlapping
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:393: Failure
Value of: mem0_[j]
Actual: 2015.0000000000446
Expected: refmem[j]
Which is: 1003.0000000000223
Difference: 1012 (4543182045970432 double-prec. ULPs, rel. 1.01)
Tolerance: abs. 8.88178e-16, 4 ULPs
[ FAILED ] SimdFloatingpointUtilTest.transposeScatterIncrU3Overlapping (0 ms)
[ RUN ] SimdFloatingpointUtilTest.transposeScatterDecrU3
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:444: Failure
Value of: mem0_[j]
Actual: 1002.0000000000223
Expected: refmem[j]
Which is: 1003.0000000000223
Difference: 1 (8796093022208 double-prec. ULPs, rel. 0.000997)
Tolerance: abs. 8.88178e-16, 4 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:444: Failure
Value of: mem0_[j]
Actual: 1010.0000000000225
Expected: refmem[j]
Which is: 1012.0000000000225
Difference: 2 (17592186044416 double-prec. ULPs, rel. 0.00198)
Tolerance: abs. 8.88178e-16, 4 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:444: Failure
Value of: mem0_[j]
Actual: 1018.0000000000225
Expected: refmem[j]
Which is: 1021.0000000000226
Difference: 3 (26388279066625 double-prec. ULPs, rel. 0.00294)
Tolerance: abs. 8.88178e-16, 4 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:444: Failure
Value of: mem0_[j]
Actual: 30.000000000000796
Expected: refmem[j]
Which is: 1030.000000000023
Difference: 1000 (23107336369340293 double-prec. ULPs, rel. 0.971)
Tolerance: abs. 8.88178e-16, 4 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:444: Failure
Value of: mem0_[j]
Actual: 1002.0000000000223
Expected: refmem[j]
Which is: 1003.0000000000223
Difference: 1 (8796093022208 double-prec. ULPs, rel. 0.000997)
Tolerance: abs. 8.88178e-16, 4 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:444: Failure
Value of: mem0_[j]
Actual: 1013.0000000000225
Expected: refmem[j]
Which is: 1015.0000000000225
Difference: 2 (17592186044416 double-prec. ULPs, rel. 0.00197)
Tolerance: abs. 8.88178e-16, 4 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:444: Failure
Value of: mem0_[j]
Actual: 1024.0000000000227
Expected: refmem[j]
Which is: 1027.0000000000227
Difference: 3 (13194139533312 double-prec. ULPs, rel. 0.00292)
Tolerance: abs. 8.88178e-16, 4 ULPs
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:444: Failure
Value of: mem0_[j]
Actual: 39.000000000000796
Expected: refmem[j]
Which is: 1039.000000000023
Difference: 1000 (21598806416031733 double-prec. ULPs, rel. 0.962)
Tolerance: abs. 8.88178e-16, 4 ULPs
[ FAILED ] SimdFloatingpointUtilTest.transposeScatterDecrU3 (0 ms)
[ RUN ] SimdFloatingpointUtilTest.transposeScatterDecrU3Overlapping
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:484: Failure
Value of: mem0_[j]
Actual: 3.0000000000001137
Expected: refmem[j]
Which is: 1003.0000000000223
Difference: 1000 (38095878879182788 double-prec. ULPs, rel. 0.997)
Tolerance: abs. 8.88178e-16, 4 ULPs
[ FAILED ] SimdFloatingpointUtilTest.transposeScatterDecrU3Overlapping (0 ms)
...
< many errors leading to >
[ FAILED ] EquivalentToReference/FreeEnergyReferenceTest.WithinTolerances/coulandvdwsequential_coul_d, where GetParam() = ("coulandvdwsequential_coul", 1, { 89, 90 }) (111 ms)
< and many similar failures >
...
So probably there's some invalid SIMD code exposed by those unit tests, and some higher-level tests failing because of the invalid SIMD code used in the NBNXM FEP kernel.
@ocaisa, the cmake command you pasted into the issue does not match the logs. It has -DGMX_DOUBLE=OFF, but the error message and the compiler commands in test_log.out.gz indicate double-precision.
Any updates on this?
This is currently blocking us from including GROMACS in EESSI, because the failing tests seem "alarming" enough that we can't claim we have a correct/fully working build of GROMACS on Neoverse V1 (and we strongly prefer only including software installations that have no known issues on any of the supported CPU targets).
Is there something we can change in our build for Neoverse V1 to circumvent the issue raised by the tests (even if that means building with lower optimizations)?
That's good to know, this may be our short-term escape hatch, since it seems like this SVE bug isn't going to be trivial to fix...
We're happy to install GROMACS 2024.1 with -DGMX_SIMD=ARM_NEON on Neoverse V1, and then install GROMACS 2024.2 without that workaround (if the problem is fixed in GROMACS 2024.2).
Thanks a lot for the info, this is exactly the sort of feedback we're hoping to get from the software developers, since this allows us to install GROMACS in EESSI according to best practices, and with reasonable workarounds for confirmed bugs where needed.
Just for good measure: the correct option is -DGMX_SIMD=ARM_NEON_ASIMD.
I'm testing that now to verify that this will indeed make the test suite pass for all 4 configurations that we build GROMACS with (-DGMX_MPI=OFF -DGMX_THREAD_MPI=ON vs -DGMX_MPI=ON -DGMX_THREAD_MPI=OFF, and -DGMX_DOUBLE=OFF vs -DGMX_DOUBLE=ON)
Update: GROMACS test suite passes just fine on Neoverse V1 when using -DGMX_SIMD=ARM_NEON_ASIMD, as expected, so we've installed GROMACS 2024.1 with that workaround in place in EESSI (see https://github.com/EESSI/software-layer/pull/499)
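For reference, the four configurations mentioned above are the cross product of the MPI mode and the precision setting, each with the SIMD workaround added. A minimal sketch, assuming the flags quoted in this thread are passed to cmake as-is; the helper name `gromacs_configs` is hypothetical and not part of GROMACS, EasyBuild or EESSI:

```python
from itertools import product

# Flag groups quoted in this thread; the helper itself is illustrative only.
MPI_MODES = [
    ("-DGMX_MPI=OFF", "-DGMX_THREAD_MPI=ON"),
    ("-DGMX_MPI=ON", "-DGMX_THREAD_MPI=OFF"),
]
PRECISIONS = ["-DGMX_DOUBLE=OFF", "-DGMX_DOUBLE=ON"]
SIMD_WORKAROUND = "-DGMX_SIMD=ARM_NEON_ASIMD"


def gromacs_configs():
    """Return the cmake flag list for each MPI/precision combination."""
    return [
        [*mpi, precision, SIMD_WORKAROUND]
        for mpi, precision in product(MPI_MODES, PRECISIONS)
    ]


for flags in gromacs_configs():
    print(" ".join(flags))
```

Each printed line is one flag set to append to the rest of the cmake invocation, which is not shown here.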
This should be easy to debug. The test results show that the issue is in two SIMD functions. The failure pattern is:
/tmp/bot/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:353: Failure
Value of: mem0_[j]
Actual: 1004.0000000000223
Expected: refmem[j]
Which is: 1003.0000000000223
This means that there is either one increment too many or an indexing error. Can you tell us for which values of the loop variable i or of align (line 317 in the file indicated above) this test fails?
Unfortunately I suspect it might be more complex than that. Arm (and SVE) have pretty complex load/store operations, and the code to handle this for multiple different-width SVE implementations is nontrivial.
@boegel The branch https://gitlab.com/gromacs/gromacs/-/commits/simdTest_moreVerbosity is based on release-2024 from which we made 2024.1 and will make 2024.2, so should build cleanly in your infrastructure. The final commit adds some more context to any failures, so if you can run that and share the output, that will help us narrow down the problem.
Here's the enhanced output for the failing test, hopefully this is helpful:
[ RUN ] SimdFloatingpointUtilTest.transposeScatterIncrU3
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:362: Failure
Value of: mem0_[j]
Actual: 1004.0000000000223
Expected: refmem[j]
Which is: 1003.0000000000223
Difference: 1 (8796093022208 double-prec. ULPs, rel. 0.000997)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 3
Google Test trace:
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:326: Alignment value 3
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:362: Failure
Value of: mem0_[j]
Actual: 1014.0000000000225
Expected: refmem[j]
Which is: 1012.0000000000225
Difference: 2 (17592186044416 double-prec. ULPs, rel. 0.00198)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 12
Google Test trace:
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:326: Alignment value 3
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:362: Failure
Value of: mem0_[j]
Actual: 1024.0000000000227
Expected: refmem[j]
Which is: 1021.0000000000226
Difference: 3 (26388279066525 double-prec. ULPs, rel. 0.00294)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 21
Google Test trace:
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:326: Alignment value 3
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:362: Failure
Value of: mem0_[j]
Actual: 2030.000000000045
Expected: refmem[j]
Which is: 1030.000000000023
Difference: 1000 (4398046511104097 double-prec. ULPs, rel. 0.971)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 30
Google Test trace:
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:326: Alignment value 3
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:362: Failure
Value of: mem0_[j]
Actual: 1004.0000000000223
Expected: refmem[j]
Which is: 1003.0000000000223
Difference: 1 (8796093022208 double-prec. ULPs, rel. 0.000997)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 3
Google Test trace:
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:326: Alignment value 4
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:362: Failure
Value of: mem0_[j]
Actual: 1017.0000000000225
Expected: refmem[j]
Which is: 1015.0000000000225
Difference: 2 (17592186044416 double-prec. ULPs, rel. 0.00197)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 15
Google Test trace:
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:326: Alignment value 4
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:362: Failure
Value of: mem0_[j]
Actual: 1030.0000000000227
Expected: refmem[j]
Which is: 1027.0000000000227
Difference: 3 (13194139533312 double-prec. ULPs, rel. 0.00292)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 27
Google Test trace:
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:326: Alignment value 4
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:362: Failure
Value of: mem0_[j]
Actual: 2039.000000000045
Expected: refmem[j]
Which is: 1039.000000000023
Difference: 1000 (4398046511104097 double-prec. ULPs, rel. 0.962)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 39
Google Test trace:
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:326: Alignment value 4
[ FAILED ] SimdFloatingpointUtilTest.transposeScatterIncrU3 (0 ms)
[ RUN ] SimdFloatingpointUtilTest.transposeScatterIncrU3Overlapping
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:402: Failure
Value of: mem0_[j]
Actual: 2015.0000000000446
Expected: refmem[j]
Which is: 1003.0000000000223
Difference: 1012 (4543182045970432 double-prec. ULPs, rel. 1.01)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 3
[ FAILED ] SimdFloatingpointUtilTest.transposeScatterIncrU3Overlapping (0 ms)
[ RUN ] SimdFloatingpointUtilTest.transposeScatterDecrU3
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:454: Failure
Value of: mem0_[j]
Actual: 1002.0000000000223
Expected: refmem[j]
Which is: 1003.0000000000223
Difference: 1 (8796093022208 double-prec. ULPs, rel. 0.000997)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 3
Google Test trace:
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:418: Alignment value 3
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:454: Failure
Value of: mem0_[j]
Actual: 1010.0000000000225
Expected: refmem[j]
Which is: 1012.0000000000225
Difference: 2 (17592186044416 double-prec. ULPs, rel. 0.00198)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 12
Google Test trace:
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:418: Alignment value 3
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:454: Failure
Value of: mem0_[j]
Actual: 1018.0000000000225
Expected: refmem[j]
Which is: 1021.0000000000226
Difference: 3 (26388279066625 double-prec. ULPs, rel. 0.00294)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 21
Google Test trace:
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:418: Alignment value 3
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:454: Failure
Value of: mem0_[j]
Actual: 30.000000000000796
Expected: refmem[j]
Which is: 1030.000000000023
Difference: 1000 (23107336369340293 double-prec. ULPs, rel. 0.971)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 30
Google Test trace:
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:418: Alignment value 3
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:454: Failure
Value of: mem0_[j]
Actual: 1002.0000000000223
Expected: refmem[j]
Which is: 1003.0000000000223
Difference: 1 (8796093022208 double-prec. ULPs, rel. 0.000997)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 3
Google Test trace:
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:418: Alignment value 4
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:454: Failure
Value of: mem0_[j]
Actual: 1013.0000000000225
Expected: refmem[j]
Which is: 1015.0000000000225
Difference: 2 (17592186044416 double-prec. ULPs, rel. 0.00197)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 15
Google Test trace:
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:418: Alignment value 4
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:454: Failure
Value of: mem0_[j]
Actual: 1024.0000000000227
Expected: refmem[j]
Which is: 1027.0000000000227
Difference: 3 (13194139533312 double-prec. ULPs, rel. 0.00292)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 27
Google Test trace:
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:418: Alignment value 4
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:454: Failure
Value of: mem0_[j]
Actual: 39.000000000000796
Expected: refmem[j]
Which is: 1039.000000000023
Difference: 1000 (21598806416031733 double-prec. ULPs, rel. 0.962)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 39
Google Test trace:
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:418: Alignment value 4
[ FAILED ] SimdFloatingpointUtilTest.transposeScatterDecrU3 (0 ms)
[ RUN ] SimdFloatingpointUtilTest.transposeScatterDecrU3Overlapping
/tmp/boegel/easybuild/build/GROMACS/2024.1/foss-2023b/gromacs-2024.1/src/gromacs/simd/tests/simd_floatingpoint_util.cpp:494: Failure
Value of: mem0_[j]
Actual: 3.0000000000001137
Expected: refmem[j]
Which is: 1003.0000000000223
Difference: 1000 (38095878879182788 double-prec. ULPs, rel. 0.997)
Tolerance: abs. 8.88178e-16, 4 ULPs for element 3
[ FAILED ] SimdFloatingpointUtilTest.transposeScatterDecrU3Overlapping (0 ms)
These numbers aren't random, by the way: I designed the test so that the values reflect the order of the data in memory.
What this tells me is that things end up in completely wrong order after this transpose-scatter-and-update function, so it's not just a matter of a single index, but we'll have to go through a couple of functions line-by-line and test on Graviton.
Unfortunately this is the one SIMD architecture I didn't write myself (it was contributed by RIKEN), but it does appear to be an error in the SIMD implementation, possibly because the RIKEN code was developed for Fugaku and, I presume, mostly tested on 512-bit SVE.
I'll see if I can allocate a Graviton node myself next week and look into it.
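To get an overview of the enhanced output, the failing element and alignment value of each record can be tabulated. A minimal sketch, assuming the flattened text format pasted in this thread; the regular expression and the helper `parse_failures` are illustrative and not part of GROMACS or GoogleTest:

```python
import re

# Matches one flattened failure record; the alignment trace is optional
# because the "Overlapping" tests do not print one.
RECORD = re.compile(
    r"Actual:\s*(?P<actual>-?\d+(?:\.\d+)?)\s*"
    r"Expected: refmem\[j\]\s*Which is:\s*(?P<expected>-?\d+(?:\.\d+)?)\s*"
    r"Difference:.*?for element (?P<element>\d+)"
    r"(?:\s*Google Test trace:\s*\S+?:\d+: Alignment value (?P<align>\d+))?",
    re.DOTALL,
)


def parse_failures(text):
    """Return (alignment, element, rounded actual - expected) per record."""
    rows = []
    for m in RECORD.finditer(text):
        align = int(m["align"]) if m["align"] else None
        delta = round(float(m["actual"]) - float(m["expected"]))
        rows.append((align, int(m["element"]), delta))
    return rows


# Two records copied from the output above, with the file path shortened.
SAMPLE = (
    "Value of: mem0_[j] Actual: 1004.0000000000223 Expected: refmem[j] "
    "Which is: 1003.0000000000223Difference: 1 (8796093022208 double-prec. "
    "ULPs, rel. 0.000997) Tolerance: abs. 8.88178e-16, 4 ULPs for element 3"
    "Google Test trace:simd_floatingpoint_util.cpp:326: Alignment value 3"
    "simd_floatingpoint_util.cpp:362: Failure Value of: mem0_[j] "
    "Actual: 1014.0000000000225 Expected: refmem[j] "
    "Which is: 1012.0000000000225Difference: 2 (17592186044416 double-prec. "
    "ULPs, rel. 0.00198) Tolerance: abs. 8.88178e-16, 4 ULPs for element 12"
    "Google Test trace:simd_floatingpoint_util.cpp:326: Alignment value 3"
)

for row in parse_failures(SAMPLE):
    print(row)
```

Run over the full output, this lists elements 3, 12, 21, 30 for alignment 3 and 3, 15, 27, 39 for alignment 4, which makes the stride between failing elements easier to see.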
@erik.lindahl If you need help with getting access to a Graviton 3 node, do let us know, AWS provides us with sponsored credits for EESSI, so we can easily set up a node for you to play around on.
I presume this means that this problem is also relevant for Neoverse V2 (like NVIDIA Grace, with 128-bit SVE)?
The SVE_SIMD4_DOUBLE_MASK looks suspicious to me there. Looking at the alternative code path, SIMD3 seems more appropriate. But that would have also broken 512-bit SVE.
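One way to check whether a too-wide mask is at least consistent with the log is to model which memory indices a 4-element update touches beyond the intended 3-element (x, y, z) update. A minimal scalar sketch, not the GROMACS SVE code: the lane offsets 0, 3, 6, 9 are an assumption inferred from the failing element numbers in the enhanced output, and the helper names are hypothetical.

```python
def touched_indices(align, offsets, width):
    """Indices written when each lane updates `width` consecutive elements
    starting at align * offset (width=3 for an xyz triplet)."""
    return {align * off + k for off in offsets for k in range(width)}


def spilled_indices(align, offsets):
    """Extra indices touched by a 4-wide update compared with a 3-wide one."""
    wide = touched_indices(align, offsets, 4)
    intended = touched_indices(align, offsets, 3)
    return sorted(wide - intended)


# ASSUMPTION: lane offsets inferred from the failing element numbers in the
# log, not read from the GROMACS test source.
OFFSETS = [0, 3, 6, 9]

print(spilled_indices(3, OFFSETS))       # [3, 12, 21, 30]
print(spilled_indices(4, OFFSETS))       # [3, 15, 27, 39]
print(spilled_indices(3, [0, 0, 0, 0]))  # [3]
```

Under that assumption the extra indices coincide with the failing elements reported for alignment values 3 and 4, and with element 3 in the overlapping tests. This only shows that a fourth written lane would produce the observed index pattern; it does not explain the wrong values, nor why 512-bit SVE would be unaffected.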