QMCPACK + LD_AUDIT errors on Aurora
Measuring QMCPACK with the HPCToolkit develop branch on Aurora fails with the message:
realloc(): invalid pointer
Update: after tracking this down, we found that this bug has been known to Red Hat for months (https://bugzilla.redhat.com/show_bug.cgi?id=2330213) and was fixed in glibc 2.41.
Background:
- ELF handling for TLS (Ulrich Drepper, 2005) https://uclibc.org/docs/tls.pdf
- A Deep dive into (implicit) Thread Local Storage: https://chao-tic.github.io/blog/2018/12/25/tls
Investigation has shown that this is the result of a bug in glibc 2.31 on SUSE SP4 (15.4) on Aurora.
glibc implements TLS using a Dynamic Thread Vector (DTV), which can be thought of as a two-dimensional array that addresses any TLS variable by a module ID and an offset within that module's TLS block.
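The two-level lookup described above can be sketched with a toy model. This is only an illustration, not glibc's actual data structures: the names toy_dtv, toy_dtv_resize and toy_tls_get_addr are hypothetical, and the block_size parameter stands in for information the real loader keeps elsewhere. The point it shows is that a lookup for a module ID beyond the current slot count forces the slot array to be reallocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified model of a per-thread DTV: one slot per
   module ID, each slot pointing at that module's TLS block. */
typedef struct {
    size_t nmodules;  /* number of slots currently allocated */
    void **blocks;    /* blocks[m] = base of module m's TLS block, or NULL */
} toy_dtv;

/* Grow the slot array so that module_id is a valid index.
   Returns 0 on success, -1 on allocation failure. */
int toy_dtv_resize(toy_dtv *dtv, size_t module_id) {
    assert(dtv != NULL);
    if (module_id < dtv->nmodules)
        return 0;
    size_t n = module_id + 1;
    void **p = realloc(dtv->blocks, n * sizeof *p);
    if (p == NULL)
        return -1;
    for (size_t i = dtv->nmodules; i < n; i++)
        p[i] = NULL;  /* new slots start without a TLS block */
    dtv->blocks = p;
    dtv->nmodules = n;
    return 0;
}

/* Resolve (module ID, offset) to an address, growing the DTV and
   allocating the module's zero-filled TLS block on first use. */
void *toy_tls_get_addr(toy_dtv *dtv, size_t module_id,
                       size_t offset, size_t block_size) {
    if (offset >= block_size)
        return NULL;
    if (toy_dtv_resize(dtv, module_id) != 0)
        return NULL;
    if (dtv->blocks[module_id] == NULL) {
        dtv->blocks[module_id] = calloc(1, block_size);
        if (dtv->blocks[module_id] == NULL)
            return NULL;
    }
    return (char *)dtv->blocks[module_id] + offset;
}
```

Starting from an empty toy_dtv ({0, NULL}), the first lookup for module 3 grows the vector to four slots; later lookups for the same (module, offset) pair return the same address.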
The problem with QMCPACK manifests when glibc reallocates the DTV in _dl_resize_dtv, with the call stack below:
#0 0x00007fafa2efcd2b in raise () from /lib64/libc.so.6
#1 0x00007fafa2efe3e5 in abort () from /lib64/libc.so.6
#2 0x00007fafa2f42c27 in __libc_message () from /lib64/libc.so.6
#3 0x00007fafa2f4acca in malloc_printerr () from /lib64/libc.so.6
#4 0x00007fafa2f500fa in realloc () from /lib64/libc.so.6
#5 0x00007fafc9442db4 in _dl_resize_dtv () from /lib64/ld-linux-x86-64.so.2
#6 0x00007fafc944375e in _dl_update_slotinfo () from /lib64/ld-linux-x86-64.so.2
#7 0x00007fafc94438cc in update_get_addr () from /lib64/ld-linux-x86-64.so.2
#8 0x00007fafc9449228 in __tls_get_addr () from /lib64/ld-linux-x86-64.so.2
#9 0x00007faf8d66b412 in ?? () from /usr/lib64/libudev.so.1
#10 0x00007faf8d668659 in ?? () from /usr/lib64/libudev.so.1
#11 0x00007faf8d662be4 in ?? () from /usr/lib64/libudev.so.1
#12 0x00007faf8d662cf2 in ?? () from /usr/lib64/libudev.so.1
#13 0x00007faf8d664808 in ?? () from /usr/lib64/libudev.so.1
#14 0x00007faf8d65048e in ?? () from /usr/lib64/libudev.so.1
#15 0x00007faf8d64e4fe in udev_enumerate_add_match_sysattr () from /usr/lib64/libudev.so.1
#16 0x00007faf8d6a82f1 in igsc_device_iterator_create () from /usr/lib64/libigsc.so.0
#17 0x00007faf9c38f435 in ?? () from /usr/lib64/libze_intel_gpu.so.1
#18 0x00007faf9c3ac05f in ?? () from /usr/lib64/libze_intel_gpu.so.1
#19 0x00007faf9c3bab08 in ?? () from /usr/lib64/libze_intel_gpu.so.1
#20 0x00007faf8db911ac in ur::level_zero::urDeviceGetInfo(ur_device_handle_t_*, ur_device_info_t, unsigned long, void*, unsigned long*) ()
from /opt/intel/oneapi/2025.2/lib/libur_adapter_level_zero.so.0
#21 0x00007fafa2461fbd in ur_loader::urDeviceGetInfo(ur_device_handle_t_*, ur_device_info_t, unsigned long, void*, unsigned long*) ()
from /opt/intel/oneapi/2025.2/lib/libur_loader.so.0
#22 0x00007fafa247708d in urDeviceGetInfo () from /opt/intel/oneapi/2025.2/lib/libur_loader.so.0
#23 0x00007fafaab73731 in sycl::_V1::detail::device_impl::has(sycl::_V1::aspect) const () from /opt/intel/oneapi/2025.2/lib/libsycl.so.8
#24 0x0000000000dc4ea8 in qmcplusplus::SYCLDeviceManager::SYCLDeviceManager (this=0x740c350, default_device_num=@0x740c340: 0,
num_devices=<optimized out>, local_rank=0, local_size=1) at /work/qmcpack/src/Platforms/SYCL/SYCLDeviceManager.cpp:128
#25 0x0000000000d888c9 in qmcplusplus::DeviceManager::DeviceManager (this=0x740c340, local_rank=local_rank@entry=0, local_size=local_size@entry=1)
at /work/qmcpack/src/Platforms/DeviceManager.cpp:32
#26 0x0000000000d889c2 in std::make_unique<qmcplusplus::DeviceManager, int&, int&> (__args=<optimized out>, __args=<optimized out>)
at /usr/lib64/gcc/x86_64-suse-linux/13/../../../../include/c++/13/bits/unique_ptr.h:1070
#27 qmcplusplus::DeviceManager::initializeGlobalDeviceManager (local_rank=0, local_size=1) at /work/qmcpack/src/Platforms/DeviceManager.cpp:55
#28 0x000000000045eb7f in qmcplusplus::QMCMain::QMCMain (this=0x740c720, c=<optimized out>) at /work/qmcpack/src/QMCApp/QMCMain.cpp:74
#29 0x0000000000456fa2 in std::make_unique<qmcplusplus::QMCMain, Communicate*&> (__args=<optimized out>)
at /usr/lib64/gcc/x86_64-suse-linux/13/../../../../include/c++/13/bits/unique_ptr.h:1070
#30 main (argc=<optimized out>, argv=<optimized out>) at /work/qmcpack/src/QMCApp/qmcapp.cpp:191
Further investigation shows that the failure occurs the first time _dl_resize_dtv is called. The apparent cause is that glibc calls realloc on memory that was originally allocated by the dynamic loader's minimal allocator rather than by the default malloc.
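The allocator mismatch can be illustrated with a toy model. The sketch below is not glibc code and is not the upstream fix; every name in it (toy_arena, toy_minimal_alloc, toy_from_minimal, toy_resize) is hypothetical. It assumes a bump allocator standing in for the minimal allocator, and shows why a block obtained from it must be copied into malloc-owned memory rather than handed to realloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an early "minimal" allocator:
   a bump pointer over a static arena. Blocks are never freed. */
static _Alignas(16) unsigned char toy_arena[4096];
static size_t toy_arena_used;

void *toy_minimal_alloc(size_t n) {
    size_t aligned = (n + 15u) & ~(size_t)15u;  /* round up to 16 bytes */
    if (aligned < n || aligned > sizeof toy_arena - toy_arena_used)
        return NULL;
    void *p = toy_arena + toy_arena_used;
    toy_arena_used += aligned;
    return p;
}

/* Nonzero if p points into the arena, i.e. malloc/realloc do not own it. */
int toy_from_minimal(const void *p) {
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)toy_arena;
    return a >= lo && a < lo + sizeof toy_arena;
}

/* Grow a block regardless of which allocator produced it.
   Passing an arena pointer straight to realloc() is undefined behavior;
   glibc's malloc typically detects it and aborts with
   "realloc(): invalid pointer". */
void *toy_resize(void *old, size_t old_size, size_t new_size) {
    assert(new_size > 0);
    if (old != NULL && toy_from_minimal(old)) {
        void *fresh = malloc(new_size);
        if (fresh == NULL)
            return NULL;
        memcpy(fresh, old, old_size < new_size ? old_size : new_size);
        return fresh;  /* the arena block is simply abandoned */
    }
    return realloc(old, new_size);
}
```

In this model the first resize of an arena block takes the malloc-and-copy path, and every later resize goes through realloc as usual. Replacing the toy_from_minimal check with an unconditional realloc would reproduce the class of failure seen in the backtrace.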
The problem also occurs without HPCToolkit: it is a bug in the LD_AUDIT (auditor) support of glibc 2.31 as shipped with SUSE 15.4.
The problem arises when using the following trivial auditor:
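audit.c:
#define _GNU_SOURCE
#include <link.h>
// Called once by the dynamic linker when the audit library is loaded,
// to negotiate the auditing interface version
unsigned int la_version(unsigned int version) {
    (void)version;  // the linker's version is not inspected here
    return LAV_CURRENT;
}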
Build the auditor:
gcc -o audit.so -fPIC -shared audit.c