Crash due to finalizing weak pointer to cl_symbols with ECL_WEAK_HASH
ECL (both 21.2.1 and the develop branch as of ae19006cb) crashes when loading SLIME 2.26.1 while starting vlime on FreeBSD 13 on amd64.
When the fasls are compiled for the first time, the crash usually takes a while to appear. Once the fasls are already compiled, it happens immediately after loading SLIME.
The backtrace (with the system boehm-gc 8.2.2, without assertions) looks something like:
#0 0x0000000800636f3b in sigsegv_handler (sig=sig@entry=11, info=info@entry=0x7fffde7f0470, aux=aux@entry=0x7fffde7f0100)
at /home/kevinz/workspace/ecl/src/c/unixint.d:853
#1 0x000000080025fe0e in handle_signal (actp=actp@entry=0x7fffde7f0080, sig=sig@entry=11, info=info@entry=0x7fffde7f0470,
ucp=ucp@entry=0x7fffde7f0100) at /usr/src/lib/libthr/thread/thr_sig.c:301
#2 0x000000080025f3cf in thr_sighandler (sig=11, info=0x7fffde7f0470, _ucp=0x7fffde7f0100)
at /usr/src/lib/libthr/thread/thr_sig.c:246
#3 <signal handler called>
#4 0x000000080029ad25 in GC_is_marked (p=0x800698820 <cl_symbols+114816>) at mark.c:231
#5 0x00000008002956e5 in GC_make_disappearing_links_disappear (dl_hashtbl=0x8002afaa0 <GC_dl_hashtbl>, is_remove_dangling=0)
at finalize.c:972
#6 0x0000000800294e42 in GC_finalize () at finalize.c:1020
#7 0x000000080028d9c3 in GC_finish_collection () at alloc.c:1045
#8 0x000000080028d3ba in GC_try_to_collect_inner (stop_func=0x80028ca60 <GC_never_stop_func>) at alloc.c:553
#9 0x000000080028ee9d in GC_collect_or_expand (needed_blocks=72, ignore_off_page=1, retry=0) at alloc.c:1443
#10 0x000000080029826e in GC_alloc_large (lb=294912, k=0, flags=1) at malloc.c:64
#11 0x0000000800299b17 in GC_generic_malloc_ignore_off_page (lb=294912, k=0) at mallocx.c:218
#12 0x0000000800299c97 in GC_malloc_atomic_ignore_off_page (lb=294912) at mallocx.c:253
#13 0x0000000800650530 in ecl_alloc_atomic_unprotected (n=294912) at /home/kevinz/workspace/ecl/src/c/alloc_2.d:694
#14 ecl_alloc_atomic (n=34366652416) at /home/kevinz/workspace/ecl/src/c/alloc_2.d:714
#15 0x0000000800635700 in init_stacks (env=env@entry=0x805f4b000) at /home/kevinz/workspace/ecl/src/c/stacks.d:717
After recompiling bdwgc master with --enable-gc-assertions --disable-thread-local-alloc --disable-parallel-mark --disable-munmap, the backtrace looks like:
#0 0x000000080096a33a in thr_kill () from /lib/libc.so.7
#1 0x00000008008e2c74 in raise () from /lib/libc.so.7
#2 0x0000000800994109 in abort () from /lib/libc.so.7
#3 0x0000000800663620 in GC_register_disappearing_link_inner () from /opt/ecl/lib//libecl.so.21.2
#4 0x0000000800663514 in GC_general_register_disappearing_link () from /opt/ecl/lib//libecl.so.21.2
#5 0x000000080065aee2 in ecl_alloc_weak_pointer () from /opt/ecl/lib//libecl.so.21.2
#6 0x000000080065af15 in si_make_weak_pointer () from /opt/ecl/lib//libecl.so.21.2
#7 0x000000080064332f in _ecl_sethash_weak () from /opt/ecl/lib//libecl.so.21.2
#8 0x000000080064229d in ecl_sethash () from /opt/ecl/lib//libecl.so.21.2
#9 0x0000000800644fe5 in si_hash_set () from /opt/ecl/lib//libecl.so.21.2
#10 0x00000008050d45a0 in L9form_to_json ()
from /var/cache/usercache/common-lisp/ecl-21.2.1-03ff4c16-bsd-x64/.vim/pack/start/vlime/lisp/src/vlime-protocol.fas
In a separate debug build (not shown here), I saw that the obj argument in frame 4 pointed somewhere inside the cl_symbols array, which is statically allocated rather than on the GC heap. I suspect that bdwgc can only register disappearing links to objects allocated on its heap, and I am waiting for clarification on this here: https://github.com/ivmai/bdwgc/issues/415
I was able to work around this issue and successfully load and run vlime/slime after removing this line from src/h/config-internal.h.in:
#define ECL_WEAK_HASH