
Crash due to finalizing weak pointer to cl_symbols with ECL_WEAK_HASH

ECL (both 21.2.1 and the develop branch as of ae19006cb) crashes when loading SLIME 2.26.1 while starting vlime on FreeBSD 13 (amd64).

When the fasls are compiled for the first time, the crash usually takes a while to happen; once they have been compiled, it happens immediately after SLIME is loaded.

With the system boehm-gc 8.2.2 (built without assertions), the backtrace looks something like this:

#0  0x0000000800636f3b in sigsegv_handler (sig=sig@entry=11, info=info@entry=0x7fffde7f0470, aux=aux@entry=0x7fffde7f0100)
    at /home/kevinz/workspace/ecl/src/c/unixint.d:853
#1  0x000000080025fe0e in handle_signal (actp=actp@entry=0x7fffde7f0080, sig=sig@entry=11, info=info@entry=0x7fffde7f0470, 
    ucp=ucp@entry=0x7fffde7f0100) at /usr/src/lib/libthr/thread/thr_sig.c:301
#2  0x000000080025f3cf in thr_sighandler (sig=11, info=0x7fffde7f0470, _ucp=0x7fffde7f0100)
    at /usr/src/lib/libthr/thread/thr_sig.c:246
#3  <signal handler called>
#4  0x000000080029ad25 in GC_is_marked (p=0x800698820 <cl_symbols+114816>) at mark.c:231
#5  0x00000008002956e5 in GC_make_disappearing_links_disappear (dl_hashtbl=0x8002afaa0 <GC_dl_hashtbl>, is_remove_dangling=0)
    at finalize.c:972
#6  0x0000000800294e42 in GC_finalize () at finalize.c:1020
#7  0x000000080028d9c3 in GC_finish_collection () at alloc.c:1045
#8  0x000000080028d3ba in GC_try_to_collect_inner (stop_func=0x80028ca60 <GC_never_stop_func>) at alloc.c:553
#9  0x000000080028ee9d in GC_collect_or_expand (needed_blocks=72, ignore_off_page=1, retry=0) at alloc.c:1443
#10 0x000000080029826e in GC_alloc_large (lb=294912, k=0, flags=1) at malloc.c:64
#11 0x0000000800299b17 in GC_generic_malloc_ignore_off_page (lb=294912, k=0) at mallocx.c:218
#12 0x0000000800299c97 in GC_malloc_atomic_ignore_off_page (lb=294912) at mallocx.c:253
#13 0x0000000800650530 in ecl_alloc_atomic_unprotected (n=294912) at /home/kevinz/workspace/ecl/src/c/alloc_2.d:694
#14 ecl_alloc_atomic (n=34366652416) at /home/kevinz/workspace/ecl/src/c/alloc_2.d:714
#15 0x0000000800635700 in init_stacks (env=env@entry=0x805f4b000) at /home/kevinz/workspace/ecl/src/c/stacks.d:717

After recompiling bdwgc master with --enable-gc-assertions --disable-thread-local-alloc --disable-parallel-mark --disable-munmap, the backtrace looks like:

#0  0x000000080096a33a in thr_kill () from /lib/libc.so.7
#1  0x00000008008e2c74 in raise () from /lib/libc.so.7
#2  0x0000000800994109 in abort () from /lib/libc.so.7
#3  0x0000000800663620 in GC_register_disappearing_link_inner () from /opt/ecl/lib//libecl.so.21.2
#4  0x0000000800663514 in GC_general_register_disappearing_link () from /opt/ecl/lib//libecl.so.21.2
#5  0x000000080065aee2 in ecl_alloc_weak_pointer () from /opt/ecl/lib//libecl.so.21.2
#6  0x000000080065af15 in si_make_weak_pointer () from /opt/ecl/lib//libecl.so.21.2
#7  0x000000080064332f in _ecl_sethash_weak () from /opt/ecl/lib//libecl.so.21.2
#8  0x000000080064229d in ecl_sethash () from /opt/ecl/lib//libecl.so.21.2
#9  0x0000000800644fe5 in si_hash_set () from /opt/ecl/lib//libecl.so.21.2
#10 0x00000008050d45a0 in L9form_to_json ()
   from /var/cache/usercache/common-lisp/ecl-21.2.1-03ff4c16-bsd-x64/.vim/pack/start/vlime/lisp/src/vlime-protocol.fas

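In a separate debug build (not shown here), I saw that the obj argument in frame #4 (GC_general_register_disappearing_link) pointed into the cl_symbols array. cl_symbols is a statically allocated array in libecl, not an object allocated by the collector, which is consistent with the first backtrace, where GC_is_marked is called on cl_symbols+114816. I suspect that bdwgc can only register disappearing links to objects allocated on its own heap, and I am waiting for clarification on this issue here: https://github.com/ivmai/bdwgc/issues/415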
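If that suspicion is right (the comment on GC_general_register_disappearing_link in bdwgc's gc.h says obj must point to the first word of an object allocated by GC_malloc or friends), one possible guard is to skip registration for objects outside the collector's heap. Such objects are never collected, so a weak reference to them can be kept as a plain pointer. The C sketch below models only that idea and is not ECL's code. All names are hypothetical: fake_heap and static_table stand in for the GC heap and for a static array like cl_symbols, so the sketch compiles without libgc. With real bdwgc, the heap test would presumably be GC_base(obj) == obj.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL stand-ins so the sketch compiles without libgc:
   fake_heap plays the collector's heap, static_table plays a
   statically allocated array such as cl_symbols. */
static unsigned char fake_heap[4096];
static int static_table[16];

/* True if p points into the (fake) collector heap.
   With real bdwgc, GC_base(p) == p would be a stricter test,
   since obj must be the start of a collector-allocated object. */
static bool in_collector_heap(const void *p) {
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)fake_heap;
    return a >= lo && a < lo + sizeof fake_heap;
}

/* A weak reference: `registered` records whether a disappearing
   link would be registered for `value`. */
struct weak_ref {
    void *value;
    bool registered;
};

/* Objects outside the heap are never collected, so a weak reference
   to them is kept as a plain pointer instead of being registered as
   a disappearing link (registration is where the crash above starts). */
static void weak_ref_init(struct weak_ref *w, void *obj) {
    assert(obj != NULL);
    w->value = obj;
    w->registered = in_collector_heap(obj);
    /* Real code would call
       GC_general_register_disappearing_link(&w->value, obj)
       here only when w->registered is true. */
}

/* Simulates the collector clearing a registered link whose target died;
   unregistered (static) targets are never cleared. */
static void simulate_collect(struct weak_ref *w, const void *dead) {
    if (w->registered && w->value == dead)
        w->value = NULL;
}
```

For example, initializing one weak_ref with &fake_heap[64] and another with &static_table[3] marks only the first as registered, and simulate_collect clears only that one.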

As a workaround, I was able to load and run vlime/SLIME successfully after removing this line from src/h/config-internal.h.in:

#define ECL_WEAK_HASH