Skip to content

fix a 4.14 segfault on par_perm_count

I spent some time today to understand the cause of a 4.14-specific segfault of par_perm_count. The current MR fixes the issue, and tries to make it easier to spot in the future.

  1. The segfault itself is caused by the fact that benchmarks/lib-ref always assumes that boxroot_create succeeds, and does nothing specia when it returns NULL instead -- the next access to that root segfault.

    To detect the issue faster, I added an assert (b != NULL) in the relevant place.

  2. The reason why boxroot_create returns NULL is that the thread-local bxr_thread_has_lock boolean falsely reported the domain lock as not held.

    I added debug printing in the benchmark code to report domain lock issues or other known error sources (as documented in boxroot.h).

  3. The reason for the false bxr_thread_has_lock report was that a thread had been started by OCaml code before boxroot was setup, and was waiting for the domain lock at the point where boxroot was setup. When it got control back, it thus started running with the domain lock, but with the default value bxr_thread_has_lock = false: it hadn't used the boxroot-specific leave_blocking_section hooks which was not installed yet when it called that function.

    To detect this issue, I changed bxr_thread_has_lock from a boolean to a three-value enum: either we know that the thread holds the domain lock, or that it doesn't, or we don't know anything about the thread (it was started before our hooks were in place) and we should put ourselves in state BOXROOT_INVALID.

The fix I propose to avoid this situation where threads are started before boxroot is initialized is to reinstate boxroot_setup as an explicit initialization function -- which is unnecessary most of the time due to implicit initialization -- and use it in the benchmark before spawning threads. This is nicer, I think, than asking users (here the benchmarks) to allocate and discard a boxroot just for initialization purposes.

Edited by Gabriel Scherer

Merge request reports

Loading