fix a 4.14 segfault on par_perm_count
I spent some time today to understand the cause of a 4.14-specific segfault of par_perm_count. The current MR fixes the issue, and tries to make it easier to spot in the future.
-
The segfault itself is caused by the fact that benchmarks/lib-ref always assumes that
boxroot_createsucceeds, and does nothing specia when it returns NULL instead -- the next access to that root segfault.To detect the issue faster, I added an
assert (b != NULL)in the relevant place. -
The reason why
boxroot_createreturns NULL is that the thread-localbxr_thread_has_lockboolean falsely reported the domain lock as not held.I added debug printing in the benchmark code to report domain lock issues or other known error sources (as documented in boxroot.h).
-
The reason for the false
bxr_thread_has_lockreport was that a thread had been started by OCaml code before boxroot was setup, and was waiting for the domain lock at the point where boxroot was setup. When it got control back, it thus started running with the domain lock, but with the default valuebxr_thread_has_lock = false: it hadn't used the boxroot-specificleave_blocking_sectionhooks which was not installed yet when it called that function.To detect this issue, I changed
bxr_thread_has_lockfrom a boolean to a three-value enum: either we know that the thread holds the domain lock, or that it doesn't, or we don't know anything about the thread (it was started before our hooks were in place) and we should put ourselves in state BOXROOT_INVALID.
The fix I propose to avoid this situation where threads are started before boxroot is initialized is to reinstate boxroot_setup as an explicit initialization function -- which is unnecessary most of the time due to implicit initialization -- and use it in the benchmark before spawning threads. This is nicer, I think, than asking users (here the benchmarks) to allocate and discard a boxroot just for initialization purposes.