message-passing benchmark: a multi-producer multi-consumer version
This MR sits on top of !102 (merged) and extends the message-passing benchmark to support multiple producers and multiple consumers. In the SPSC version, a simple semaphore was used to synchronize the producer and the consumer every max_delay rounds; now all producers and consumers enter a barrier implemented with a condition variable, which is more costly.
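For readers unfamiliar with the pattern, here is a minimal Python sketch of a reusable barrier built on a single condition variable. It is not the code in this MR: the names `CondBarrier` and `run_rounds` are hypothetical, and the generation counter is just one common way to make the barrier safe to reuse across rounds.

```python
import threading


class CondBarrier:
    """Reusable barrier built on one condition variable (illustrative only)."""

    def __init__(self, parties):
        self.parties = parties      # number of threads that must arrive
        self.waiting = 0            # threads currently blocked in this round
        self.generation = 0         # incremented each time the barrier opens
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            gen = self.generation
            self.waiting += 1
            if self.waiting == self.parties:
                # Last arrival: reset the count, open the barrier, wake everyone.
                self.waiting = 0
                self.generation += 1
                self.cond.notify_all()
            else:
                # Loop to guard against spurious wakeups.
                while gen == self.generation:
                    self.cond.wait()


def run_rounds(n_threads, n_rounds):
    """Each thread logs the round number, then waits at the barrier."""
    barrier = CondBarrier(n_threads)
    log = []
    log_lock = threading.Lock()

    def worker():
        for r in range(n_rounds):
            with log_lock:
                log.append(r)
            barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Every participant takes the same lock on each round, and the last arrival wakes all the others, which is where the extra cost relative to a single semaphore comes from.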
The producers all push to a single shared queue, and the consumers all pop from this queue. The snmalloc benchmark uses one queue per consumer (with each producer pushing to a random consumer), but I don't see a reason to prefer one design over the other.
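To make the two designs concrete, here is a small Python sketch. It is not taken from this MR or from snmalloc: `make_shared`, `make_per_consumer` and `push` are hypothetical names, and the sketch only shows how values are routed, not the threading.

```python
import queue
import random


def make_shared(n_consumers):
    """Shared design: every consumer slot refers to the same queue."""
    q = queue.Queue()
    return [q] * n_consumers


def make_per_consumer(n_consumers):
    """Per-consumer design: each consumer owns its own queue."""
    return [queue.Queue() for _ in range(n_consumers)]


def push(queues, value, rng):
    """Producer side: pick a consumer slot at random and push to its queue.

    With the shared design all slots are the same queue, so the random
    choice has no effect.
    """
    rng.choice(queues).put(value)
```

In both cases consumer `i` pops from `queues[i]`. The difference is whether consumers contend on one queue or each receive a randomly sized share of the values.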
The results differ somewhat from what I observed with the previous version of the benchmark. In particular, I observe configurations where boxroot is 2x slower than ocaml-ref instead of being faster, but this seems fairly sensitive to benchmark parameters (notably the choices of MAX_DELAY and BATCH_SIZE). Overall the qualitative results look similar, and they are identical when N_PRODUCERS=1, N_CONSUMERS=1, which is as it should be.
I tried with N_PRODUCERS=2 N_CONSUMERS=2, and the benchmark seems to scale reasonably well. Each producer pushes one value per consumer, so this setting has 4x the total workload and 2x the per-consumer workload; it seems to run in 2x the time for the ideal ocaml implementation, and 2.7x the time for boxroot.
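The workload arithmetic can be checked with a few lines of Python. The function name `workload` is hypothetical, and the per-consumer figure assumes values end up evenly spread across consumers, which a shared queue does not guarantee.

```python
def workload(n_producers, n_consumers):
    """Return (total values pushed, values per consumer) for one round.

    Each producer pushes one value per consumer, so the total is the
    product; the per-consumer share assumes an even split.
    """
    total = n_producers * n_consumers
    per_consumer = total // n_consumers
    return total, per_consumer


base_total, base_per = workload(1, 1)   # (1, 1)
total, per = workload(2, 2)             # (4, 2)
total_ratio = total // base_total       # 4x the total workload
per_consumer_ratio = per // base_per    # 2x the per-consumer workload
```

Since the consumers run in parallel, the per-consumer ratio is the natural baseline for ideal scaling: 2x the time is what one would expect, and 2.7x for boxroot is above that.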