Optimize performance for single-thread use (!13) · Merge requests · Hanna Czenczek / libblkio-async

Historically, it hasn’t been perfectly clear how libblkio-async is to be used, i.e. whether an AsyncBlkioq can be shared/moved between threads, and whether requests can be moved to different threads (which is what e.g. tokio would automatically do when using its thread pools).

It has since turned out that individual requests are generally so small and fast that moving them to a different thread only makes performance worse. Instead, if we want multithreading, we should have multiple queues (one per thread). Therefore, in 82437d2a, we have dropped the Send implementation for requests (and basically decided that a single AsyncBlkioq should also not be shared between threads).
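To illustrate the one-queue-per-thread model, here is a minimal sketch. It does not use libblkio-async's actual API: `LocalQueue` and `run_one_queue_per_thread` are hypothetical names, and the "requests" are plain integers. The point is only that each thread creates and uses its own queue, so the queue type never needs to be `Send`.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::thread;

/// Hypothetical stand-in for a per-thread queue (not an AsyncBlkioq).
/// Holding an `Rc` makes this type `!Send`, so the compiler rejects
/// any attempt to move it to another thread.
struct LocalQueue {
    completed: Rc<RefCell<Vec<u64>>>,
}

impl LocalQueue {
    fn new() -> Self {
        LocalQueue {
            completed: Rc::new(RefCell::new(Vec::new())),
        }
    }

    /// Pretend to submit and complete a request on the current thread.
    fn process(&self, request: u64) {
        self.completed.borrow_mut().push(request);
    }

    fn completed_count(&self) -> usize {
        self.completed.borrow().len()
    }
}

/// Spawn `threads` workers; each one owns a private queue for its lifetime.
/// Only the plain `usize` result crosses the thread boundary.
fn run_one_queue_per_thread(threads: usize, requests_per_thread: u64) -> Vec<usize> {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                // The queue is created inside the thread and never leaves it.
                let queue = LocalQueue::new();
                for request in 0..requests_per_thread {
                    queue.process(request);
                }
                queue.completed_count()
            })
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

Calling `run_one_queue_per_thread(3, 4)` runs three independent workers with four requests each. Constructing a `LocalQueue` outside the closure and moving it in would fail to compile, which is the same kind of guarantee that dropping `Send` gives for requests.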

With AsyncBlkioq used only from a single thread, we can drop a lot of multi-thread safeguards that cost performance (inspired by issue #9):

Arc<T> can be replaced by Rc<T>
Mutex<T> can be replaced by RefCell<T> or Cell<T>
Atomics can be replaced by Cell<T>
The CFD waiters list need not have a lock-free thread-safe implementation for enqueuing/dequeuing
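The replacements listed above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this merge request: `QueueState`, `QueueHandle`, and their fields are hypothetical, and the waiters list is reduced to a plain `VecDeque` of request IDs.

```rust
use std::cell::{Cell, RefCell};
use std::collections::VecDeque;
use std::rc::Rc;

/// Hypothetical single-threaded queue state (not libblkio-async's real types).
struct QueueState {
    /// A multi-threaded version would use AtomicUsize here.
    in_flight: Cell<usize>,
    /// A multi-threaded version would need a Mutex<VecDeque<_>> or a
    /// lock-free list; on a single thread a RefCell is enough.
    waiters: RefCell<VecDeque<u64>>,
}

/// Cheaply clonable handle; a multi-threaded version would hold an Arc.
#[derive(Clone)]
struct QueueHandle {
    state: Rc<QueueState>,
}

impl QueueHandle {
    fn new() -> Self {
        QueueHandle {
            state: Rc::new(QueueState {
                in_flight: Cell::new(0),
                waiters: RefCell::new(VecDeque::new()),
            }),
        }
    }

    /// Register a request: a plain load/store instead of an atomic
    /// read-modify-write, and no lock acquisition for the list.
    fn submit(&self, request_id: u64) {
        self.state.in_flight.set(self.state.in_flight.get() + 1);
        self.state.waiters.borrow_mut().push_back(request_id);
    }

    /// Complete the oldest pending request, if any.
    fn complete_next(&self) -> Option<u64> {
        // The RefMut borrow ends at the end of this statement.
        let id = self.state.waiters.borrow_mut().pop_front()?;
        self.state.in_flight.set(self.state.in_flight.get() - 1);
        Some(id)
    }

    fn in_flight(&self) -> usize {
        self.state.in_flight.get()
    }
}
```

Cloning a `QueueHandle` only bumps a non-atomic reference count, and `submit`/`complete_next` involve no atomic operations or locks. The trade-off is that `Rc`, `Cell`, and `RefCell` make the handle `!Send` and `!Sync`, which matches the decision that a queue stays on one thread.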

This merge request implements these changes, which in preliminary testing seem to improve performance from around 80 to 85 % of raw libblkio to something like 92 %. cargo flamegraph (from a quick glance) seems to indicate that the remainder is spent on malloc()/free()[1] and general async overhead (e.g. dropping the Arc that is internal to tokio’s waker), so this seems to be basically as well as we can reasonably do.

[1] Two uses: one comes from the RequestState object; the other comes from the bench program, which allocates space for the futures, and so isn’t quite libblkio-async’s fault. Maybe that’s something that can be improved upon.
