# Some very rough, but very fast improvements
Hello,
I was tinkering around with your project today, trying to get as much speed out of it as possible, mostly for shits and giggles. While the outcome is still rough around the edges, I feel like it's too good not to share. Even if this isn't up to standards, it should at least give you some ideas about where your biggest bottlenecks are.
## Changes
- Add a spinner to get an approximate speed measurement. This is the very first commit, so one can get a number to compare my following changes against. Note that this first commit still uses channels instead of atomics, but in flamegraphs the performance impact of channels was almost negligible (~5%).
- Switch to `ed25519-dalek`. This is without a doubt the biggest kicker, easily tripling performance by itself, and even more when using the SIMD backend. Unfortunately the 2.0 version is still in prereleases, so you might want to wait a bit before fully investing in it.
- Use atomics for the spinner, to eliminate the aforementioned ~5%.
- Enable LTO - I failed to measure any meaningful impact on my system, but it can't hurt.
- Keep going even after finding an address - this is definitely personal preference, but afaik most people prefer to generate a heap of addresses and then select the best (instead of the first).
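For reference, the dependency and LTO changes amount to something like the following `Cargo.toml` fragment. The prerelease version string is illustrative only (check crates.io for the current one), and `lto = true` is the plain Cargo spelling of the LTO switch:

```toml
[dependencies]
# 2.0 is still in prereleases; pin whichever prerelease is current.
ed25519-dalek = "2.0.0-rc.2"

[profile.release]
# Link-time optimization: no measurable win for me, but it can't hurt.
lto = true
```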
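The atomics change boils down to workers bumping a shared counter with relaxed adds instead of sending a message per key. Here's a minimal self-contained sketch of the idea; the function and variable names are illustrative, not taken from the actual commit, and the real key generation is elided:

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Hypothetical spinner sketch: each worker does a Relaxed fetch_add per
// iteration (much cheaper than a channel send), and the main thread samples
// the counter to derive a keys-per-second figure.
fn measure_for(dur: Duration, workers: usize) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let stop = Arc::new(AtomicBool::new(false));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            let stop = Arc::clone(&stop);
            thread::spawn(move || {
                while !stop.load(Ordering::Relaxed) {
                    // ... generate and check one key here ...
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    thread::sleep(dur);
    stop.store(true, Ordering::Relaxed);
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    let n = measure_for(Duration::from_millis(200), 4);
    println!("~{} iterations in 200ms", n);
}
```

`Ordering::Relaxed` is enough here since the counter is only a statistic, not a synchronization point.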
## Other things I considered
I thought about using other RNGs or more performant address matching than regex, but both are negligible according to flamegraphs.
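For completeness, the regex-free alternative I had in mind was nothing fancier than a plain prefix check; the address and prefix below are made up:

```rust
// Hypothetical sketch: for simple vanity prefixes, a plain string match
// does the same job as a regex, without the dependency or compile step.
fn matches_prefix(addr: &str, wanted: &str) -> bool {
    addr.starts_with(wanted)
}

fn main() {
    let addr = "mcofficerexampleexampleexampleexampleexampleexamplevrqd.onion";
    assert!(matches_prefix(addr, "mcofficer"));
    assert!(!matches_prefix(addr, "zzz"));
    println!("prefix check ok");
}
```

Since matching doesn't show up in the flamegraphs anyway, this is a simplification rather than an optimization.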
Furthermore, I tried core pinning via `core_affinity` and using `jemallocator`, both with no noticeable impact.
## Results
I tested on a 5900X within WSL2, with the following RUSTFLAGS:

```
-Ctarget-cpu=native
--cfg=curve25519_dalek_backend="simd"
```
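In case you want to reproduce this, the flags can be passed through the environment; this is an assumed invocation following the usual rustc/cargo conventions:

```shell
# Build with native CPU features and the curve25519-dalek SIMD backend.
export RUSTFLAGS='-Ctarget-cpu=native --cfg=curve25519_dalek_backend="simd"'
# then, in the project root: cargo build --release
echo "$RUSTFLAGS"
```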
When all is said and done, on my machine, MCOfficer/oniongen-rs@9eadf0e2 ran at just over 200k computations per second, against MCOfficer/oniongen-rs@966376d9's 1 million.