# Some very rough, but very fast improvements
Hello,
I was tinkering around with your project today, trying to get as much speed out of it as possible, mostly for shits and giggles. While the outcome is still rough around the edges, I feel like it's too good not to share. Even if this isn't up to standards, it should at least give you some ideas about where your biggest bottlenecks are.
## Changes
- Add a spinner to get an approximate speed measurement. This is the very first commit, so one can get a number to compare my following changes against. Note that this first commit still uses channels instead of atomics, but in flamegraphs the performance impact of channels was almost negligible (~5%).
- Switch to `ed25519-dalek`. This is without a doubt the biggest kicker, easily tripling performance by itself, and even more when using the SIMD backend. Unfortunately the 2.0 version is still in prereleases, so you might want to wait a bit before fully investing in it.
- Use atomics for the spinner, to eliminate the aforementioned ~5%.
- Enable LTO - I failed to measure any meaningful impact on my system, but it can't hurt.
- Keep going even after finding an address - this is definitely personal preference, but afaik most people prefer to generate a heap of addresses and then select the best (instead of the first).
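For reference, the dependency and LTO changes amount to something like the following `Cargo.toml` fragment. The prerelease version string is illustrative only (check crates.io for the current one), and `lto = true` is the plain Cargo spelling of the LTO switch:

```toml
[dependencies]
# 2.0 is still in prereleases; pin whichever prerelease is current.
ed25519-dalek = "2.0.0-rc.2"

[profile.release]
# Link-time optimization: no measurable win for me, but it can't hurt.
lto = true
```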
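The atomics change boils down to workers bumping a shared counter with relaxed adds instead of sending a message per key. Here's a minimal self-contained sketch of the idea; the function and variable names are illustrative, not taken from the actual commit, and the real key generation is elided:

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Hypothetical spinner sketch: each worker does a Relaxed fetch_add per
// iteration (much cheaper than a channel send), and the main thread samples
// the counter to derive a keys-per-second figure.
fn measure_for(dur: Duration, workers: usize) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let stop = Arc::new(AtomicBool::new(false));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            let stop = Arc::clone(&stop);
            thread::spawn(move || {
                while !stop.load(Ordering::Relaxed) {
                    // ... generate and check one key here ...
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    thread::sleep(dur);
    stop.store(true, Ordering::Relaxed);
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    let n = measure_for(Duration::from_millis(200), 4);
    println!("~{} iterations in 200ms", n);
}
```

`Ordering::Relaxed` is enough here since the counter is only a statistic, not a synchronization point.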
## Other things I considered
I thought about using other RNGs or more performant address matching than regex, but both are negligible according to flamegraphs.
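For completeness, the regex-free alternative I had in mind was nothing fancier than a plain prefix check; the address and prefix below are made up:

```rust
// Hypothetical sketch: for simple vanity prefixes, a plain string match
// does the same job as a regex, without the dependency or compile step.
fn matches_prefix(addr: &str, wanted: &str) -> bool {
    addr.starts_with(wanted)
}

fn main() {
    let addr = "mcofficerexampleexampleexampleexampleexampleexamplevrqd.onion";
    assert!(matches_prefix(addr, "mcofficer"));
    assert!(!matches_prefix(addr, "zzz"));
    println!("prefix check ok");
}
```

Since matching doesn't show up in the flamegraphs anyway, this is a simplification rather than an optimization.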
Furthermore, I tried core pinning via `core_affinity` and using `jemallocator`, both with no noticeable impact.
## Results
I tested on a 5900X within WSL2, with the following RUSTFLAGS:

```
-Ctarget-cpu=native
--cfg=curve25519_dalek_backend="simd"
```
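In case you want to reproduce this, the flags can be passed through the environment; this is an assumed invocation following the usual rustc/cargo conventions:

```shell
# Build with native CPU features and the curve25519-dalek SIMD backend.
export RUSTFLAGS='-Ctarget-cpu=native --cfg=curve25519_dalek_backend="simd"'
# then, in the project root: cargo build --release
echo "$RUSTFLAGS"
```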
When all is said and done, on my machine, MCOfficer/oniongen-rs@9eadf0e2 ran at just over 200k computations per second, against MCOfficer/oniongen-rs@966376d9's 1 million.