Optimize read_base64_from_bytes
Also add a benchmark for testing base64 decoding.
These are the results I got locally after the optimizing commit:
read_base64_from_bytes/128 bytes
  time: [407.62 ns 408.70 ns 410.15 ns]
  change: [-37.277% -36.237% -35.505%] (p = 0.00 < 0.05)
  Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe

read_base64_from_bytes/1kb
  time: [2.8940 µs 2.9476 µs 3.0225 µs]
  change: [-79.861% -79.279% -78.591%] (p = 0.00 < 0.05)
  Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

read_base64_from_bytes/300kb
  time: [3.6385 ms 3.6396 ms 3.6413 ms]
  change: [-27.170% -27.132% -27.095%] (p = 0.00 < 0.05)
  Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

Also, as another "proof" (obviously very subjective), here is the generated assembly before and after the optimization: https://gitlab.com/-/snippets/4902313. The old version has almost twice as many instructions and contains far more conditional branch instructions.
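
For readers wondering where branch savings in a base64 decoder typically come from, here is a minimal sketch. It is not the code from this commit, and `build_table`, `DECODE_TABLE` and `decode_base64_sketch` are hypothetical names. It assumes the standard base64 alphabet and uses a 256-entry lookup table with one validity check per 4-byte group, instead of a chain of range comparisons per input byte.

```rust
/// Sentinel for bytes outside the base64 alphabet.
const INVALID: u8 = 0xFF;

/// Builds a 256-entry table mapping each input byte to its 6-bit value.
const fn build_table() -> [u8; 256] {
    let alphabet = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut table = [INVALID; 256];
    let mut i = 0;
    while i < 64 {
        table[alphabet[i] as usize] = i as u8;
        i += 1;
    }
    table
}

static DECODE_TABLE: [u8; 256] = build_table();

/// Hypothetical decoder for illustration only (not the function from this
/// commit). Accepts padded or unpadded input and returns `None` on any byte
/// outside the standard alphabet.
fn decode_base64_sketch(input: &[u8]) -> Option<Vec<u8>> {
    // Leniently strip trailing '=' padding.
    let mut end = input.len();
    while end > 0 && input[end - 1] == b'=' {
        end -= 1;
    }
    let input = &input[..end];
    // A single leftover character cannot encode a whole byte.
    if input.len() % 4 == 1 {
        return None;
    }

    let mut out = Vec::with_capacity(input.len() / 4 * 3 + 2);
    let mut chunks = input.chunks_exact(4);
    for chunk in &mut chunks {
        let a = DECODE_TABLE[chunk[0] as usize];
        let b = DECODE_TABLE[chunk[1] as usize];
        let c = DECODE_TABLE[chunk[2] as usize];
        let d = DECODE_TABLE[chunk[3] as usize];
        // Valid values are 0..=63, so OR-ing them exceeds 63 only if at
        // least one lookup returned INVALID: one branch per group.
        if (a | b | c | d) > 63 {
            return None;
        }
        let n = (a as u32) << 18 | (b as u32) << 12 | (c as u32) << 6 | d as u32;
        out.extend_from_slice(&[(n >> 16) as u8, (n >> 8) as u8, n as u8]);
    }

    // Handle the final partial group of 2 or 3 characters, if any.
    let rest = chunks.remainder();
    if !rest.is_empty() {
        let a = DECODE_TABLE[rest[0] as usize];
        let b = DECODE_TABLE[rest[1] as usize];
        let c = if rest.len() == 3 { DECODE_TABLE[rest[2] as usize] } else { 0 };
        if (a | b | c) > 63 {
            return None;
        }
        out.push((a << 2) | (b >> 4));
        if rest.len() == 3 {
            out.push((b << 4) | (c >> 2));
        }
    }
    Some(out)
}
```

For example, `decode_base64_sketch(b"aGVsbG8=")` yields `Some(b"hello".to_vec())`, while input containing a byte outside the alphabet yields `None`.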
I believe this will show up on rustc-perf too.