Library performance characteristics make it practically unusuable
Hello,
I have recently started using BareNET as a temporary schema generator for our project, however the performance issues which quickly arose made me push to switch away from it much sooner.
TL;DR:
- Decoding has O(n^2) complexity
- Decoding speed deteriorates to single digit kb/s quickly
- Library code allocates thousands of instances of garbage
- 3 seconds to decode a 27kb payload prevent our team from using this library in our application
Explanation:
Reading even a single u8 will copy the remaining payload, which gets slower with larger payloads (ours is 27kb)
public static (byte,byte[]) Decode_u8(byte[] data)
{
return (data[0], data.Skip(1).ToArray());
}
This brings complexity to O(n^2) and slows down everything significantly.
Unnecessary LINQ usage is also slow
public static (ulong,byte[]) Decode_uint(byte[] data)
{
var rest = data.SkipWhile(Most_siginifanct_bit_set).ToArray();
var value =
data
.TakeWhile(Most_siginifanct_bit_set)
.Concat(rest.Take(1))
.Select((bits, index) => (bits, index))
.Aggregate(0ul, (result, current) => ((ulong)(current.bits & 0x7F) << (7*current.index)) | result);
return (value, rest.Skip(1).ToArray());
}
This code actually becomes shorter when writen in non-LINQ, non-allocating manner:
public static (ulong,Memory<byte>) Decode_uint(Memory<byte> data) {
var span = data.Span;
var result = 0ul;
int bytesRead = 0;
do {
result = ((ulong) (span[bytesRead] & 0x7F) << (7 * bytesRead)) | result;
} while (Most_siginifanct_bit_set(span[bytesRead++]));
return (result, data.Slice(bytesRead));
}
I have spent less than an hour changing Bare.cs
simply by replacing byte[]
with Memory<byte>
and related functions such as data.Skip(4)
with data.Slice(4)
. I also rewrote Decode_uint
as shown. This yielded a 1500x speedup.
Before: 2925ms to decode 27384 bytes of replay
- 9 kb/s
After 2ms to decode 27384 bytes of replay
- 13 mb/s
This was just a quick experiment, so I won't be PRing the code (requires changes to Encode functionality as well as code generation). Using Span instead of Memory and rewriting function signatures to not return tuples would most likely yield even better results.