Library performance characteristics make it practically unusuable

Hello,

I have recently started using BareNET as a temporary schema generator for our project, however the performance issues which quickly arose made me push to switch away from it much sooner.

TL;DR:

Decoding has O(n^2) complexity
Decoding speed deteriorates to single digit kb/s quickly
Library code allocates thousands of instances of garbage
3 seconds to decode a 27kb payload prevent our team from using this library in our application

Explanation:

Reading even a single u8 will copy the remaining payload, which gets slower with larger payloads (ours is 27kb)

        public static (byte,byte[]) Decode_u8(byte[] data)
        {
            return (data[0], data.Skip(1).ToArray());
        }

This brings complexity to O(n^2) and slows down everything significantly.

Unnecessary LINQ usage is also slow


        public static (ulong,byte[]) Decode_uint(byte[] data)
        {
            var rest = data.SkipWhile(Most_siginifanct_bit_set).ToArray();
            var value =
                data
                    .TakeWhile(Most_siginifanct_bit_set)
                    .Concat(rest.Take(1))
                    .Select((bits, index) => (bits, index))
                    .Aggregate(0ul, (result, current) => ((ulong)(current.bits & 0x7F) << (7*current.index)) | result);
            return (value, rest.Skip(1).ToArray());
        }

This code actually becomes shorter when writen in non-LINQ, non-allocating manner:

        public static (ulong,Memory<byte>) Decode_uint(Memory<byte> data) {
            var span = data.Span;
            var result = 0ul;
            int bytesRead = 0;

            do {
                result = ((ulong) (span[bytesRead] & 0x7F) << (7 * bytesRead)) | result;
            } while (Most_siginifanct_bit_set(span[bytesRead++]));

            return (result, data.Slice(bytesRead));
        }

I have spent less than an hour changing Bare.cs simply by replacing byte[] with Memory<byte> and related functions such as data.Skip(4) with data.Slice(4). I also rewrote Decode_uint as shown. This yielded a 1500x speedup.

Before: 2925ms to decode 27384 bytes of replay - 9 kb/s

After 2ms to decode 27384 bytes of replay - 13 mb/s

This was just a quick experiment, so I won't be PRing the code (requires changes to Encode functionality as well as code generation). Using Span instead of Memory and rewriting function signatures to not return tuples would most likely yield even better results.

Edited Oct 06, 2023 by Kirill Artemov

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information