chunker: Memory and FFI performance improvements
Context
From @dgruzd on !686 (comment 2689642236):
We can also optimize the code parser by introducing a bulk function, for example `AddFiles`. The most expensive thing with FFI is probably crossing the FFI boundary. We call `AddFile` for each file, which slows down the process. `AddFiles` would probably be a simple for loop in the Rust code. We can experiment with that later though 🤝
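For illustration, the bulk entry point suggested above would mostly be a single call plus a loop on the Rust side. A minimal sketch, where `Chunker`, `add_file`, and `ParseError` are stand-in names rather than the real binding surface:

```rust
// Minimal sketch only: `Chunker`, `add_file`, and `ParseError` are stand-ins,
// not the actual gitlab-code-parser types.
#[derive(Default)]
struct Chunker {
    files: Vec<(String, Vec<u8>)>,
}

#[derive(Debug)]
struct ParseError;

impl Chunker {
    // Existing single-file entry point (called once per file today).
    fn add_file(&mut self, path: &str, contents: &[u8]) -> Result<(), ParseError> {
        self.files.push((path.to_owned(), contents.to_owned()));
        Ok(())
    }

    // Proposed bulk variant: the caller hands over all files at once and the
    // per-file work becomes a plain loop on the Rust side.
    fn add_files(&mut self, files: &[(&str, &[u8])]) -> Result<(), ParseError> {
        for (path, contents) in files {
            self.add_file(path, contents)?;
        }
        Ok(())
    }
}

fn main() {
    let mut chunker = Chunker::default();
    chunker
        .add_files(&[
            ("a.rs", b"fn a() {}".as_slice()),
            ("b.go", b"func b() {}".as_slice()),
        ])
        .expect("add files");
    println!("{} files queued", chunker.files.len());
}
```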
From @proglottis on !686 (comment 2691425098):
`AddFile` does not cross the FFI boundary. It's all written in Go, allocating C types in Go memory. See https://gitlab.com/gitlab-org/rust/gitlab-code-parser/-/blob/1afa14bb53027bd105d44800c426f2248746ac21/bindings/go/chunker/chunker.go#L102
It's actually the enumeration that is worst for the number of FFI calls, due to the Go-style enumeration callback. See https://gitlab.com/gitlab-org/rust/gitlab-code-parser/-/blob/1afa14bb53027bd105d44800c426f2248746ac21/bindings/go/chunker/chunker.go#L161 - we should do some benchmarking, but I think this could be improved by using a fixed-size buffer instead of one-by-one.
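A rough sketch of the fixed-size buffer idea, with `ChunkRef`, `Chunker`, and `copy_chunks_into` as hypothetical names rather than the actual binding API:

```rust
// Sketch of the fixed-size buffer idea; names are hypothetical.

/// C-compatible chunk descriptor. In the real bindings this mostly holds
/// pointers back to Go-owned memory, so copying it out is cheap.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct ChunkRef {
    pub ptr: *const u8,
    pub len: usize,
}

pub struct Chunker {
    chunks: Vec<ChunkRef>,
}

impl Chunker {
    /// Copy up to `buf.len()` descriptors starting at `offset` into a
    /// caller-provided buffer and return how many were written. Exposed over
    /// FFI, this would let the Go side make one cgo call per batch instead of
    /// one callback crossing per chunk.
    pub fn copy_chunks_into(&self, offset: usize, buf: &mut [ChunkRef]) -> usize {
        let remaining = self.chunks.len().saturating_sub(offset);
        let n = remaining.min(buf.len());
        buf[..n].copy_from_slice(&self.chunks[offset..offset + n]);
        n
    }
}

fn main() {
    let chunker = Chunker {
        chunks: (0..5)
            .map(|i| ChunkRef { ptr: std::ptr::null(), len: i })
            .collect(),
    };
    // Fixed-size buffer: the enumeration refills it batch by batch.
    let mut buf = [ChunkRef { ptr: std::ptr::null(), len: 0 }; 2];
    let mut offset = 0;
    loop {
        let n = chunker.copy_chunks_into(offset, &mut buf);
        if n == 0 {
            break;
        }
        // In the bindings, this loop body would run on the Go side.
        offset += n;
    }
    assert_eq!(offset, 5);
}
```

The point is that the number of boundary crossings becomes roughly the chunk count divided by the buffer size, instead of one crossing per chunk.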
Another possible improvement that we could implement is to create the code chunker once and use `Clear()` between invocations instead of closing and recreating it. This will mean that the chunk vec allocated in Rust is reused (this is the array that is accessed by the Go-style enumeration). The struct in this vec isn't actually that large, as it mostly holds pointers back to Go memory, but allocating memory still has some cost.
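For reference, `Vec::clear` drops the elements but keeps the allocation, which is what makes the reuse worthwhile. A minimal sketch, with the `Chunker` struct and element type as stand-ins for the real parser types:

```rust
// Sketch of the reuse that `Vec::clear` gives; types are stand-ins.
struct Chunker {
    // In the real code this is the chunk vec the Go-style enumeration walks;
    // its elements mostly hold pointers back to Go memory.
    chunks: Vec<usize>,
}

impl Chunker {
    // Called between invocations instead of dropping and recreating the
    // chunker: the elements are dropped but the allocation is kept.
    fn clear(&mut self) {
        self.chunks.clear();
    }
}

fn main() {
    let mut chunker = Chunker { chunks: Vec::new() };
    chunker.chunks.extend(0..1024);
    let cap = chunker.chunks.capacity();

    chunker.clear();

    // Length resets, capacity is retained, so the next invocation reuses the
    // same buffer instead of allocating a fresh one.
    assert!(chunker.chunks.is_empty());
    assert_eq!(chunker.chunks.capacity(), cap);
}
```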
> non-blocking, we can consider:
>
> ```rust
> let mut chunks = Vec::new(); // Starts with capacity 0, grows as needed
> ```
>
> Could estimate capacity:
> ```rust
> let estimated_chunks = (source_code.len() / self.max_chunk_size) + 1;
> let mut chunks = Vec::with_capacity(estimated_chunks);
> ```
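If we combine this with the `Clear()` idea above, note that `Vec::clear` keeps whatever capacity the vec has already grown to, so the estimate would mainly help the first invocation of a long-lived chunker; later invocations reuse the buffer sized by earlier files.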