Zoekt indexer is slow during initial indexing

Summary

We've encountered extremely slow initial indexing of gitlab-org/gitlab (30+ minutes) in Zoekt.

Steps to reproduce

What is the current bug behavior?

What is the expected correct behavior?

Relevant logs and/or screenshots

Possible fixes

The indexer is using GetRawChanges here. That call computes a diff, but the indexer discards the diff content and fetches each entire blob afterwards. You could use FindChangedPaths instead, which returns only the tree diff (the added/modified/removed paths) without the blob changes that are being discarded.

In addition, the blobs are fetched one by one in a loop, which adds a network round trip for every single blob that is part of the change. You might want to consider batching the requests and getting multiple blobs with a single GetBlobs request instead.

You could also consider pipelining the indexer so that blob fetching and indexing (in put) happen concurrently.

Implement the changes behind a feature flag (FF).

Edited by Ravi Kumar