
Improve indexing using FindChangedPaths and ListBlobs

What does this MR do and why?

Adds a new OptimizedPerformance parameter to the taskRequestResponse struct. Rails sends the optimized_performance attribute to the indexer based on the value of the zoekt_optimized_performance_indexing feature flag. If optimized_performance is true, the indexer uses the EachFileChangeOptimizedPerformance function to index.
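
The flag-driven dispatch can be sketched roughly as follows. This is a simplified illustration, not the indexer's actual code: taskRequest is a hypothetical stand-in for taskRequestResponse, the JSON key is assumed from the attribute name, and chooseStrategy is a hypothetical helper that only names which code path would run.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taskRequest is a hypothetical, trimmed-down stand-in for the indexer's
// taskRequestResponse struct. The JSON key is an assumption.
type taskRequest struct {
	OptimizedPerformance bool `json:"optimized_performance"`
}

// chooseStrategy decodes the payload and reports which indexing path
// would be taken. A missing attribute decodes to false.
func chooseStrategy(payload []byte) (string, error) {
	var req taskRequest
	if err := json.Unmarshal(payload, &req); err != nil {
		return "", err
	}
	if req.OptimizedPerformance {
		return "EachFileChangeOptimizedPerformance", nil
	}
	return "default", nil
}

func main() {
	for _, p := range []string{`{"optimized_performance": true}`, `{}`} {
		s, err := chooseStrategy([]byte(p))
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Println(s)
	}
}
```

Because the zero value of a Go bool is false, a payload without the attribute falls back to the existing path in this sketch.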

There are two main improvements in EachFileChangeOptimizedPerformance.

  • It uses FindChangedPathsRequest instead of GetRawChangesRequest. We were discarding the diffs and using only the paths received from GetRawChangesRequest. FindChangedPathsRequest returns only the paths, which removes the overhead of calculating diffs.
  • It collects every blob_id returned by FindChangedPathsRequest and builds a hashmap with the blob_id as the key and a slice of paths as the value. It then calls ListBlobs with the blob_ids as the revisions in the ListBlobsRequest, and calls put for each path that corresponds to a returned revision.
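
The blob_id grouping in the second point can be sketched as below. This is a minimal illustration under stated assumptions: changedPath, groupPathsByBlobID, putForBlob, and the put callback are hypothetical names, and no real Gitaly calls are made.

```go
package main

import (
	"fmt"
	"sort"
)

// changedPath is a hypothetical stand-in for one changed-path entry:
// a file path plus the ID of the blob it now points to.
type changedPath struct {
	Path   string
	BlobID string
}

// groupPathsByBlobID builds the blob_id -> paths map. Several paths can
// share one blob ID when files have identical content.
func groupPathsByBlobID(changes []changedPath) map[string][]string {
	byBlob := make(map[string][]string)
	for _, c := range changes {
		byBlob[c.BlobID] = append(byBlob[c.BlobID], c.Path)
	}
	return byBlob
}

// putForBlob calls put once for every path that maps to blobID.
// put is a hypothetical callback standing in for the indexer's put step.
func putForBlob(byBlob map[string][]string, blobID string, content []byte, put func(path string, content []byte)) {
	for _, p := range byBlob[blobID] {
		put(p, content)
	}
}

func main() {
	changes := []changedPath{
		{Path: "a/LICENSE", BlobID: "blob1"},
		{Path: "b/LICENSE", BlobID: "blob1"},
		{Path: "main.go", BlobID: "blob2"},
	}
	byBlob := groupPathsByBlobID(changes)

	// The map keys are what would be sent as revisions to ListBlobs.
	revisions := make([]string, 0, len(byBlob))
	for id := range byBlob {
		revisions = append(revisions, id)
	}
	sort.Strings(revisions)
	fmt.Println(revisions)

	// Pretend one blob came back from ListBlobs; index every path using it.
	putForBlob(byBlob, "blob1", []byte("MIT"), func(path string, content []byte) {
		fmt.Printf("put %s (%d bytes)\n", path, len(content))
	})
}
```

Keeping a slice of paths per blob_id matters because one blob can back several paths, so each blob needs to be fetched only once.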

The IndexBatchSize is set to 10000. With a much higher batch size such as 30000, I was getting an error like this: argument list too long, stderr: \"\""}. A batch size of 10000 should be safe in all cases. I have verified that with a batch size of 10000 the size in bytes is much less than 4MB. With SHA1 it runs fine with a batch_size of 20000, so I am assuming that SHA256 will be fine with a batch_size of 10000.
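
The batching and the size reasoning can be sketched as follows. This is only an illustration: chunk and revisionBytes are hypothetical helpers, and the estimate counts just the hex characters of the object IDs (40 for SHA1, 64 for SHA256), ignoring any encoding overhead.

```go
package main

import "fmt"

// chunk splits ids into consecutive batches of at most size elements.
func chunk(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}

// revisionBytes estimates the raw bytes taken by n hex-encoded object IDs.
func revisionBytes(n, hexLen int) int {
	return n * hexLen
}

func main() {
	const indexBatchSize = 10000

	// Generate placeholder 64-character IDs purely for demonstration.
	ids := make([]string, 25000)
	for i := range ids {
		ids[i] = fmt.Sprintf("%064x", i)
	}
	for i, b := range chunk(ids, indexBatchSize) {
		fmt.Printf("batch %d: %d ids\n", i, len(b))
	}

	fmt.Println("SHA256 x 10000:", revisionBytes(10000, 64), "bytes")
	fmt.Println("SHA1   x 20000:", revisionBytes(20000, 40), "bytes")
}
```

By this rough count, 10000 SHA256 IDs take 640000 bytes, which is less than the 800000 bytes of 20000 SHA1 IDs, which is consistent with the assumption above.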

After reindexing with optimized performance, I spot-checked some searches. The results were the same, which confirms that indexing completed successfully.


How to set up and validate locally

  • With the FF zoekt_optimized_performance_indexing enabled, run Search::Zoekt::IndexingTaskService.execute project_id, :force_index_repo
  • Observe the indexTime in the log.
  • Turn off the FF zoekt_optimized_performance_indexing and wait about 30s, because the FF is cached.
  • Run the same command again.
  • The indexTime should be much higher in the second run.

Related: gitlab#487328 (closed)

Edited by Ravi Kumar
