Support sending which branches to index
The Rails code that kicks off the Elasticsearch indexer was recently changed to use root_ref in place of HEAD in gitlab!122912 (merged).
The Zoekt indexer currently uses the default branch, but we'd like to support any branch requested through the API.
Technical details
- Zoekt raises an error if the indexed metadata differs from what is being requested, and this includes the branch set. A delta build is not allowed in that case, so the caller is expected to check for a mismatch before requesting indexing.
- Zoekt delta builds determine whether files were modified or deleted across ALL branches and update them accordingly. We need to replicate that behavior when telling Zoekt which files were updated during indexing. This is especially important when a file is modified or deleted in only one branch.
- We need to avoid calling Gitaly for the same file contents more than once, and we should not build any in-memory structure that holds file contents.
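The branch-set constraint in the first point could be checked up front with something like the sketch below. It assumes the indexed metadata can be read as a mapping of branch name to indexed SHA. The function name and the shape of both arguments are hypothetical and are not part of Zoekt or the indexer API.

```python
from typing import Dict, Iterable


def delta_build_possible(indexed_branches: Dict[str, str],
                         requested_branches: Iterable[str]) -> bool:
    """Return True only when the requested branch set matches the indexed one.

    indexed_branches: hypothetical view of the Zoekt metadata,
        mapping branch name -> SHA that is currently indexed.
    requested_branches: branch names taken from the indexing request.
    """
    # Only branch names are compared here. Differing SHAs are expected,
    # since they are the reason an index update is being requested.
    return set(indexed_branches) == set(requested_branches)


indexed = {"main": "1111", "stable": "2222"}
print(delta_build_possible(indexed, ["stable", "main"]))  # same set, any order
print(delta_build_possible(indexed, ["main"]))            # a branch was dropped
```

If the check fails, the caller would fall back to a full build instead of requesting a delta.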
Proposal
- Add the ability to send indexing targets (branch (required) and SHA (optional)) as an array to the API (started in !45 (closed))
- Build an in-memory array of changed files from the indexing target list. This should be done before pulling the file contents from Gitaly.
- For each branch, look at the diff between the indexed SHA (Zoekt side) and the target SHA (HEAD or the specified SHA)
- Explore the Gitaly protocol documentation and collaborate with that team to find the best way to build the structure below. The structure should contain:
- file path
- SHA
- branch list
- operation?
- Modified and Deleted files will need to be handled to make sure the correct branches are sent to Zoekt
- Calculate whether a delta build is possible prior to indexing by comparing the indexing target list to the Zoekt metadata
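One possible shape for the in-memory change list from the proposal is sketched below. It assumes the per-branch diffs have already been fetched as (path, blob SHA, operation) tuples. The ChangedFile type, the merge_branch_diffs function, and the operation strings are all hypothetical, and the structure deliberately holds no file contents.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (path, blob SHA or None for a deletion, operation); a hypothetical input shape
DiffEntry = Tuple[str, Optional[str], str]


@dataclass
class ChangedFile:
    path: str
    blob_sha: Optional[str]  # None when the file was deleted
    operation: str           # "modified" or "deleted" (assumed values)
    branches: List[str] = field(default_factory=list)


def merge_branch_diffs(diffs_by_branch: Dict[str, List[DiffEntry]]) -> List[ChangedFile]:
    """Collapse per-branch diffs into one entry per (path, blob SHA, operation).

    Identical blobs changed on several branches end up in a single entry with
    a branch list, so their contents would only need to be fetched once.
    """
    merged: Dict[DiffEntry, ChangedFile] = {}
    for branch, entries in diffs_by_branch.items():
        for path, blob_sha, operation in entries:
            key = (path, blob_sha, operation)
            if key not in merged:
                merged[key] = ChangedFile(path, blob_sha, operation)
            merged[key].branches.append(branch)
    # dicts preserve insertion order, so entries come back in first-seen order
    return list(merged.values())


changes = merge_branch_diffs({
    "main": [("README.md", "aaa", "modified"), ("old.txt", None, "deleted")],
    "feature": [("README.md", "aaa", "modified"), ("new.txt", "bbb", "modified")],
})
for change in changes:
    print(change.path, change.operation, change.branches)
```

Keying on the blob SHA as well as the path keeps a file that differs between branches as separate entries, while a file deleted on only one branch carries just that branch in its list.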
Edited by Terri Chu