Make Elasticsearch blob indexing idempotent
Currently when we index blobs, we store the content in our Elasticsearch index.
If we backfill commits to close an indexing gap after newer commits have already been indexed, we could end up overwriting a blob with stale contents.
Example:
Let's say we have a file, a.txt, that has already been indexed with the following contents:
a.txt
this is some text
and this is some more text
and even more!
Now suppose that, for some reason, we want to index some old commits (because they're not in the index), and one of those commits contains a.txt with the following older contents:
this is some text
and some more text
If we were to run our Elasticsearch indexer on those commits, the newer blob content in the index would be replaced with the old content.
We should consider either:
- using optimistic concurrency control to avoid overwriting newer content, or
- asking the Git repository for the newest version of each blob
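As a rough illustration of the first option, here is a minimal in-memory sketch that mimics the semantics of Elasticsearch's external versioning (`version_type=external`), where a write is rejected unless its version is higher than the stored one. The `InMemoryBlobIndex` class, the `backfill` helper, and the idea of deriving an integer version from commit order are all hypothetical and not taken from our indexer; how we would actually assign versions is an open question.

```python
from dataclasses import dataclass


class VersionConflict(Exception):
    """Raised when a write's version is not newer than the stored version."""


@dataclass
class StoredBlob:
    content: str
    version: int


class InMemoryBlobIndex:
    """Hypothetical stand-in for the blob index, keyed by file path."""

    def __init__(self):
        self._docs = {}

    def index(self, path, content, version):
        # Accept the write only if it is strictly newer than what is stored,
        # which is the rule external versioning applies.
        current = self._docs.get(path)
        if current is not None and version <= current.version:
            raise VersionConflict(
                f"{path}: version {version} <= stored {current.version}"
            )
        self._docs[path] = StoredBlob(content, version)

    def get(self, path):
        return self._docs[path].content


def backfill(index, path, content, version):
    """Try to index a blob; return False if it was skipped as stale."""
    try:
        index.index(path, content, version)
        return True
    except VersionConflict:
        return False


NEW_CONTENT = "this is some text\nand this is some more text\nand even more!\n"
OLD_CONTENT = "this is some text\nand some more text\n"

blob_index = InMemoryBlobIndex()

# The newer commit was indexed first (assumed version 2).
indexed_new = backfill(blob_index, "a.txt", NEW_CONTENT, version=2)

# Backfilling the older commit (assumed version 1) is rejected, not applied.
indexed_old = backfill(blob_index, "a.txt", OLD_CONTENT, version=1)

# Replaying the newer commit is also a no-op, so reruns are safe.
indexed_again = backfill(blob_index, "a.txt", NEW_CONTENT, version=2)
```

With a check like this, the order in which commits are indexed no longer matters: the index always ends up holding the content with the highest version, and rerunning the indexer over the same commits changes nothing.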
This issue stems from https://gitlab.com/gitlab-org/gitlab-ee/issues/8013 - why does a later index operation completely negate an earlier one?