Draft: Don't retry in gitlab-elasticsearch-indexer

Related to gitlab#323856 (closed)

This MR updates a setting in the Elastic indexing library so that no status codes are retried (i.e., it blanks the list of retryable status codes).
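For context, a minimal sketch of what blanking the retryable status codes looks like, assuming the indexer wires its bulk indexing through olivere/elastic v7's `BulkProcessorService` (the `newBulkProcessor` helper and the processor name are illustrative, not the actual indexer code):

```go
package main

import (
	"context"
	"log"

	"github.com/olivere/elastic/v7"
)

func newBulkProcessor(client *elastic.Client) (*elastic.BulkProcessor, error) {
	return client.BulkProcessor().
		Name("gitlab-elasticsearch-indexer").
		// By default the bulk processor retries items that fail with
		// 408, 429, 503, or 507. Passing no codes blanks the list, so
		// failed items are surfaced instead of retried in-process.
		RetryItemStatusCodes().
		Do(context.Background())
}

func main() {
	client, err := elastic.NewClient(elastic.SetURL("http://localhost:9200"))
	if err != nil {
		log.Fatal(err)
	}
	processor, err := newBulkProcessor(client)
	if err != nil {
		log.Fatal(err)
	}
	defer processor.Close()
}
```

With an empty list, any item-level failure falls through to the processor's after-callback rather than being retried inside the indexer, leaving retries to the orchestrating layer.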

We regularly see gitlab-elasticsearch-indexer processes get stuck running forever while consuming more and more memory. Based on the previous related issue #51 (closed), it seems plausible that the retry logic leads to an ever-increasing memory footprint. From what I can tell, the only work we did for that issue was to log more information, though @dgruzd also diagnosed that a 413 won't be retried anyway. So it's not clear the 413 was the root cause of the growing memory, but it could still have been related to a different retry in there.

In any case, I think this retrying logic is probably more harmful than helpful. Sidekiq already orchestrates the indexing processes and already handles retries. Keeping the retrying at that higher level will surface more failures to us, allow us to better understand the problem, and reduce the likelihood of the indexer running for a very long time. The one tradeoff is that the bulk indexer here is capable of retrying only the specific requests that failed, whereas Sidekiq will potentially reindex the whole project even if it was partially indexed. I think we can live with that tradeoff, though, since it will only happen on the first indexing of a project, and only under the rare circumstances where some payloads fail to index while others index fine.
