Draft: Don't retry in gitlab-elasticsearch-indexer

Related to gitlab#323856 (closed)

This MR updates a setting in the Elastic indexing library so that no status codes are retried (i.e., it blanks the list of retryable status codes).
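For context, a minimal sketch of what blanking the retryable status codes looks like, assuming the indexer wires its bulk indexing through olivere/elastic v7's `BulkProcessorService` (the `newBulkProcessor` helper and the processor name are illustrative, not the actual indexer code):

```go
package main

import (
	"context"
	"log"

	"github.com/olivere/elastic/v7"
)

func newBulkProcessor(client *elastic.Client) (*elastic.BulkProcessor, error) {
	return client.BulkProcessor().
		Name("gitlab-elasticsearch-indexer").
		// By default the bulk processor retries items that fail with
		// 408, 429, 503, or 507. Passing no codes blanks the list, so
		// failed items are surfaced instead of retried in-process.
		RetryItemStatusCodes().
		Do(context.Background())
}

func main() {
	client, err := elastic.NewClient(elastic.SetURL("http://localhost:9200"))
	if err != nil {
		log.Fatal(err)
	}
	processor, err := newBulkProcessor(client)
	if err != nil {
		log.Fatal(err)
	}
	defer processor.Close()
}
```

With an empty list, any item-level failure falls through to the processor's after-callback rather than being retried inside the indexer, leaving retries to the orchestrating layer.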

We regularly see gitlab-elasticsearch-indexer processes get stuck running forever while consuming more and more memory. Based on the previous related issue #51 (closed), it seems plausible that the retry logic leads to an ever-increasing memory footprint. From what I can tell, the only work we did for that issue was to log more information, though @dgruzd also diagnosed that a 413 won't be retried anyway. So it's not clear the 413 was the root cause of the growing memory, but it could still have been related to a different retry in there.

In any case, I think this retrying logic is probably more harmful than helpful. Sidekiq already orchestrates the indexing processes and already handles retries. Keeping the retrying at that higher level will surface more failures to us, allow us to better understand the problem, and reduce the likelihood of the indexer running for a very long time. The one tradeoff is that the bulk indexer here is capable of retrying only the specific requests that failed, whereas Sidekiq will potentially reindex the whole project even if it was partially indexed. I think we can live with that tradeoff, though, since it will only happen on the first indexing of a project, and only under the rare circumstances where some payloads fail to index while others index fine.
