Spike: Determine why reindex fails with "Node not connected" so often
We experienced this at least 3 times during gitlab-com/gl-infra/production#2408 (comment 387249996).
There is an open Elastic Support case (https://support.elastic.co/customers/s/case/5004M00000eAtbNQAS) to determine why this is happening. We will continue the conversation there to try to find the root cause.
We should find a way to make reindexing more robust. If this failure is unavoidable and unlikely to be fixed in Elasticsearch itself any time soon, we may consider building retries into our reindex feature in GitLab to make our lives easier. In practice, retrying the failed slices was straightforward and eventually resulted in success.
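A minimal sketch of what "retry only the failed slices" could look like, assuming each slice is run as a separate manually sliced `_reindex` request (Elasticsearch's `source.slice` with `id` and `max`). `run_slice`, `is_node_not_connected`, the backoff values and the index names are hypothetical: `run_slice` stands in for whatever client call actually issues `POST _reindex`, and the predicate assumes the failure surfaces as an exception whose message contains "Node not connected". This is an illustration, not how GitLab's reindex feature currently works.

```python
import time
from typing import Callable, List


def is_node_not_connected(err: Exception) -> bool:
    # Assumption: the failure is raised as an exception whose message
    # contains the "Node not connected" text seen in production.
    return "Node not connected" in str(err)


def slice_body(source: str, dest: str, slice_id: int, max_slices: int) -> dict:
    # Request body for one manually sliced _reindex call.
    return {
        "source": {"index": source, "slice": {"id": slice_id, "max": max_slices}},
        "dest": {"index": dest},
    }


def reindex_with_retries(
    run_slice: Callable[[dict], None],  # hypothetical: sends one _reindex request
    source: str,
    dest: str,
    max_slices: int,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    is_retryable: Callable[[Exception], bool] = is_node_not_connected,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Run every slice, then re-run only the failed ones with exponential backoff.

    Returns the IDs of slices that still failed after max_attempts rounds.
    """
    pending = list(range(max_slices))
    for attempt in range(1, max_attempts + 1):
        failed = []
        for slice_id in pending:
            try:
                run_slice(slice_body(source, dest, slice_id, max_slices))
            except Exception as err:
                if not is_retryable(err):
                    raise  # unexpected errors are surfaced, not retried
                failed.append(slice_id)
        pending = failed
        if not pending:
            break
        if attempt < max_attempts:
            # Wait 1s, 2s, 4s, ... before the next round of retries.
            sleep(base_delay * 2 ** (attempt - 1))
    return pending


# Usage: a fake runner where slice 1 fails once with "Node not connected".
calls = []


def flaky_run_slice(body: dict) -> None:
    slice_id = body["source"]["slice"]["id"]
    calls.append(slice_id)
    if slice_id == 1 and calls.count(1) == 1:
        raise RuntimeError("Node not connected")


remaining = reindex_with_retries(
    flaky_run_slice, "issues-v1", "issues-v2", 3, sleep=lambda s: None
)
print(remaining, calls)  # prints [] [0, 1, 2, 1]
```

Only the failed slice is re-run, so a transient disconnect does not force the whole reindex to start over. Re-running a slice re-copies documents it may already have written; with the default destination settings Elasticsearch overwrites documents that have the same ID, so the retry should be safe, but that is worth confirming against the settings our reindex actually uses.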
Edited by John McGuire