2020-06-22: Elasticsearch cluster not responding around 17:00 UTC
-
[17:00] We saw 500 errors when querying on GitLab.com, and Sentry recorded many exceptions from ElasticCommitIndexerWorker, for example https://sentry.gitlab.net/gitlab/gitlabcom/issues/1676108/events/31587452/ . From our discussion on Slack, the cause could be the initial indexing jobs launched by #2307 (closed). See the discussion at https://gitlab.slack.com/archives/C3TMLK465/p1592845228376600 .
-
[17:16] An issue was created, gitlab-org/gitlab#223756 (closed)
-
[17:26] SRE was notified. https://gitlab.slack.com/archives/C101F3796/p1592846811423200
-
[17:29] An issue was created to check the workload on Sidekiq workers, https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10634 . The concurrency of the Sidekiq workers appeared to be low enough.
-
[18:06] For a period of time, we were able to reach the Elasticsearch cluster from our monitoring Kibana instance.
-
[18:06] Another production change request was created to pause the indexing, #2314 (closed)
-
[18:17] Access to the Elasticsearch cluster was restored, and we saw that CPU utilization was much lower.
Note: all times are in UTC.