ES log cluster outage (#211) · Issues · GitLab.com / GitLab Infrastructure Team / Production

ES log cluster outage

We received Prometheus alerts at `12:50 UTC`. The cluster has been back up and running since `13:03 UTC`. The problem was long GC times, which caused timeouts on the cluster. This is probably also related to https://gitlab.com/gitlab-com/infrastructure/issues/2277 and https://gitlab.com/gitlab-com/infrastructure/issues/2274. For the most part, Logstash started indexing too much while we were trying to figure out the issue.
