ES log cluster outage

We received Prometheus alerts at 12:50 UTC.

The cluster has been back up and running since 13:03 UTC.

The problem was long GC times, which caused timeouts on the cluster. This is probably also related to https://gitlab.com/gitlab-com/infrastructure/issues/2277 and https://gitlab.com/gitlab-com/infrastructure/issues/2274.
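For future triage, GC pressure like this can be spotted from the `jvm.gc.collectors` section of the Elasticsearch node stats (`GET _nodes/stats/jvm`). The following is a minimal sketch, not what we ran during the incident: it takes an already-fetched stats response as a dict and computes the average time per collection for each collector. The function names, the 1000 ms threshold, and the node names in the sample are all hypothetical. Note that the counters are cumulative since node start, so the result is a lifetime average; comparing two snapshots would be needed to isolate a specific time window.

```python
from typing import Dict, List, Tuple


def gc_pause_summary(nodes_stats: dict) -> Dict[str, Dict[str, float]]:
    """Average GC time (ms) per collection, keyed by node name and collector.

    Expects the parsed JSON body of `GET _nodes/stats/jvm`.
    """
    summary: Dict[str, Dict[str, float]] = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        name = node.get("name", node_id)
        collectors = node.get("jvm", {}).get("gc", {}).get("collectors", {})
        per_collector: Dict[str, float] = {}
        for collector, stats in collectors.items():
            count = stats.get("collection_count", 0)
            total_ms = stats.get("collection_time_in_millis", 0)
            # Avoid division by zero for collectors that have not run yet.
            per_collector[collector] = total_ms / count if count else 0.0
        summary[name] = per_collector
    return summary


def slow_collectors(
    summary: Dict[str, Dict[str, float]], threshold_ms: float = 1000.0
) -> List[Tuple[str, str, float]]:
    """List (node, collector, avg_ms) entries above an arbitrary threshold."""
    return sorted(
        (node, collector, avg_ms)
        for node, collectors in summary.items()
        for collector, avg_ms in collectors.items()
        if avg_ms > threshold_ms
    )


if __name__ == "__main__":
    # Made-up sample shaped like a node stats response; not real incident data.
    sample = {
        "nodes": {
            "abc123": {
                "name": "es-log-example-1",
                "jvm": {"gc": {"collectors": {
                    "young": {"collection_count": 100,
                              "collection_time_in_millis": 5000},
                    "old": {"collection_count": 4,
                            "collection_time_in_millis": 8000},
                }}},
            }
        }
    }
    for node, collector, avg_ms in slow_collectors(gc_pause_summary(sample)):
        print(f"{node} {collector}: {avg_ms:.0f} ms per collection")
```

A high average on the old-generation collector is the usual sign of the kind of long pauses that lead to request timeouts.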

For the most part, Logstash started indexing too much while we were trying to figure out the issue.