2020-05-05: GPRD Kibana returning 502
Summary
Kibana at https://log.gprd.gitlab.net/ is returning 502. The underlying UI is rendering, but the health checks are failing because the Elastic backend is down.
More information will be added as we investigate the issue.
Internal Grafana dashboard: https://dashboards.gitlab.net/d/logging-main/logging-overview?orgId=1&from=now-3h&to=now
Elastic Cloud dashboard view
Kibana via direct link
Timeline
All times UTC.
05-05-2020
- 18:11 - Grafana shows that the Elastic cluster exporter metrics stopped reporting.
- 18:42 - EOC was notified via Slack that Kibana wasn't working.
- 19:13 - The cloud.elastic.co dashboard is intermittently showing an Unhealthy deployment warning and a maintenance warning.
- 19:15 - We've determined there's an issue with the IAP and load balancer health checks. The Kibana cluster is accessible via direct link. Please see the #incident-management Slack channel for details.
- 19:19 - Kibana status is showing red due to failures with the Elasticsearch plugin.
- 19:24 - Elastic support case 00531280 opened.
- 19:32 - @AnthonySandoval sent a message to our Elastic rep in Slack to bring the case to his attention, hoping for an expedited response.
- 19:45 - Our Elastic rep responded indicating that he'll be investigating shortly.
- 20:13 - IMOC (@AnthonySandoval) is engaging directly via DM in Slack with our Elastic rep.
- 20:21 - The Elastic Cloud dashboard is continuing to flap between red and yellow health warnings.
- 20:33 - Cluster exporter is reporting metrics again.
- 20:40 - Kibana load balancer health checks are confirmed passing; the search UI loads successfully.
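The 19:19 entry refers to Kibana's status page showing red because of the Elasticsearch plugin. As a rough sketch only, the check below shows how a status payload could be inspected to find which plugin is unhealthy. It assumes the legacy JSON shape returned by Kibana's `/api/status` endpoint (`status.overall.state` plus a `status.statuses` list); the helper names and the sample payload are hypothetical and were not captured during this incident.

```python
from typing import Any


def overall_state(payload: dict[str, Any]) -> str:
    """Return the overall state string, or "unknown" if it is missing."""
    return payload.get("status", {}).get("overall", {}).get("state", "unknown")


def failing_plugins(payload: dict[str, Any]) -> list[str]:
    """Return the ids of all status entries whose state is not green."""
    statuses = payload.get("status", {}).get("statuses", [])
    return [s.get("id", "<unknown>") for s in statuses if s.get("state") != "green"]


# Hypothetical payload for illustration only; not real incident data.
sample = {
    "status": {
        "overall": {"state": "red"},
        "statuses": [
            {"id": "plugin:kibana@7.0.0", "state": "green"},
            {"id": "plugin:elasticsearch@7.0.0", "state": "red"},
        ],
    }
}

if __name__ == "__main__":
    print(overall_state(sample))
    print(failing_plugins(sample))
```

Keeping the parsing separate from any HTTP call means the same helpers can be pointed at a saved response or at output fetched through the direct link while the load balancer path is failing.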
Resources
- If the Situation Zoom room was utilised, the recording will be automatically uploaded to the Incident room Google Drive folder (private)
Edited by AnthonySandoval