Monitor performance of GitLab.com's Elasticsearch clusters
We have Elasticsearch clusters for staging and production. Because ES cluster performance greatly affects the performance of ES indexing and search, we should monitor their vital statistics and display them in a Grafana dashboard, similar to the PostgreSQL one: https://performance.gitlab.net/dashboard/db/postgres-stats?var-environment=staging&from=now-1h&to=now
A quick search suggests https://github.com/Braedon/prometheus-es-exporter , but I've never used it (or any alternatives). This package seems to lack support for AWS-signed clusters (and hardcodes verify_certs=false), but it's not a lot of code.
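To give a sense of scale, here is a minimal sketch of the core of such an exporter: it flattens the JSON returned by Elasticsearch's `_nodes/stats/jvm` endpoint into Prometheus text-format lines. This is not how prometheus-es-exporter is implemented. The metric names (`es_jvm_*`), the `jvm_metrics` helper, and the sample node are made up for illustration. Fetching the JSON (with certificate verification and AWS request signing) is left out.

```python
# Hypothetical sketch: turn a `_nodes/stats/jvm` response into Prometheus
# text-format lines. Metric names are illustrative, not an existing convention.

def jvm_metrics(stats):
    """Return Prometheus exposition lines for heap usage and GC activity."""
    lines = []
    for node_id, node in sorted(stats.get("nodes", {}).items()):
        # Fall back to the node ID if the node has no human-readable name.
        name = node.get("name", node_id)
        jvm = node.get("jvm", {})

        heap = jvm.get("mem", {}).get("heap_used_percent")
        if heap is not None:
            lines.append(f'es_jvm_heap_used_percent{{node="{name}"}} {heap}')

        # One pair of series per garbage collector (e.g. "young", "old").
        collectors = jvm.get("gc", {}).get("collectors", {})
        for gc_name, gc in sorted(collectors.items()):
            labels = f'node="{name}",collector="{gc_name}"'
            count = gc.get("collection_count", 0)
            millis = gc.get("collection_time_in_millis", 0)
            lines.append(f"es_jvm_gc_collection_count{{{labels}}} {count}")
            lines.append(f"es_jvm_gc_collection_time_millis{{{labels}}} {millis}")
    return lines


# Made-up sample shaped like a trimmed `_nodes/stats/jvm` response.
SAMPLE = {
    "nodes": {
        "abc123": {
            "name": "es-1",
            "jvm": {
                "mem": {"heap_used_percent": 71},
                "gc": {
                    "collectors": {
                        "old": {
                            "collection_count": 3,
                            "collection_time_in_millis": 450,
                        }
                    }
                },
            },
        }
    }
}

for line in jvm_metrics(SAMPLE):
    print(line)
```

A real exporter would serve these lines over HTTP for Prometheus to scrape, which is where most of the remaining code would go.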
I think we have a choice between deploying the monitoring daemon as part of GitLab.com's infrastructure, or trying to get it into the omnibus packages so it can be of use to all our customers. It wouldn't be difficult to bundle it in GitLab EE only, then automatically enable it if Prometheus and Elasticsearch server details are both present. Which way do we want to go? How do we manage the equivalent Postgres dashboards at the moment? @maratkalibek @marin @bjk-gitlab ?
ES cluster performance seems to depend heavily on the underlying JVM, especially memory pressure, so we should monitor GC statistics and similar JVM metrics. Here's a snapshot of the kinds of things the AWS ES service monitors: