IngressModsecurityCounterMetricsWorker is not collecting any data from clusters
Summary
There is a problem with the advanced WAF metrics. To calculate them, we need to connect to the Elasticsearch instance in each cluster and fetch the data from it. However, Sentry shows multiple failures when fetching that data, and because of the raised exceptions we were not able to collect the metrics properly. In the scope of this issue, we have to ensure that whenever connecting to one cluster fails, the calculation continues for the other clusters.
External links
https://sentry.gitlab.net/gitlab/gitlabcom/issues/1674632/?referrer=gitlab_plugin and https://sentry.gitlab.net/gitlab/gitlabcom/?query=IngressModsecurityCounterMetricsWorker
Steps to reproduce
- Configure your Cluster in GitLab (install Ingress with Modsecurity enabled and Elasticsearch)
- Go to the Kubernetes console and manually remove/modify the Elasticsearch Deployment to make it inaccessible to the worker.
- Manually (through rails console) run IngressModsecurityCounterMetricsWorker
- You should see a raised exception instead of a successful execution.
Example Project
https://gitlab.com/gitlab-org/gitlab/
What is the current bug behavior?
When a connection to Elasticsearch in the Cluster is not possible and an exception is raised during statistics calculation, the calculation does not finish successfully and the statistics are not saved.
What is the expected correct behavior?
When a connection to Elasticsearch in the Cluster is not possible and an exception is raised during statistics calculation, the calculation continues for the other clusters and the statistics are saved.
Relevant logs and/or screenshots
Elasticsearch::Transport::Transport::Errors::ServiceUnavailable: [503] {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"no endpoints available for service \"elastic-stack-elasticsearch-client\"","reason":"ServiceUnavailable","code":503}
elasticsearch/transport/transport/base.rb:205:in `__raise_transport_error'
raise error.new "[#{response.status}] #{response.body}"
elasticsearch/transport/transport/base.rb:323:in `perform_request'
__raise_transport_error response unless ignore.include?(response.status.to_i)
elasticsearch/transport/transport/http/faraday.rb:20:in `perform_request'
super do |connection, url|
elasticsearch/transport/client.rb:143:in `perform_request'
transport.perform_request(method, path, params, body, headers)
gitlab/instrumentation/elasticsearch_transport.rb:10:in `perform_request'
super
...
(99 additional frame(s) were not displayed)
Output of checks
This bug happens on GitLab.com
Possible fixes
- Rescue from any exception during statistics calculation (https://gitlab.com/gitlab-org/gitlab/blob/master/ee/app/services/ee/security/ingress_modsecurity_usage_service.rb#L49)
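The proposed fix could look roughly like the sketch below. It is not the code in ingress_modsecurity_usage_service.rb: `Cluster`, `fetch_count`, `collect_counts`, and the local `ServiceUnavailable` class are hypothetical stand-ins, used only to illustrate rescuing per cluster so that one unreachable Elasticsearch does not abort the whole run.

```ruby
# Hypothetical stand-in for Elasticsearch::Transport::Transport::Errors::ServiceUnavailable.
class ServiceUnavailable < StandardError; end

# Hypothetical cluster object; `fetch_count` stands in for the Elasticsearch query.
Cluster = Struct.new(:name, :available, :count) do
  def fetch_count
    raise ServiceUnavailable, "no endpoints available for #{name}" unless available

    count
  end
end

# Collects a count per cluster. A cluster that raises is recorded in `errors`
# and skipped, so the remaining clusters are still processed.
def collect_counts(clusters)
  results = {}
  errors = {}

  clusters.each do |cluster|
    begin
      results[cluster.name] = cluster.fetch_count
    rescue StandardError => e
      # In the real worker this is where the error would be reported
      # (for example to Sentry) before moving on to the next cluster.
      errors[cluster.name] = e
    end
  end

  [results, errors]
end

clusters = [
  Cluster.new("a", true, 3),
  Cluster.new("b", false, 0), # simulates an unreachable Elasticsearch
  Cluster.new("c", true, 5)
]

results, errors = collect_counts(clusters)
puts results.inspect
puts errors.keys.inspect
```

Rescuing inside the loop, rather than around it, is what keeps clusters "a" and "c" in the results when "b" fails. Rescuing StandardError is deliberately broad here to match the "any exception" wording above; a narrower rescue may be preferable if only connection errors are expected.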