State which metric threshold was crossed on deploy board

Problem to solve

When we deliver #214634 (closed), the user will know that there is a problem with the deployment and that a threshold was crossed. We should also state which threshold was crossed.

Intended users

Further details

In case there is a degradation in performance or quality, we will notify the user on the deploy board (environment page) so that they will know something is wrong and can take action.

Using the existing Prometheus API, we will query the current threshold of error rates.
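
As a rough illustration of what such a query could look like, the sketch below builds the URL for an instant query against the Prometheus HTTP API (`GET /api/v1/query`) and reads the first sample value from the JSON response. The function names and the base URL are hypothetical, and this is not necessarily how GitLab's existing integration issues the request.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Return the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def first_sample_value(response_body: str) -> Optional[float]:
    """Return the first sample value of an instant-query response, or None."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        return None
    result = payload["data"]["result"]
    if not result:
        return None
    # Each vector sample looks like {"metric": {...}, "value": [timestamp, "value"]};
    # the value is encoded as a string, so convert it explicitly.
    return float(result[0]["value"][1])
```

An empty result vector yields `None`, which the caller could treat as "no data" rather than as a crossed threshold.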

Proposal

Next to the threshold-crossed message on the deploy board developed in #214634 (closed), the actual threshold that was crossed should be stated.

This iteration should state the actual error threshold that was exceeded.
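
A minimal sketch of how the stated threshold might be derived is shown here. The `MetricReading` type, the threshold values, and the message wording are all hypothetical; the issue does not define the configured limits or the final copy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricReading:
    name: str         # display name, e.g. "HTTP Error Rate"
    unit: str         # display unit, e.g. "%"
    value: float      # current value returned by the metric query
    threshold: float  # hypothetical configured limit for this metric


def crossed_threshold_messages(readings: List[MetricReading]) -> List[str]:
    """Return one message per metric whose current value exceeds its threshold."""
    return [
        f"{r.name} is {r.value:g} {r.unit}, above the threshold of {r.threshold:g} {r.unit}"
        for r in readings
        if r.value > r.threshold
    ]
```

This assumes every metric is "bad when high". A metric such as throughput may need the opposite comparison, which the sketch does not cover.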

These are the supported metrics:

Name	Query
Throughput (req/sec)	sum(label_replace(rate(nginx_ingress_controller_requests{namespace="%{kube_namespace}",ingress=~".*%{ci_environment_slug}.*"}[2m]), "status_code", "${1}xx", "status", "(.)..")) by (status_code)
Latency (ms)	sum(rate(nginx_ingress_controller_ingress_upstream_latency_seconds_sum{namespace="%{kube_namespace}",ingress=~".*%{ci_environment_slug}.*"}[2m])) / sum(rate(nginx_ingress_controller_ingress_upstream_latency_seconds_count{namespace="%{kube_namespace}",ingress=~".*%{ci_environment_slug}.*"}[2m])) * 1000
HTTP Error Rate (%)	sum(rate(nginx_ingress_controller_requests{status=~"5.*",namespace="%{kube_namespace}",ingress=~".*%{ci_environment_slug}.*"}[2m])) / sum(rate(nginx_ingress_controller_requests{namespace="%{kube_namespace}",ingress=~".*%{ci_environment_slug}.*"}[2m])) * 100
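
The queries above contain `%{kube_namespace}` and `%{ci_environment_slug}` placeholders that must be filled in before a query is sent to Prometheus. The following sketch shows one simple way to do that substitution; `render_query` and the sample values are hypothetical, and GitLab's own implementation may differ.

```python
from typing import Dict


def render_query(template: str, variables: Dict[str, str]) -> str:
    """Replace each %{name} placeholder in `template` with its value."""
    rendered = template
    for name, value in variables.items():
        rendered = rendered.replace("%{" + name + "}", value)
    return rendered


# Example with a shortened selector and made-up environment values.
selector = 'nginx_ingress_controller_requests{namespace="%{kube_namespace}"}'
print(render_query(selector, {"kube_namespace": "my-app-production"}))
```

Placeholders without a matching entry in `variables` are left untouched, so a missing variable stays visible in the rendered query instead of being silently dropped.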

Permissions and Security

Documentation

Availability & Testing

What does success look like, and how can we measure that?

What is the type of buyer?

Is this a cross-stage feature?

Links / references

https://docs.gitlab.com/ee/user/project/integrations/prometheus_library/nginx_ingress.html#metrics-supported

Edited Apr 19, 2020 by Orit Golowinski