State which metric threshold was crossed on deploy board
Problem to solve
When we deliver #214634 (closed), the user will know that there is a problem with the deployment and that a threshold was crossed. We should also state which threshold was crossed.
Intended users
Further details
In case there is a degradation in performance or quality, we will notify the user on the deploy board (environment page) so that they know something is wrong and can take action.
Using the existing Prometheus API, we will query the current threshold of error rates.
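As a rough illustration of the query step, the sketch below builds an instant-query URL for the Prometheus HTTP API (`/api/v1/query`) and reads the first sample from the JSON response. The helper names `build_instant_query_url` and `first_sample_value` and the host in the usage note are hypothetical and not part of this issue; the actual integration would go through the existing Prometheus client rather than this code.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def build_instant_query_url(base_url: str, promql: str) -> str:
    """Return the URL for a Prometheus instant query (GET /api/v1/query)."""
    # urlencode escapes the PromQL expression so it is safe in a query string.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def first_sample_value(response_body: str) -> Optional[float]:
    """Extract the first sample value from an instant-query JSON response.

    Returns None when the query failed or matched no series.
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        return None
    result = payload["data"]["result"]
    if not result:
        return None
    # A vector sample looks like {"metric": {...}, "value": [timestamp, "value"]};
    # Prometheus encodes the sample value as a string.
    return float(result[0]["value"][1])
```

For example, `build_instant_query_url("http://prometheus.example:9090", "up")` yields a URL that can be fetched with any HTTP client, and the response body can then be passed to `first_sample_value`.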
Proposal
Next to the threshold-crossed message on the deploy board developed in #214634 (closed), the actual threshold that was crossed should be stated.
This iteration should state the actual metric whose threshold was exceeded.
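One way the stated threshold could be rendered is sketched below. The `MetricCheck` type, the `crossed_message` helper, the message wording, and the example numbers are all hypothetical; the sketch also assumes that "crossed" means the current value is above the threshold, which may not hold for every metric (throughput, for instance, might alert when it drops).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricCheck:
    name: str         # display name, e.g. "HTTP Error Rate"
    unit: str         # unit suffix shown after the numbers, e.g. "%"
    current: float    # latest value returned by the query
    threshold: float  # configured limit (hypothetical value in the usage below)


def crossed_message(check: MetricCheck) -> Optional[str]:
    """Return a deploy-board message if the threshold was exceeded, else None."""
    # Assumption: a threshold is "crossed" when the current value is above it.
    if check.current <= check.threshold:
        return None
    return (
        f"{check.name} threshold crossed: "
        f"{check.current:g}{check.unit} (threshold: {check.threshold:g}{check.unit})"
    )


print(crossed_message(MetricCheck("HTTP Error Rate", "%", 7.5, 5.0)))
```

Returning `None` when nothing was crossed keeps the decision of whether to show anything on the deploy board with the caller.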
These are the supported metrics:
Name | Query |
---|---|
Throughput (req/sec) | `sum(label_replace(rate(nginx_ingress_controller_requests{namespace="%{kube_namespace}",ingress=~".*%{ci_environment_slug}.*"}[2m]), "status_code", "${1}xx", "status", "(.)..")) by (status_code)` |
Latency (ms) | `sum(rate(nginx_ingress_controller_ingress_upstream_latency_seconds_sum{namespace="%{kube_namespace}",ingress=~".*%{ci_environment_slug}.*"}[2m])) / sum(rate(nginx_ingress_controller_ingress_upstream_latency_seconds_count{namespace="%{kube_namespace}",ingress=~".*%{ci_environment_slug}.*"}[2m])) * 1000` |
HTTP Error Rate (%) | `sum(rate(nginx_ingress_controller_requests{status=~"5.*",namespace="%{kube_namespace}",ingress=~".*%{ci_environment_slug}.*"}[2m])) / sum(rate(nginx_ingress_controller_requests{namespace="%{kube_namespace}",ingress=~".*%{ci_environment_slug}.*"}[2m])) * 100` |
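The queries above contain `%{kube_namespace}` and `%{ci_environment_slug}` placeholders that need to be filled in before the query is sent. The sketch below shows one possible substitution step; `render_query`, the abbreviated selector, and the example values are hypothetical and only illustrate the idea.

```python
import re
from typing import Mapping

# Matches %{name} placeholders only; the "${1}xx" capture reference in the
# throughput query starts with "$" and is therefore left untouched.
PLACEHOLDER = re.compile(r"%\{(\w+)\}")


def render_query(template: str, variables: Mapping[str, str]) -> str:
    """Replace every %{name} placeholder with its value from `variables`.

    Raises KeyError if the template uses a variable that was not supplied.
    """
    return PLACEHOLDER.sub(lambda match: variables[match.group(1)], template)


# Abbreviated selector for illustration, with made-up environment values.
selector = (
    'nginx_ingress_controller_requests{namespace="%{kube_namespace}",'
    'ingress=~".*%{ci_environment_slug}.*"}'
)
print(render_query(selector, {
    "kube_namespace": "my-ns",
    "ci_environment_slug": "production",
}))
```

Failing loudly on a missing variable avoids sending a query that still contains a literal placeholder, which would silently match no series.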
Permissions and Security
Documentation
Availability & Testing
What does success look like, and how can we measure that?
What is the type of buyer?
Is this a cross-stage feature?
Links / references
Edited by Orit Golowinski