Improve latency metrics
This reduces the number of buckets a bit and adds a higher one that I think could be useful.
It also adds the status, method and handler labels to http_request_duration_seconds_(bucket|sum|count), which will allow us to exclude 4xx & 5xx from the apdex. I think eventually we'll want to switch this to success/total counters, but we can start with this.
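As a rough illustration of how the status label could be used to keep 4xx & 5xx out of the apdex, here is a minimal sketch in plain Python. It assumes the bucket samples have already been scraped into simple records; `BucketSample`, `apdex_ratio`, the 1.0 second threshold and the "satisfied / total" definition are hypothetical choices for this example, not something taken from this change or from our dashboards.

```python
from typing import Iterable, NamedTuple, Optional


class BucketSample(NamedTuple):
    """One http_request_duration_seconds_bucket sample (hypothetical record type)."""
    handler: str
    method: str
    status: str   # status class label, e.g. "2xx"
    le: str       # bucket upper bound as exposed, e.g. "1.0" or "+Inf"
    value: float  # cumulative count for this bucket


def apdex_ratio(
    samples: Iterable[BucketSample],
    threshold: str = "1.0",            # assumed "satisfied" bound, not from the MR
    excluded: tuple = ("4xx", "5xx"),  # status classes left out of the ratio
) -> Optional[float]:
    """Fraction of non-excluded requests that finished within `threshold` seconds."""
    satisfied = 0.0
    total = 0.0
    for sample in samples:
        if sample.status in excluded:
            continue
        if sample.le == threshold:
            # Buckets are cumulative, so this counts everything <= threshold.
            satisfied += sample.value
        elif sample.le == "+Inf":
            # The +Inf bucket holds the total number of observations.
            total += sample.value
    return satisfied / total if total else None


samples = [
    BucketSample("/api", "GET", "2xx", "1.0", 8.0),
    BucketSample("/api", "GET", "2xx", "+Inf", 10.0),
    BucketSample("/api", "GET", "5xx", "1.0", 0.0),
    BucketSample("/api", "GET", "5xx", "+Inf", 5.0),
]
ratio = apdex_ratio(samples)
```

In this sketch the five slow 5xx requests do not drag the ratio down, because their series are skipped entirely.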
For https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23725
Resulting metrics:
http_request_duration_seconds_bucket{handler="/monitoring/healthz",le="0.5",method="GET",status="2xx"} 2.0
http_request_duration_seconds_bucket{handler="/monitoring/healthz",le="1.0",method="GET",status="2xx"} 2.0
http_request_duration_seconds_bucket{handler="/monitoring/healthz",le="10.0",method="GET",status="2xx"} 2.0
http_request_duration_seconds_bucket{handler="/monitoring/healthz",le="30.0",method="GET",status="2xx"} 2.0
http_request_duration_seconds_bucket{handler="/monitoring/healthz",le="60.0",method="GET",status="2xx"} 2.0
http_request_duration_seconds_bucket{handler="/monitoring/healthz",le="90.0",method="GET",status="2xx"} 2.0
http_request_duration_seconds_bucket{handler="/monitoring/healthz",le="+Inf",method="GET",status="2xx"} 2.0
http_request_duration_seconds_count{handler="/monitoring/healthz",method="GET",status="2xx"} 2.0
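For readers less familiar with how these series are filled, the following is a minimal sketch of a cumulative histogram keyed by handler, method and status class. It is not the code in this change: `LatencyHistogram` and `status_class` are hypothetical names, the bucket bounds are copied from the output above, and deriving "2xx" from the numeric status code by integer division is an assumption.

```python
import bisect
from collections import defaultdict

# Upper bounds copied from the output above; +Inf is always the last bucket.
BUCKETS = [0.5, 1.0, 10.0, 30.0, 60.0, 90.0, float("inf")]


def status_class(code: int) -> str:
    """Collapse a numeric HTTP status into its class, e.g. 200 -> "2xx" (assumed mapping)."""
    return f"{code // 100}xx"


class LatencyHistogram:
    """Cumulative histogram keyed by (handler, method, status class)."""

    def __init__(self) -> None:
        self.counts = defaultdict(lambda: [0] * len(BUCKETS))
        self.sums = defaultdict(float)

    def observe(self, handler: str, method: str, code: int, seconds: float) -> None:
        key = (handler, method, status_class(code))
        # First bucket whose bound is >= seconds ("le" is inclusive).
        first = bisect.bisect_left(BUCKETS, seconds)
        # Cumulative: the observation counts in that bucket and every larger one.
        for i in range(first, len(BUCKETS)):
            self.counts[key][i] += 1
        self.sums[key] += seconds


hist = LatencyHistogram()
hist.observe("/monitoring/healthz", "GET", 200, 0.01)
hist.observe("/monitoring/healthz", "GET", 200, 0.02)
healthz = hist.counts[("/monitoring/healthz", "GET", "2xx")]
```

Two fast requests end up counted in every bucket, which is why all the lines above show the same value.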
(I had to remove this endpoint from the denylist for testing because the root path wasn't recognized. I've asked about this in Slack.)
Edited by Bob Van Landuyt