
Improve latency metrics

Bob Van Landuyt requested to merge bvl-improve-apdex-metrics into main


This reduces the number of buckets a bit and adds a higher one that I think could be useful.
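To make the bucket change concrete, here is a minimal sketch of how observations land in cumulative histogram buckets. The upper bounds are copied from the sample output under "Resulting metrics"; the actual bucket list in the code may differ, and `cumulative_counts` is a hypothetical helper, not something from this merge request.

```python
import bisect
import math

# Upper bounds taken from the sample output; treat them as an assumption.
BUCKETS = [0.5, 1.0, 10.0, 30.0, 60.0, 90.0, math.inf]


def cumulative_counts(durations):
    """Return {upper_bound: number of observations <= upper_bound}."""
    per_bucket = [0] * len(BUCKETS)
    for duration in durations:
        # Index of the first bucket whose upper bound is >= duration.
        per_bucket[bisect.bisect_left(BUCKETS, duration)] += 1

    counts = {}
    running = 0
    for bound, n in zip(BUCKETS, per_bucket):
        running += n  # histogram buckets are cumulative
        counts[bound] = running
    return counts
```

Two fast health checks, for example `cumulative_counts([0.02, 0.03])`, give a count of 2 in every bucket, which is the shape of the sample output. A request slower than 90 seconds would only be counted in the `+Inf` bucket.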

It also adds the status, method and handler labels to http_request_duration_seconds_(bucket|sum|count), which will allow us to exclude 4xx & 5xx responses from the apdex. I think we'll eventually want to switch this to success/total counters, but we can start with this.
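As a rough illustration of what the new `status` label enables, the sketch below computes an apdex score from bucket samples while skipping 4xx and 5xx series. The `apdex` helper and the thresholds (satisfied at `le="1.0"`, tolerated at `le="10.0"`) are assumptions for illustration only; this merge request does not state which thresholds the real apdex uses.

```python
def apdex(samples, satisfied_le="1.0", tolerated_le="10.0",
          excluded=("4xx", "5xx")):
    """Compute apdex from (labels, value) pairs of *_bucket series.

    Thresholds and the excluded status classes are illustrative assumptions.
    Returns None when no requests remain after filtering.
    """
    satisfied = tolerated = total = 0.0
    for labels, value in samples:
        if labels["status"] in excluded:
            continue  # errors count neither for nor against the score
        if labels["le"] == satisfied_le:
            satisfied += value
        elif labels["le"] == tolerated_le:
            tolerated += value
        elif labels["le"] == "+Inf":
            total += value
    if total == 0:
        return None
    # Buckets are cumulative, so `tolerated` already includes `satisfied`.
    return (satisfied + (tolerated - satisfied) / 2) / total
```

With 10 successful requests, of which 8 finish within 1 second and 9 within 10 seconds, the score is (8 + 0.5) / 10 = 0.85 no matter how many fast 5xx responses were served alongside them.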

For https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23725

Resulting metrics:

http_request_duration_seconds_bucket{handler="/monitoring/healthz",le="0.5",method="GET",status="2xx"} 2.0
http_request_duration_seconds_bucket{handler="/monitoring/healthz",le="1.0",method="GET",status="2xx"} 2.0
http_request_duration_seconds_bucket{handler="/monitoring/healthz",le="10.0",method="GET",status="2xx"} 2.0
http_request_duration_seconds_bucket{handler="/monitoring/healthz",le="30.0",method="GET",status="2xx"} 2.0
http_request_duration_seconds_bucket{handler="/monitoring/healthz",le="60.0",method="GET",status="2xx"} 2.0
http_request_duration_seconds_bucket{handler="/monitoring/healthz",le="90.0",method="GET",status="2xx"} 2.0
http_request_duration_seconds_bucket{handler="/monitoring/healthz",le="+Inf",method="GET",status="2xx"} 2.0
http_request_duration_seconds_count{handler="/monitoring/healthz",method="GET",status="2xx"} 2.0

(I had to remove this endpoint from the denylist for testing because the root path wasn't recognized. I asked about this in Slack.)

Edited by Bob Van Landuyt
