Define request apdex counters instead
What does this MR do?
With this, we'll emit 2 new counters from web processes that can be used to monitor apdex.
The gitlab_sli:rails_request_apdex:total counter is incremented for every successful request (not a 500) that is not to a health endpoint.
The gitlab_sli:rails_request_apdex:success_total counter is incremented when the request took less than 1 second. We intend to customize this value per endpoint in the future.
Both of these counters are labelled with feature_category and endpoint_id from the context.
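The counting rules above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the code in this MR: SketchCounter, record_request, and the health_check argument are hypothetical names, and "not a 500" is read here as "any status below 500".

```ruby
# Hypothetical in-memory stand-in for a Prometheus counter, keyed by label set.
class SketchCounter
  def initialize
    @values = Hash.new(0)
  end

  def increment(labels)
    @values[labels] += 1
  end

  def get(labels)
    @values[labels]
  end
end

# Fixed threshold from the description; per-endpoint values are future work.
APDEX_TARGET_SECONDS = 1

TOTAL = SketchCounter.new   # stands in for gitlab_sli:rails_request_apdex:total
SUCCESS = SketchCounter.new # stands in for gitlab_sli:rails_request_apdex:success_total

# Records one finished request. The argument names are assumptions for this sketch.
def record_request(status:, duration_s:, feature_category:, endpoint_id:, health_check: false)
  return if health_check  # requests to health endpoints are not counted
  return if status >= 500 # assumption: "not a 500" is read as any 5xx response

  labels = { feature_category: feature_category, endpoint_id: endpoint_id }
  TOTAL.increment(labels)
  SUCCESS.increment(labels) if duration_s < APDEX_TARGET_SECONDS
end
```

With this shape, a request that finishes in 0.2 seconds increments both counters, while one that takes 1.5 seconds only increments the total.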
The metrics would also be initialized on the first scrape. This means that a 0 would be available for every set of labels, avoiding bugs in calculations with these metrics.
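One way to picture the kind of calculation bug this avoids: if an endpoint has only served slow requests, its success series would never have been created, and a success/total ratio would have nothing to divide. The sketch below mimics that with plain hashes; apdex_ratio and the sample label set are hypothetical, and a real query would work on rates rather than raw counter values.

```ruby
# Returns success / total for a label set, or nil when either series is
# missing (similar to a one-to-one match that drops unmatched label sets).
def apdex_ratio(success_series, total_series, labels)
  return nil unless success_series.key?(labels) && total_series.key?(labels)

  total = total_series[labels]
  return nil if total.zero? # no traffic yet, so there is no ratio to report

  success_series[labels].to_f / total
end

labels = { feature_category: "issues", endpoint_id: "IssuesController#show" }

# Two slow requests were counted in the total only.
total_series = { labels => 2 }

# Without initialization the success series does not exist for this label set.
without_init = apdex_ratio({}, total_series, labels)

# With initialization the success series exists and reads 0.
with_init = apdex_ratio({ labels => 0 }, total_series, labels)
```

Here without_init is nil (the endpoint silently disappears from the result), while with_init is 0.0, which correctly reports that no request met the target.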
To get all of the feature_category and endpoint_id values for the initialization, we had to move some code that iterates over all endpoints, and that was previously only used in tests, into the application.
We know this would initialize about 2 * 2500 metrics per pod running a web server, so we'd like to roll this out in a controlled fashion to make sure it doesn't impact our monitoring. That is why this is behind a feature flag.
This also limits the initialization of these metrics to web processes, so they don't get generated for consoles or runner processes.
This also includes a developer-api to define SLIs and encourages initializing them with the known label sets.
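As a rough idea of what such a developer API could look like, here is a minimal, self-contained sketch. SketchSli and its methods are hypothetical names chosen for illustration and are not the interface added by this MR; the point is only that defining an SLI and passing in the known label sets up front go together.

```ruby
# Hypothetical SLI helper that pairs a total counter with a success counter.
class SketchSli
  attr_reader :name, :totals, :successes

  # known_label_sets: an array of label hashes that should exist from the start.
  def initialize(name, known_label_sets = [])
    @name = name
    @totals = Hash.new(0)
    @successes = Hash.new(0)

    # Create every known series with a value of 0 before any traffic arrives.
    known_label_sets.each do |labels|
      @totals[labels] = 0
      @successes[labels] = 0
    end
  end

  # Counts one operation; the success counter only moves when success is true.
  def increment(labels:, success:)
    @totals[labels] += 1
    @successes[labels] += 1 if success
  end
end

# Usage: define the SLI once with its known label sets, then record outcomes.
known = [{ feature_category: "issues", endpoint_id: "IssuesController#show" }]
sli = SketchSli.new(:rails_request_apdex, known)
sli.increment(labels: known.first, success: true)
```

Taking the label sets in the constructor nudges callers toward initializing every series, which is the behaviour the description encourages.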
For gitlab-com/gl-infra/scalability#1099 (closed)
Screenshots or Screencasts (strongly suggested)
A local instance that did not receive any requests looks like this:
When I start hitting it:
As we can see, the metrics start at 0 before some of them receive traffic
Does this MR meet the acceptance criteria?
Conformity
- I have included changelog trailers, or none are needed. (Does this MR need a changelog?)
- [-] I have added/updated documentation, or it's not needed. (Is documentation required?)
- I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?)
- I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?)
- I have self-reviewed this MR per code review guidelines.
- This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines)
- I have followed the style guides.
- This change is backwards compatible across updates, or this does not apply.
Availability and Testing
- I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.)
- I have tested this MR in all supported browsers, or it's not needed.
- I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.