Define request apdex counters instead


Bob Van Landuyt requested to merge bvl-apdex-sli-counters into master Aug 27, 2021


What does this MR do?

With this, we'll emit 2 new counters from web processes that can be used to monitor apdex.

The gitlab_sli:rails_request_apdex:total counter is incremented for every successful request (one that did not return a 500) that is not to a health endpoint.

The gitlab_sli:rails_request_apdex:success_total counter is incremented when the request took less than 1 second. We intend to customize this threshold per endpoint in the future.

Both these counters are labelled with feature_category and endpoint_id from the context.
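As a rough illustration of the counting rules described above, here is a minimal sketch. It is not the code in this MR: `SketchCounter`, `RequestApdexSketch`, and the health endpoint id are hypothetical names, and the counters are plain in-memory hashes rather than Prometheus metrics.

```ruby
# Hypothetical in-memory stand-in for a labelled counter (not a Prometheus client).
class SketchCounter
  def initialize
    @values = Hash.new(0)
  end

  def increment(labels)
    @values[labels] += 1
  end

  def get(labels)
    @values[labels]
  end
end

# Hypothetical recorder that applies the rules from the description.
class RequestApdexSketch
  # The description uses a fixed 1 second threshold for now.
  APDEX_THRESHOLD_SECONDS = 1.0
  # Illustrative health endpoint id; the real list is an assumption here.
  HEALTH_ENDPOINT_IDS = ['HealthController#liveness'].freeze

  attr_reader :total, :success_total

  def initialize
    @total = SketchCounter.new
    @success_total = SketchCounter.new
  end

  def record(status:, duration_seconds:, feature_category:, endpoint_id:)
    # "Successful" is read literally as "not a 500".
    return if status == 500
    return if HEALTH_ENDPOINT_IDS.include?(endpoint_id)

    labels = { feature_category: feature_category, endpoint_id: endpoint_id }
    total.increment(labels)
    success_total.increment(labels) if duration_seconds < APDEX_THRESHOLD_SECONDS
  end
end

apdex = RequestApdexSketch.new
labels = { feature_category: 'example_category', endpoint_id: 'ExampleController#show' }
apdex.record(status: 200, duration_seconds: 0.2, **labels) # counted, within threshold
apdex.record(status: 200, duration_seconds: 1.5, **labels) # counted, too slow
apdex.record(status: 500, duration_seconds: 0.1, **labels) # ignored
```

In this sketch the apdex for a label set is `success_total / total`, so the two fast-enough and slow requests above would give 1 out of 2.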

The metrics would also be initialized on the first scrape. This means that a 0 would be available for every set of labels, avoiding bugs in calculations with these metrics.
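One kind of calculation bug that zero-initialization may avoid is a missing series in a ratio. The sketch below models this with plain hashes; `zero_initialized` and `apdex_ratio` are hypothetical helpers, not part of this MR, and a `nil` result stands in for "no data" in a monitoring query.

```ruby
# Build a series map with an explicit 0 for every known label set.
def zero_initialized(label_sets)
  label_sets.to_h { |labels| [labels, 0] }
end

# Returns nil when either series is missing for the labels (or total is 0),
# mimicking a ratio query that yields no result for that label set.
def apdex_ratio(success_series, total_series, labels)
  success = success_series[labels]
  total = total_series[labels]
  return nil if success.nil? || total.nil? || total.zero?

  success.to_f / total
end

labels = { feature_category: 'example_category', endpoint_id: 'ExampleController#show' }

# Without initialization: three slow requests only ever touch the total counter,
# so the success series does not exist and the ratio is undefined.
uninitialized_ratio = apdex_ratio({}, { labels => 3 }, labels)

# With initialization: the success series exists at 0, so the ratio is 0.0.
success_series = zero_initialized([labels])
total_series = zero_initialized([labels])
total_series[labels] += 3
initialized_ratio = apdex_ratio(success_series, total_series, labels)
```

Here the uninitialized case reports nothing at all for an endpoint that is failing its target on every request, while the initialized case reports an apdex of 0.0.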

To get all of the feature_category and endpoint_id values for the initialization, we had to move some code that iterates over all endpoints, and that was previously only used in tests, into the application.

We know this would initialize about 2 * 2500 metrics per pod running a web server, so we'd like to roll this out in a controlled fashion to make sure it doesn't impact our monitoring. That is why this is behind a feature flag.

This also limits the initialization of these metrics to web processes, so they are not generated for console or runner processes.

This also includes a developer API to define SLIs, which encourages initializing them with the known label sets.
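To make the size and gating concrete, here is a small sketch under stated assumptions: `series_to_initialize` and its keyword arguments are hypothetical, and the 2500 label sets are generated placeholders rather than real endpoints. Only the two counter names come from the description.

```ruby
COUNTER_NAMES = %w[
  gitlab_sli:rails_request_apdex:total
  gitlab_sli:rails_request_apdex:success_total
].freeze

# Returns every (counter, label set) pair that would be initialized to 0.
# Nothing is initialized unless this is a web process and the flag is on.
def series_to_initialize(label_sets, web_process:, feature_enabled:)
  return [] unless web_process && feature_enabled

  COUNTER_NAMES.product(label_sets)
end

# Placeholder label sets, one per hypothetical endpoint.
label_sets = (1..2500).map do |i|
  { feature_category: 'example_category', endpoint_id: "ExampleController#action_#{i}" }
end

web_series = series_to_initialize(label_sets, web_process: true, feature_enabled: true)
console_series = series_to_initialize(label_sets, web_process: false, feature_enabled: true)
flag_off_series = series_to_initialize(label_sets, web_process: true, feature_enabled: false)
```

With roughly 2500 label sets, the web process ends up with 2 * 2500 = 5000 initialized series, while the console case and the flag-off case initialize none.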

For gitlab-com/gl-infra/scalability#1099 (closed)

Screenshots or Screencasts (strongly suggested)

A local instance that did not receive any requests looks like this:

When I start hitting it:

As we can see, the metrics start at 0 before some of them receive traffic.

Does this MR meet the acceptance criteria?

Conformity

I have included changelog trailers, or none are needed. (Does this MR need a changelog?)
[-] I have added/updated documentation, or it's not needed. (Is documentation required?)
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?)
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?)
I have self-reviewed this MR per code review guidelines.
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines)
I have followed the style guides.
This change is backwards compatible across updates, or this does not apply.

Availability and Testing

I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.)
I have tested this MR in all supported browsers, or it's not needed.
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.

Edited Sep 20, 2021 by Bob Van Landuyt
