Use surrounding context's feature category for unowned workers
Overview
When we're in Sidekiq, in most cases we want to use the feature category of the current worker class in logs and metrics. If a worker with the 'users' feature category is called by an endpoint that has the 'issue tracking' feature category, then 'users' is the more specific category.
However, for workers without an owning feature category, this is worse: we're effectively throwing away the information from the surrounding context to give the least-specific information we could possibly give. This changes that so that in logs and in metrics, we'll use the feature category from the calling context for workers that don't have an owner.
For gitlab-com/gl-infra/scalability#1200 (closed) and gitlab-com/gl-infra/scalability#1164 (closed).
Testing
We have two axes we want to test on:
- Output location: Prometheus metrics and logs should have the same information.
- Worker type: there are three types we're interested in.
- Workers with a feature category. These should use their own feature category in logs and metrics.
- Workers with
feature_category_not_owned!
. These should use the feature category from their caller. - Mailers, which aren't directly Sidekiq workers in our application code, should also use the feature category from their caller.
The actual testing is pretty straightforward, it's just we have a few cases to get through. For the Prometheus part the GDK might have some issues (gitlab-development-kit#1061), but the logs should be straightforward.
Workers without a feature category
Visit pipelines/new
in a project. This is a continuous_integration
endpoint that calls a not_owned
worker (the
ExternalServiceReactiveCachingWorker). We can see this work in the logs
(letting it run for a little while to include scheduled jobs as well as
immediate jobs):
$ gdk tail rails-background-jobs | grep ReactiveCachingWorker | sed 's/^.* : ....//' | jq '[."meta.feature_category", ."meta.caller_id"]'
[
"continuous_integration",
"Projects::PipelinesController#config_variables"
]
[
"continuous_integration",
"Projects::PipelinesController#config_variables"
]
[
"continuous_integration",
"ExternalServiceReactiveCachingWorker"
]
[
"continuous_integration",
"ExternalServiceReactiveCachingWorker"
]
[
"continuous_integration",
"ExternalServiceReactiveCachingWorker"
]
[
"continuous_integration",
"ExternalServiceReactiveCachingWorker"
]
And in metrics:
Note that there is a series with not_owned
at zero as that is the
default category for this worker.
Workers with a feature category
Create a new issue mentioning another user, then do the same on an MR. This tests both the new issue / MR workers (which have a feature category) and mailers (which don't; see below for results).
$ gdk tail rails-background-jobs | grep IssuesController | sed 's/^.* : ....//' | jq '[."class", ."meta.feature_category", ."meta.caller_id"]'
[
"NewIssueWorker",
"issue_tracking",
"Projects::IssuesController#create"
]
[
"IssuePlacementWorker",
"issue_tracking",
"Projects::IssuesController#create"
]
$ gdk tail rails-background-jobs | grep IssuesController | sed 's/^.* : ....//' | jq '[."class", ."meta.feature_category", ."meta.caller_id"]'
[
"NewMergeRequestWorker",
"code_review",
"Projects::MergeRequests::CreationsController#create"
]
[
"MergeRequests::SyncCodeOwnerApprovalRulesWorker",
"code_review",
"Projects::MergeRequests::CreationsController#create"
]
Metrics:
Mailers
$ gdk tail rails-background-jobs | grep MailDeliveryJob | sed 's/^.* : ....//' | jq '[."class", ."meta.feature_category", ."meta.caller_id"]'
[
"ActionMailer::MailDeliveryJob",
"issue_tracking",
"NewIssueWorker"
]
[
"ActionMailer::MailDeliveryJob",
"issue_tracking",
"NewIssueWorker"
]
[
"ActionMailer::MailDeliveryJob",
"code_review",
"NewMergeRequestWorker"
]
[
"ActionMailer::MailDeliveryJob",
"code_review",
"NewMergeRequestWorker"
]
And metrics:
Caveat
Here it's worth noting a limitation: many mailers go via
NotificationService::Async
,
which means we have controller -> NotificationService::Async
-> MailScheduler::NotificationServiceWorker
-> ActionMailer::MailDeliveryJob
.
Because MailScheduler::NotificationServiceWorker
has the category
issue_tracking
, all emails scheduled in that way are marked as
issue_tracking
. That is why there are so many more issue_tracking
mails than
code_review
mails in the metrics screenshot.
I considered changing that in this MR but:
- The scope is already too big.
- I noticed that this gets called by GraphQL a lot, and GraphQL is also not owned
🙃
So I figure we can leave that for now.