Skip to content

Use surrounding context's feature category for unowned workers

Overview

When we're in Sidekiq, in most cases we want to use the feature category of the current worker class in logs and metrics. If a worker with the 'users' feature category is called by an endpoint that has the 'issue tracking' feature category, then 'users' is the more specific category.

However, for workers without an owning feature category, this is worse: we're effectively throwing away the information from the surrounding context to give the least-specific information we could possibly give. This changes that so that in logs and in metrics, we'll use the feature category from the calling context for workers that don't have an owner.

For gitlab-com/gl-infra/scalability#1200 (closed) and gitlab-com/gl-infra/scalability#1164 (closed).

Testing

We have two axes we want to test on:

  1. Output location: Prometheus metrics and logs should have the same information.
  2. Worker type: there are three types we're interested in.
    1. Workers with a feature category. These should use their own feature category in logs and metrics.
    2. Workers with feature_category_not_owned!. These should use the feature category from their caller.
    3. Mailers, which aren't directly Sidekiq workers in our application code, should also use the feature category from their caller.

The actual testing is pretty straightforward, it's just we have a few cases to get through. For the Prometheus part the GDK might have some issues (gitlab-development-kit#1061), but the logs should be straightforward.

Workers without a feature category

Visit pipelines/new in a project. This is a continuous_integration endpoint that calls a not_owned worker (the ExternalServiceReactiveCachingWorker). We can see this work in the logs (letting it run for a little while to include scheduled jobs as well as immediate jobs):

$ gdk tail rails-background-jobs | grep ReactiveCachingWorker | sed 's/^.* : ....//' | jq '[."meta.feature_category", ."meta.caller_id"]'
[
  "continuous_integration",
  "Projects::PipelinesController#config_variables"
]
[
  "continuous_integration",
  "Projects::PipelinesController#config_variables"
]
[
  "continuous_integration",
  "ExternalServiceReactiveCachingWorker"
]
[
  "continuous_integration",
  "ExternalServiceReactiveCachingWorker"
]
[
  "continuous_integration",
  "ExternalServiceReactiveCachingWorker"
]
[
  "continuous_integration",
  "ExternalServiceReactiveCachingWorker"
]

And in metrics:

sidekiq-1

Note that there is a series with not_owned at zero as that is the default category for this worker.

Workers with a feature category

Create a new issue mentioning another user, then do the same on an MR. This tests both the new issue / MR workers (which have a feature category) and mailers (which don't; see below for results).

$ gdk tail rails-background-jobs | grep IssuesController | sed 's/^.* : ....//' | jq '[."class", ."meta.feature_category", ."meta.caller_id"]'
[
  "NewIssueWorker",
  "issue_tracking",
  "Projects::IssuesController#create"
]
[
  "IssuePlacementWorker",
  "issue_tracking",
  "Projects::IssuesController#create"
]
$ gdk tail rails-background-jobs | grep IssuesController | sed 's/^.* : ....//' | jq '[."class", ."meta.feature_category", ."meta.caller_id"]'
[
  "NewMergeRequestWorker",
  "code_review",
  "Projects::MergeRequests::CreationsController#create"
]
[
  "MergeRequests::SyncCodeOwnerApprovalRulesWorker",
  "code_review",
  "Projects::MergeRequests::CreationsController#create"
]

Metrics:

sidekiq-2

Mailers

$ gdk tail rails-background-jobs | grep MailDeliveryJob | sed 's/^.* : ....//' | jq '[."class", ."meta.feature_category", ."meta.caller_id"]'
[
  "ActionMailer::MailDeliveryJob",
  "issue_tracking",
  "NewIssueWorker"
]
[
  "ActionMailer::MailDeliveryJob",
  "issue_tracking",
  "NewIssueWorker"
]
[
  "ActionMailer::MailDeliveryJob",
  "code_review",
  "NewMergeRequestWorker"
]
[
  "ActionMailer::MailDeliveryJob",
  "code_review",
  "NewMergeRequestWorker"
]

And metrics:

sidekiq-3

Caveat

Here it's worth noting a limitation: many mailers go via NotificationService::Async, which means we have controller -> NotificationService::Async -> MailScheduler::NotificationServiceWorker -> ActionMailer::MailDeliveryJob.

Because MailScheduler::NotificationServiceWorker has the category issue_tracking, all emails scheduled in that way are marked as issue_tracking. That is why there are so many more issue_tracking mails than code_review mails in the metrics screenshot.

I considered changing that in this MR but:

  1. The scope is already too big.
  2. I noticed that this gets called by GraphQL a lot, and GraphQL is also not owned 🙃

So I figure we can leave that for now.

Edited by Sean McGivern

Merge request reports