Drop duplicate jobs from Sidekiq when enqueuing
What does this MR do?
This extends the UntilExecuting
deduplication strategy to cancel
scheduling jobs when they are already in the queue.
When we drop a job, we log that to the Sidekiq.logger
The log messages looks like this (for now):
I, [2020-03-05T11:50:24.765516 #67227] INFO -- : {"class"=>"ProjectImportScheduleWorker", "retry"=>false, "queue"=>"project_import_schedule", :backtrace=>true, "jid"=>"474ddc7fd2ebecd467ef534e", "created_at"=>1583405424.761091, "enqueued_at"=>1583405424.764548, "meta.project"=>"gitlab-org/gitlab-shell", "meta.root_namespace"=>"gitlab-org", "meta.subscription_plan"=>"default", "correlation_id"=>"16e24e1b1696c7392de6a215d55d61bd", "duplicate-of"=>"ce57381d57ca6c4f671b3fab", "pid"=>67227, "job_status"=>"deduplicated", "message"=>"ProjectImportScheduleWorker JID-474ddc7fd2ebecd467ef534e: deduplicated: dropped until executing", "deduplication_type"=>"dropped until executing"}
I, [2020-03-05T11:50:24.768354 #67227] INFO -- : {"class"=>"ProjectImportScheduleWorker", "retry"=>false, "queue"=>"project_import_schedule", :backtrace=>true, "jid"=>"0f744ea81107a95f8a648ac5", "created_at"=>1583405424.761091, "enqueued_at"=>1583405424.765619, "meta.project"=>"gnuwget/wget2", "meta.root_namespace"=>"gnuwget", "meta.subscription_plan"=>"default", "correlation_id"=>"3e9a45d693f927b5c119d25cdb8ed33d", "duplicate-of"=>"be79a0f45e3b145a56fccd86", "pid"=>67227, "job_status"=>"deduplicated", "message"=>"ProjectImportScheduleWorker JID-0f744ea81107a95f8a648ac5: deduplicated: dropped until executing", "deduplication_type"=>"dropped until executing"}
It is part of the work for gitlab-com/gl-infra/scalability#42.
We can't enable the feature yet until we've converted the sidekiq logger to a structured logger, and ingest it in the sidekiq index:
Does this MR meet the acceptance criteria?
Conformity
- [-] Changelog entry
- [-] Documentation (if required)
-
Code review guidelines -
Merge request performance guidelines -
Style guides - [-] Database guides
-
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. - [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Merge request reports
Activity
Hi @reprazent,
Please add labels to your merge request, this helps triage community merge requests.
Thanks for your help!
You are welcome to help improve this comment.
added auto updated label
added 1 commit
- 8c3d0796 - Drop duplicate jobs from Sidekiq when queuing
added 426 commits
-
8c3d0796...f96750dd - 425 commits from branch
gitlab-org:master
- c57285ea - Drop duplicate jobs from Sidekiq when enqueuing
-
8c3d0796...f96750dd - 425 commits from branch
changed milestone to %12.9
added Background Processing backstage [DEPRECATED] groupscalability labels
added 19 commits
-
c57285ea...91d59c17 - 18 commits from branch
gitlab-org:master
- 161ef851 - Drop duplicate jobs from Sidekiq when enqueuing
-
c57285ea...91d59c17 - 18 commits from branch
@oswaldo as you've reviewed the first iteration of this, would you mind reviewing the follow up?
assigned to @oswaldo
- Resolved by Oswaldo Ferreira
Reviewer roulette
Changes that require review have been detected! A merge request is normally reviewed by both a reviewer and a maintainer in its primary category (e.g. frontend or backend), and by a maintainer in all other categories.
To spread load more evenly across eligible reviewers, Danger has randomly picked a candidate for each review slot. Feel free to override this selection if you think someone else would be better-suited, or the chosen person is unavailable.
To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines.
Once you've decided who will review this merge request, mention them as you normally would! Danger does not (yet?) automatically notify them for you.
Category Reviewer Maintainer backend Arturo Herrero ( @arturoherrero
)James Lopez ( @jameslopez
)Generated by
DangerEdited by BobBot BobBotremoved auto updated label
mentioned in merge request !26586 (merged)
- Resolved by Oswaldo Ferreira
@reprazent Would you mind refreshing my mind on why we're storing the
jid
now at https://gitlab.com/gitlab-org/gitlab/blob/1f085d7e5a32491c4f97abcc8fd092b9fdeafb41/lib/gitlab/sidekiq_middleware/duplicate_jobs/duplicate_job.rb#L49?I'm not 100% sure if I follow why there only would be a duplicate if the stored
jid
is the same as the new jobjid
being scheduled - Am I missing something on how thejid
is generated so that a duplicate job is caught by this?
- Resolved by Oswaldo Ferreira
Thanks @reprazent, mostly looking good! Maybe we can clarify a bit further the logging strategy (left a question).
unassigned @oswaldo
added 157 commits
-
161ef851...bae06faf - 156 commits from branch
gitlab-org:master
- bea81492 - Drop duplicate jobs from Sidekiq when enqueuing
-
161ef851...bae06faf - 156 commits from branch
Thanks @oswaldo, updated and replied. Another pass?
assigned to @oswaldo
- Resolved by Bob Van Landuyt
Thanks @reprazent, the logging makes sense with !26586 (merged) explanations. LGTM
unassigned @oswaldo
Thanks @oswaldo! I've updated the method name.
@jameslopez Would you have a moment for the maintainer review here?
assigned to @jameslopez
added 1 commit
- 6a00a983 - Drop duplicate jobs from Sidekiq when enqueuing
added 241 commits
-
6a00a983...2665e40c - 240 commits from branch
gitlab-org:master
- a6a7ee75 - Drop duplicate jobs from Sidekiq when enqueuing
-
6a00a983...2665e40c - 240 commits from branch
- Resolved by James Lopez
@reprazent thanks! LGTM
enabled an automatic merge when the pipeline for a6a7ee75 succeeds
mentioned in commit aee026f0
mentioned in issue gitlab-com/gl-infra/scalability#202
added workflowstaging label
added workflowcanary label and removed workflowstaging label
added workflowproduction label and removed workflowcanary label
mentioned in merge request gitlab-cookbooks/gitlab_fluentd!140 (closed)