WIP: Geo: Refactor scheduler workers to avoid long running jobs
What does this MR do?
I adapted Geo::Scheduler::SchedulerWorker to use ContinuousWorker, which I created as part of another MR. This must be put behind a feature flag if we want to ship it.
I smoke tested updating a repo, pushing LFS, and adding an Upload. All were replicated, and all Admin Area > Geo > Nodes progress bars returned to 100%.
FileDownloadDispatchWorker
Example logs from this worker. I added comments between jobs:
{"severity":"INFO","time":"2019-12-18T22:19:15.934Z","correlation_id":"5e89e508-b22e-42a8-b3b9-8cca0d144419","class":"Geo::FileDownloadDispatchWorker","message":"Started scheduler","job_id":"8daa70a642e612313b1d24c1"}
{"severity":"INFO","time":"2019-12-18T22:19:16.070Z","correlation_id":"5e89e508-b22e-42a8-b3b9-8cca0d144419","class":"Geo::FileDownloadDispatchWorker","message":"#schedule_jobs finished","job_id":"8daa70a642e612313b1d24c1","enqueued":0,"pending":0,"scheduled":0,"capacity":10}
{"severity":"INFO","time":"2019-12-18T22:19:16.071Z","correlation_id":"5e89e508-b22e-42a8-b3b9-8cca0d144419","class":"Geo::FileDownloadDispatchWorker","message":"Quitting","job_id":"8daa70a642e612313b1d24c1","reason":"no_more_work"}
# Since the Worker "quit" due to "no_more_work", it doesn't run again until `sidekiq-cron` schedules it within the next 60 seconds.
{"severity":"INFO","time":"2019-12-18T22:20:23.299Z","correlation_id":"5b504c67-bd91-489c-81d5-efe2f1124f2b","class":"Geo::FileDownloadDispatchWorker","message":"Started scheduler","job_id":"b66884221a9f3d3777432f43"}
{"severity":"INFO","time":"2019-12-18T22:20:23.391Z","correlation_id":"5b504c67-bd91-489c-81d5-efe2f1124f2b","class":"Geo::FileDownloadDispatchWorker","message":"#schedule_jobs finished","job_id":"b66884221a9f3d3777432f43","enqueued":1,"pending":0,"scheduled":1,"capacity":10}
{"severity":"INFO","time":"2019-12-18T22:20:25.484Z","correlation_id":"5b504c67-bd91-489c-81d5-efe2f1124f2b","class":"Geo::FileDownloadService","object_type":"file","object_db_id":1,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":28184,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.648}
# Since work was in progress when we last ran, we did not "quit". Instead, we re-enqueued ourselves immediately.
# Since we didn't override minimum_duration, the first job slept until the default 5 seconds had elapsed (note that the next job starts 5.184s after the first one started).
{"severity":"INFO","time":"2019-12-18T22:20:28.483Z","correlation_id":"5b504c67-bd91-489c-81d5-efe2f1124f2b","class":"Geo::FileDownloadDispatchWorker","message":"Started scheduler","job_id":"006abd9dd042435ca1ceeff7"}
{"severity":"INFO","time":"2019-12-18T22:20:28.546Z","correlation_id":"5b504c67-bd91-489c-81d5-efe2f1124f2b","class":"Geo::FileDownloadDispatchWorker","message":"#schedule_jobs finished","job_id":"006abd9dd042435ca1ceeff7","enqueued":0,"pending":0,"scheduled":0,"capacity":10}
{"severity":"INFO","time":"2019-12-18T22:20:28.547Z","correlation_id":"5b504c67-bd91-489c-81d5-efe2f1124f2b","class":"Geo::FileDownloadDispatchWorker","message":"Quitting","job_id":"006abd9dd042435ca1ceeff7","reason":"no_more_work"}
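The behavior visible in these logs (quit when there is no more work, otherwise wait out a minimum duration and re-enqueue) can be sketched roughly as follows. This is an illustrative sketch only, not the actual ContinuousWorker from the other MR: the module name and the methods `perform_continuously`, `do_work`, `reenqueue` and `log` are hypothetical, and only `minimum_duration`, its 5-second default and the `no_more_work` reason come from the logs and comments above. The clock and sleeper are injectable so the sketch can be exercised without really sleeping.

```ruby
# Hypothetical sketch of a "continuous" worker loop step.
# Not the MR's real implementation; names other than minimum_duration are assumptions.
module ContinuousWorkerSketch
  DEFAULT_MINIMUM_DURATION = 5 # seconds; the default mentioned in the log comments

  # Runs one unit of work. Returns :quit when there was nothing to do,
  # or :reenqueued after waiting out the minimum duration.
  def perform_continuously(
    clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
    sleeper: ->(seconds) { sleep(seconds) }
  )
    started_at = clock.call
    more_work = do_work # assumed to return true when work was scheduled

    unless more_work
      log("Quitting", reason: "no_more_work")
      return :quit
    end

    # Sleep only for whatever is left of the minimum duration.
    remaining = minimum_duration - (clock.call - started_at)
    sleeper.call(remaining) if remaining > 0

    reenqueue
    :reenqueued
  end

  # Including classes may override this to change the pacing.
  def minimum_duration
    DEFAULT_MINIMUM_DURATION
  end
end

# Stand-in worker used only to demonstrate the sketch.
class FakeDispatchWorker
  include ContinuousWorkerSketch

  attr_reader :events

  def initialize(work_results)
    @work_results = work_results # e.g. [true, false]: work on first run, none on second
    @events = []
  end

  def do_work
    @work_results.shift
  end

  def reenqueue
    @events << :reenqueue
  end

  def log(message, **fields)
    @events << [message, fields]
  end
end
```

In a real Sidekiq worker the re-enqueue step would presumably be something like `self.class.perform_async`, and when the worker quits it relies on `sidekiq-cron` to start it again, as noted in the log comments.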
Resolves #43664 (closed)
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- Label as security and @ mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team
Edited by Michael Kozono