
WIP: Geo: Refactor scheduler workers to avoid long running jobs

Michael Kozono requested to merge mk/loopless-scheduler-worker into master

What does this MR do?

I adapted Geo::Scheduler::SchedulerWorker to use ContinuousWorker, which I created as part of another MR. This change must be put behind a feature flag before it can be used for real.

I smoke tested updating a repository, pushing LFS objects, and adding an Upload. All were replicated, and all progress bars under Admin Area > Geo > Nodes returned to 100%.

Example logs from FileDownloadDispatchWorker

I added comments between jobs:

{"severity":"INFO","time":"2019-12-18T22:19:15.934Z","correlation_id":"5e89e508-b22e-42a8-b3b9-8cca0d144419","class":"Geo::FileDownloadDispatchWorker","message":"Started scheduler","job_id":"8daa70a642e612313b1d24c1"}
{"severity":"INFO","time":"2019-12-18T22:19:16.070Z","correlation_id":"5e89e508-b22e-42a8-b3b9-8cca0d144419","class":"Geo::FileDownloadDispatchWorker","message":"#schedule_jobs finished","job_id":"8daa70a642e612313b1d24c1","enqueued":0,"pending":0,"scheduled":0,"capacity":10}
{"severity":"INFO","time":"2019-12-18T22:19:16.071Z","correlation_id":"5e89e508-b22e-42a8-b3b9-8cca0d144419","class":"Geo::FileDownloadDispatchWorker","message":"Quitting","job_id":"8daa70a642e612313b1d24c1","reason":"no_more_work"}

# Since the Worker "quit" due to "no_more_work", it doesn't run again until `sidekiq-cron` schedules it within the next 60 seconds.

{"severity":"INFO","time":"2019-12-18T22:20:23.299Z","correlation_id":"5b504c67-bd91-489c-81d5-efe2f1124f2b","class":"Geo::FileDownloadDispatchWorker","message":"Started scheduler","job_id":"b66884221a9f3d3777432f43"}
{"severity":"INFO","time":"2019-12-18T22:20:23.391Z","correlation_id":"5b504c67-bd91-489c-81d5-efe2f1124f2b","class":"Geo::FileDownloadDispatchWorker","message":"#schedule_jobs finished","job_id":"b66884221a9f3d3777432f43","enqueued":1,"pending":0,"scheduled":1,"capacity":10}
{"severity":"INFO","time":"2019-12-18T22:20:25.484Z","correlation_id":"5b504c67-bd91-489c-81d5-efe2f1124f2b","class":"Geo::FileDownloadService","object_type":"file","object_db_id":1,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":28184,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.648}

# Since work was in progress when we last ran, we did not "quit"; instead, we re-enqueued ourselves immediately.
# Since we didn't override minimum_duration, the first job slept until the default 5 seconds had elapsed (the next job starts 5.184s after the first one started).

{"severity":"INFO","time":"2019-12-18T22:20:28.483Z","correlation_id":"5b504c67-bd91-489c-81d5-efe2f1124f2b","class":"Geo::FileDownloadDispatchWorker","message":"Started scheduler","job_id":"006abd9dd042435ca1ceeff7"}
{"severity":"INFO","time":"2019-12-18T22:20:28.546Z","correlation_id":"5b504c67-bd91-489c-81d5-efe2f1124f2b","class":"Geo::FileDownloadDispatchWorker","message":"#schedule_jobs finished","job_id":"006abd9dd042435ca1ceeff7","enqueued":0,"pending":0,"scheduled":0,"capacity":10}
{"severity":"INFO","time":"2019-12-18T22:20:28.547Z","correlation_id":"5b504c67-bd91-489c-81d5-efe2f1124f2b","class":"Geo::FileDownloadDispatchWorker","message":"Quitting","job_id":"006abd9dd042435ca1ceeff7","reason":"no_more_work"}
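
To make the behavior in these logs easier to follow, here is a minimal Ruby sketch of the quit-or-re-enqueue decision. It is not the ContinuousWorker code from the other MR. The class name ContinuousWorkerSketch, the injected work/clock/sleeper callables, and the :quit and :reenqueue return values are all hypothetical, chosen only for illustration. The 5-second default is the only value taken from the comments above.

```ruby
# Hypothetical sketch of a "loopless" scheduler job. Assumption: each run
# does one scheduling pass, then either quits or asks to be re-enqueued.
class ContinuousWorkerSketch
  # Default taken from the log comments above (5 seconds).
  DEFAULT_MINIMUM_DURATION = 5

  # work:    callable returning true while work is still in progress
  # clock:   callable returning a monotonic time in seconds
  # sleeper: callable that pauses for the given number of seconds
  def initialize(work:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) },
                 minimum_duration: DEFAULT_MINIMUM_DURATION)
    @work = work
    @clock = clock
    @sleeper = sleeper
    @minimum_duration = minimum_duration
  end

  # Runs a single pass. Returns :quit when there is no more work (the cron
  # schedule is then responsible for the next run), or :reenqueue after
  # waiting out the rest of the minimum duration.
  def perform
    started_at = @clock.call
    more_work = @work.call

    return :quit unless more_work

    # Sleep only for the part of minimum_duration this pass has not used up,
    # so consecutive passes start roughly minimum_duration apart.
    remaining = @minimum_duration - (@clock.call - started_at)
    @sleeper.call(remaining) if remaining > 0

    :reenqueue
  end
end
```

In this sketch the caller decides what to do with the return value. A real Sidekiq worker would presumably enqueue itself again on :reenqueue, but that wiring is left out here because it depends on the actual implementation.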

Resolves #43664 (closed)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Michael Kozono
