Don't add a waiter when scheduling too many jobs

When we schedule jobs with a waiter, we occupy the thread until the jobs complete or until a timeout is reached.

If the number of jobs being waited on is very high, it is unlikely they will complete within the timeout, so we should consider not waiting on them.

Not waiting for them frees up the web worker sooner.

On top of that, if we don't add the waiter key, the jobs could be deduplicated when more of them are scheduled in separate requests, using the deduplication middleware from #42 (production#1739).
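To illustrate why the waiter key gets in the way of deduplication, here is a minimal sketch. It is not the real middleware. It assumes that jobs count as duplicates when their worker class and arguments match, and that waiting appends the waiter's key to every job's arguments. The helpers `duplicate_key`, `scheduled_args` and `unique_job_count` are hypothetical, and the real middleware only deduplicates jobs that are still queued, which is ignored here.

```ruby
# Hypothetical sketch, not the real deduplication middleware.
# Assumption: two jobs are duplicates when worker class and arguments match.
def duplicate_key(worker_class, args)
  [worker_class, args]
end

# Assumption: when waiting, the waiter's key is appended to every job's
# arguments before the batch is pushed.
def scheduled_args(args_list, waiter_key = nil)
  return args_list if waiter_key.nil?

  args_list.map { |args| [*args, waiter_key] }
end

# Counts the distinct jobs left after deduplicating across all requests.
def unique_job_count(worker_class, *requests)
  requests.flatten(1).map { |args| duplicate_key(worker_class, args) }.uniq.size
end

args_list = [[1], [2]]

with_waiter = unique_job_count(
  "AuthorizedProjectsWorker",
  scheduled_args(args_list, "waiter-key-a"),
  scheduled_args(args_list, "waiter-key-b")
)
without_waiter = unique_job_count(
  "AuthorizedProjectsWorker",
  scheduled_args(args_list),
  scheduled_args(args_list)
)

puts with_waiter    # 4: each request's waiter key makes its jobs unique
puts without_waiter # 2: identical jobs from both requests collapse
```

Two requests scheduling the same jobs produce four distinct jobs when each carries its own waiter key, but only two when no key is appended.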

One way to do this would be to assume we can process 10 jobs per second (during a peak we process about 100 AuthorizedProjectsWorker jobs per second, see #205 (comment 301962806)).

If we schedule more than 10 * timeout jobs, we don't wait for them:

diff --git a/app/workers/concerns/waitable_worker.rb b/app/workers/concerns/waitable_worker.rb
index f995aced542..437d8323adc 100644
--- a/app/workers/concerns/waitable_worker.rb
+++ b/app/workers/concerns/waitable_worker.rb
@@ -8,6 +8,11 @@ module WaitableWorker
     def bulk_perform_and_wait(args_list, timeout: 10)
       # Short-circuit: it's more efficient to do small numbers of jobs inline
       return bulk_perform_inline(args_list) if args_list.size <= 3
+      # Don't wait if there are too many jobs to be waited for.
+      # Not including the waiter allows the jobs to be deduplicated, and it
+      # skips waiting for jobs that are unlikely to finish within the timeout.
+      # This assumes we can process 10 jobs per second.
+      return bulk_perform_async(args_list) if args_list.size > 10 * timeout
 
       waiter = Gitlab::JobWaiter.new(args_list.size, worker_label: self.to_s)

(I think the code explains my intent better.)
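The diff turns `bulk_perform_and_wait` into a three-way decision. The sketch below isolates that decision as a pure function so the thresholds are easy to check. `scheduling_strategy` and its constants are hypothetical names. It assumes the existing inline short-circuit of 3 jobs and the "more than 10 * timeout jobs" rule at 10 jobs per second.

```ruby
# Hypothetical sketch of the proposed decision, not the real concern.
# Assumptions: 3 is the existing inline short-circuit, and 10 jobs/second
# is the conservative throughput from the proposal.
INLINE_THRESHOLD = 3
ASSUMED_JOBS_PER_SECOND = 10

# Returns the path bulk_perform_and_wait would take for a batch of this size.
def scheduling_strategy(job_count, timeout: 10)
  return :inline if job_count <= INLINE_THRESHOLD
  # More jobs than we expect to finish within the timeout: skip the waiter.
  return :async_without_waiter if job_count > ASSUMED_JOBS_PER_SECOND * timeout

  :async_with_waiter
end

puts scheduling_strategy(3)                 # inline
puts scheduling_strategy(100)               # async_with_waiter
puts scheduling_strategy(101)               # async_without_waiter
puts scheduling_strategy(101, timeout: 20)  # async_with_waiter
```

A longer timeout raises the cut-off proportionally, so callers that are willing to wait longer still get a waiter for larger batches.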

For the AuthorizedProjectsWorker (the only worker using this concern for now), this would mean waiting for at most 100 jobs with the default 10s timeout. According to the numbers above, we can process about 100 jobs per second, so waiting 10s for 100 jobs is achievable.
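The arithmetic behind that claim can be spelled out. The sketch below uses the two rates from this issue: the conservative 10 jobs per second used for the threshold and the roughly 100 jobs per second observed at peak. The helper names are hypothetical, and the peak figure is total throughput, so treating it as available to a single batch is an optimistic assumption.

```ruby
# Illustrative arithmetic only; helper names are hypothetical.
THRESHOLD_JOBS_PER_SECOND = 10       # conservative rate used for the cut-off
OBSERVED_PEAK_JOBS_PER_SECOND = 100  # ~100 AuthorizedProjectsWorker jobs/s at peak

# Largest batch we would still wait for with the given timeout (seconds).
def max_waited_jobs(timeout)
  THRESHOLD_JOBS_PER_SECOND * timeout
end

# Rough time in seconds to drain a batch at a given throughput.
def estimated_seconds(job_count, jobs_per_second)
  job_count.fdiv(jobs_per_second)
end

limit = max_waited_jobs(10)
puts limit                                                    # 100
puts estimated_seconds(limit, THRESHOLD_JOBS_PER_SECOND)      # 10.0
puts estimated_seconds(limit, OBSERVED_PEAK_JOBS_PER_SECOND)  # 1.0
```

Even at the conservative rate, the largest waited-for batch just fits in the 10s timeout, which leaves headroom when real throughput is closer to the observed peak.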

Edited Nov 24, 2020 by Rachel Nienaber