Web hook not triggered for pending jobs

Release notes

If you have set up custom CI monitoring to track the state of jobs, it has been hard to track how many pending jobs exist within a project using the Job Event webhook.

The webhook now fires an event when a job's state changes to pending, so you no longer have to maintain workarounds or set up custom integrations to keep track of CI jobs.
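As a rough illustration of the kind of tracking this enables, here is a minimal sketch that keeps a per-project count of pending jobs from decoded Job Hook payloads. The `PendingJobTracker` class is a hypothetical helper, not part of GitLab; it assumes each payload carries the `project_id`, `build_id`, and `build_status` fields.

```python
from collections import defaultdict


class PendingJobTracker:
    """Hypothetical helper: tracks which jobs are pending, per project."""

    def __init__(self):
        # Maps project_id -> set of build_ids currently in the pending state.
        self._pending = defaultdict(set)

    def handle_event(self, payload):
        """Update the tracker from one decoded Job Hook payload."""
        project_id = payload['project_id']
        build_id = payload['build_id']
        if payload['build_status'] == 'pending':
            self._pending[project_id].add(build_id)
        else:
            # Any other status means the job is not (or no longer) pending.
            self._pending[project_id].discard(build_id)

    def pending_count(self, project_id):
        """Return the number of jobs currently pending in the project."""
        return len(self._pending.get(project_id, ()))
```

A receiver would call `handle_event` for every Job Hook delivery and read `pending_count` whenever it needs the current figure.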

Summary

The webhook for "Job Events" is not triggered when a job enters the "pending" state.

Steps to reproduce

Create new project
Create .gitlab-ci.yml containing:

test:
  stage: test
  script:
  - echo testing

Create Job Hook, enabling hooks for "Job Events"
Monitor events on the hook target. For this, I use a simple Python server:

import http.server
import json
import logging
import socketserver

# Allow quick restarts without waiting for the old socket to time out.
socketserver.TCPServer.allow_reuse_address = True

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger()


class WebhookHandler(http.server.SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        # Silence the default per-request access log.
        pass

    def do_POST(self):
        length = int(self.headers.get('content-length', 0))
        event = self.headers.get('x-gitlab-event')
        data = json.loads(self.rfile.read(length).decode())
        if event == 'Job Hook':
            logger.info('event for (%s for %d: %s)',
                        event,
                        data['build_id'],
                        data['build_status'])
        else:
            logger.info('event (%s)', event)
        # Acknowledge the delivery so GitLab does not treat it as failed.
        self.send_response(200)
        self.end_headers()


w = socketserver.TCPServer(("", 9875), WebhookHandler)
try:
    w.serve_forever()
except KeyboardInterrupt:
    pass
finally:
    # Release the listening socket on exit.
    w.server_close()

Observe events sent to target

Example Project

I have a test project that I am glad to provide access to. However, none of the webhooks appear to actually work there. I can test them fine, but when a job is run, I get no POSTs on my target URL.

I was able to successfully reproduce on a private repo of mine on gitlab.com.

What is the current bug behavior?

I get job events, but not for the "pending" state (in this example, a job failed):

INFO:root:event for (Job Hook for 197109542: created)
INFO:root:event for (Job Hook for 197109542: running)
INFO:root:event for (Job Hook for 197109542: failed)

What is the expected correct behavior?

I expect to see the job enter the pending state:

INFO:root:event for (Job Hook for 197109542: created)
INFO:root:event for (Job Hook for 197109542: pending)
INFO:root:event for (Job Hook for 197109542: running)
INFO:root:event for (Job Hook for 197109542: failed)

Relevant logs and/or screenshots

n/a

Output of checks

This bug happens on GitLab.com

Results of GitLab environment info

I am only a user of our local GitLab EE install, but have reproduced on gitlab.com.

Results of GitLab application Check

See above.

Possible fixes / Proposal

To trigger the webhook on pending jobs, it should be sufficient to enqueue BuildHooksWorker on that transition:

diff --git a/app/models/ci/build.rb b/app/models/ci/build.rb
index 4328f3f7a4b..ddb017a2e6d 100644
--- a/app/models/ci/build.rb
+++ b/app/models/ci/build.rb
@@ -286,10 +286,11 @@ def with_preloads
       after_transition any => [:pending] do |build, transition|
         Ci::UpdateBuildQueueService.new.push(build, transition)
 
         build.run_after_commit do
           BuildQueueWorker.perform_async(id)
+          BuildHooksWorker.perform_async(id)
         end
       end
 
       after_transition pending: any do |build, transition|
         Ci::UpdateBuildQueueService.new.pop(build, transition)

Background

On my local GitLab EE install, I want to run the CI jobs on an LSF scheduler. To do this, I have created a script that listens to webhooks from my GitLab project. When a CI job is pending, I use the API to get info on the job and available runners, then submit an LSF job which runs the appropriate runner and exits after the runner completes one CI job.
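The listener described above could dispatch along these lines. This is only a sketch: `is_pending_job_event`, `handle_delivery`, and the `submit` callback are hypothetical names standing in for the real script, and it assumes the `X-Gitlab-Event` header value and `build_status` payload field used in the server above.

```python
def is_pending_job_event(event, payload):
    """Return True when a delivery reports a job in the pending state."""
    return event == 'Job Hook' and payload.get('build_status') == 'pending'


def handle_delivery(event, payload, submit):
    """Call submit(build_id) for pending jobs and ignore everything else.

    `submit` is a placeholder for whatever starts the LSF job.
    Returns True if the delivery was acted on.
    """
    if not is_pending_job_event(event, payload):
        return False
    submit(payload['build_id'])
    return True
```

With the proposed fix in place, `handle_delivery` would be the only entry point the daemon needs; without it, the pending branch is never reached.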

I should be able to listen to the webhooks and only respond when a CI job is pending. However, the webhook is not called for the pending state, so instead I listen to any state change, then request all pending jobs from the API. This is not always reliable. For example, if a job is run manually, no webhook is sent when the user manually asks the job to run. Thus, I must resort to periodically checking for any pending jobs. If I do this too frequently, it increases the CPU load of my daemon. If I do not do it frequently enough, there is a lag between when a user manually triggers a job and when that job actually runs.
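For reference, the polling workaround amounts to a request like the one below. This is a sketch, not the actual script: it assumes the REST API's project jobs listing with a `scope[]=pending` filter and a personal access token sent in the `PRIVATE-TOKEN` header. `build_pending_jobs_request` is a hypothetical helper, and the base URL, project ID, and token are placeholders.

```python
import urllib.parse
import urllib.request


def build_pending_jobs_request(base_url, project_id, token):
    """Build (but do not send) a request listing a project's pending jobs."""
    # project_id may be numeric or a namespace/path, so URL-encode it fully.
    project = urllib.parse.quote(str(project_id), safe='')
    query = urllib.parse.urlencode({'scope[]': 'pending'})
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project}/jobs?{query}"
    return urllib.request.Request(url, headers={'PRIVATE-TOKEN': token})
```

The daemon would pass the result to `urllib.request.urlopen` on a timer and decode the JSON response, which is where the trade-off between CPU load and scheduling lag comes from.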

I suspect this would also be an issue for delayed CI jobs.

Edited Dec 08, 2021 by James Heimbuck