Web hook not triggered for pending jobs
Release notes
If you have set up custom CI monitoring to track the state of jobs, it has been hard to track how many pending
jobs exist within a project using the Job Event webhook.
The webhook now fires an event when a job's state changes to pending,
so you no longer have to maintain workarounds or set up custom integrations to keep track of CI jobs.
Summary
Web hook for "Job Event" is not triggered when a job enters "pending" mode.
Steps to reproduce
- Create new project
- Create .gitlab-ci.yml containing:
    test:
      stage: test
      script:
        - echo testing
- Create Job Hook, enabling hooks for "Job Events"
- Monitor events on the hook target. For this, I use a simple Python server:
    import http.server
    import json
    import socketserver
    import logging

    socketserver.TCPServer.allow_reuse_address = True
    logger = logging.getLogger()
    logging.basicConfig(level=logging.DEBUG)

    class WebhookHandler(http.server.SimpleHTTPRequestHandler):
        def log_message(self, format, *args):
            # Silence the default per-request access log.
            pass

        def do_POST(self):
            length = int(self.headers.get('content-length', 0))
            event = self.headers.get('x-gitlab-event')
            # Read for completeness; this test server does not validate it.
            token = self.headers.get('x-gitlab-token')
            data = json.loads(self.rfile.read(length).decode())
            if event == 'Job Hook':
                logger.info('event for (%s for %d: %s)',
                            event,
                            data['build_id'],
                            data['build_status'])
            else:
                logger.info('event (%s)', event)
            self.send_response(200)
            self.end_headers()

    w = socketserver.TCPServer(("", 9875), WebhookHandler)
    try:
        w.serve_forever()
    except KeyboardInterrupt:
        # Release the listening socket on Ctrl-C.
        w.server_close()
- Observe events sent to target
Example Project
I have a test project that I am glad to provide access to. However, none of the web hooks appear to actually work there: I can test them fine, but when a job is run, I get no POSTs on my target URL.
I was able to successfully reproduce on a private repo of mine on gitlab.com.
What is the current bug behavior?
I get job events, but not for the "pending" state (in this example, a job failed):
INFO:root:event for (Job Hook for 197109542: created)
INFO:root:event for (Job Hook for 197109542: running)
INFO:root:event for (Job Hook for 197109542: failed)
What is the expected correct behavior?
I expect to see the job enter the pending state:
INFO:root:event for (Job Hook for 197109542: created)
INFO:root:event for (Job Hook for 197109542: pending)
INFO:root:event for (Job Hook for 197109542: running)
INFO:root:event for (Job Hook for 197109542: failed)
Relevant logs and/or screenshots
n/a
Output of checks
This bug happens on GitLab.com
Results of GitLab environment info
I am only a user of our local GitLab EE install, but have reproduced on gitlab.com.
Results of GitLab application Check
See above.
Possible fixes / Proposal
To trigger the webhook on pending jobs it should be sufficient to fire the worker on that transition:
    diff --git a/app/models/ci/build.rb b/app/models/ci/build.rb
    index 4328f3f7a4b..ddb017a2e6d 100644
    --- a/app/models/ci/build.rb
    +++ b/app/models/ci/build.rb
    @@ -286,10 +286,11 @@ def with_preloads
           after_transition any => [:pending] do |build, transition|
             Ci::UpdateBuildQueueService.new.push(build, transition)
     
             build.run_after_commit do
               BuildQueueWorker.perform_async(id)
    +          BuildHooksWorker.perform_async(id)
             end
           end
     
           after_transition pending: any do |build, transition|
             Ci::UpdateBuildQueueService.new.pop(build, transition)
Background
On my local GitLab EE install, I want to run the CI jobs on an LSF scheduler. To do this, I have created a script that listens to webhooks from my GitLab project. When a CI job is pending, I use the API to get info on the job and available runners, then submit an LSF job which runs the appropriate runner and exits after the runner completes one CI job.
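The dispatch decision described above can be sketched as a small filter over incoming webhook payloads. This is only an illustration, not my actual script: `is_pending_job_event`, `handle_event`, and the `submit_lsf_job` callable are hypothetical names, and the `build_id` and `build_status` fields are the ones logged by the test server in the steps to reproduce.

```python
def is_pending_job_event(event_header, payload):
    """Return True only for Job Hook events whose job is pending."""
    return (event_header == 'Job Hook'
            and payload.get('build_status') == 'pending')


def handle_event(event_header, payload, submit_lsf_job):
    """Call submit_lsf_job (a hypothetical callable) for pending jobs only.

    Returns True if a job was submitted, False if the event was ignored.
    """
    if not is_pending_job_event(event_header, payload):
        return False
    submit_lsf_job(payload['build_id'])
    return True
```

With the current behavior this filter never matches, because no Job Hook event with a pending status is delivered.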
I should be able to listen to the webhooks and only respond when a CI job is pending. However, the webhook is not called for the pending state, so instead I listen to any state change, then request all pending jobs from the API. This is not always reliable. For example, if a job is run manually, no webhook is sent when the user manually asks the job to run. Thus, I must resort to periodically checking for pending jobs. If I do this too frequently, it increases the CPU load of my daemon. If I do not do it frequently enough, there is a lag between when a user manually triggers a job and when that job actually runs.
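For reference, the polling workaround boils down to a request like the one below. This is a minimal sketch that assumes the Jobs API endpoint for listing project jobs with its `scope[]` filter and a `PRIVATE-TOKEN` header for authentication; `build_pending_jobs_request`, the base URL, the project ID, and the token are placeholders, not values from my setup.

```python
import urllib.parse
import urllib.request


def build_pending_jobs_request(base_url, project_id, token):
    """Build a GET request that lists the project's pending jobs."""
    # scope[]=pending restricts the listing to jobs in the pending state.
    query = urllib.parse.urlencode({'scope[]': 'pending'})
    url = f"{base_url}/api/v4/projects/{project_id}/jobs?{query}"
    return urllib.request.Request(url, headers={'PRIVATE-TOKEN': token})
```

The daemon would pass this request to `urllib.request.urlopen` on a timer and decode the JSON response, which is exactly the periodic load that a pending event would make unnecessary.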
I suspect this would also be an issue for delayed CI jobs.