Upon Cancelling a Running Job, server running gitlab-runner gets "ERROR: Checking for jobs... forbidden"
I have successfully registered runners and jobs are being built and everything is fine and dandy.
But then, if a build gets cancelled, the runner stops accepting jobs.
While debugging, I found the following:
gitlab-runner --debug verify
sudo gitlab-runner --debug verify
Runtime platform arch=386 os=linux revision=5396d320 version=11.0.0
Checking runtime mode GOOS=linux uid=0
Running in system-mode.
Trying to load /etc/gitlab-runner/certs/gitlab.company.com.crt ...
Dialing: tcp gitlab.cumul8.com:443 ...
ERROR: Verifying runner... is removed runner=907ca4f2
FATAL: Failed to verify runners
gitlab-runner --debug run
sudo gitlab-runner --debug run
Runtime platform arch=386 os=linux revision=5396d320 version=11.0.0
Starting multi-runner from /etc/gitlab-runner/config.toml ... builds=0
Checking runtime mode GOOS=linux uid=0
Running in system-mode.
Configuration loaded builds=0
metricsserveraddress: ""
listenaddress: ""
concurrent: 4
checkinterval: 0
loglevel: null
user: ""
runners:
- name: cirunner03a-ozone
  limit: 1
  outputlimit: 20000
  requestconcurrency: 0
  runnercredentials:
    url: https://gitlab.company.com/
    token: 907ca4f265a53497887506fd24ad08
    tlscafile: ""
    tlscertfile: ""
    tlskeyfile: ""
  runnersettings:
    executor: shell
    buildsdir: /home/gitlab-runner/builds/runner_a/builds_dir
    cachedir: /home/gitlab-runner/builds/runner_a/cache_dir
    cloneurl: ""
    environment:
    preclonescript: ""
    prebuildscript: ""
    postbuildscript: ""
    shell: ""
    ssh: null
    docker: null
    parallels: null
    virtualbox: null
    cache:
      type: ""
      serveraddress: ""
      accesskey: ""
      secretkey: ""
      bucketname: ""
      bucketlocation: ""
      insecure: false
      path: ""
      shared: false
    machine: null
    kubernetes: null
sentrydsn: null
modtime: 2018-05-09T14:54:51.489977488-07:00
loaded: true
builds=0
Waiting for stop signal builds=0
WARNING: 'metrics_server' configuration entry is deprecated and will be removed in one of future releases; please use 'listen_address' instead
Metrics server disabled
Feeding runners to channel builds=0
Starting worker builds=0 worker=0
Starting worker builds=0 worker=1
Starting worker builds=0 worker=2
Starting worker builds=0 worker=3
Trying to load /etc/gitlab-runner/certs/gitlab.company.com.crt ...
Dialing: tcp gitlab.cumul8.com:443 ...
ERROR: Checking for jobs... forbidden runner=907ca4f2
Feeding runners to channel builds=0
ERROR: Checking for jobs... forbidden runner=907ca4f2
Feeding runners to channel builds=0
ERROR: Checking for jobs... forbidden runner=907ca4f2
ERROR: Runner https://gitlab.cumul8.com/907ca4f265a53497887506fd24ad08 is not healthy and will be disabled!
Feeding runners to channel builds=0
Feeding runners to channel builds=0
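In the log above, the same "forbidden" error repeats three times before the runner is marked as not healthy. To spot this pattern in a longer log, something like the following filter might help. It is only a sketch: `count_forbidden` is a made-up helper name, and it assumes the log was saved as plain text with one entry per line and no colour codes.

```shell
#!/bin/sh
# count_forbidden: hypothetical helper, not part of gitlab-runner.
# Reads gitlab-runner log text on stdin and prints, for each runner ID,
# how many "Checking for jobs... forbidden" errors were logged.
# Assumes one log entry per line and no ANSI colour codes.
count_forbidden() {
  grep 'ERROR: Checking for jobs\.\.\. forbidden' \
    | sed -n 's/.*runner=\([0-9a-f]*\).*/\1/p' \
    | sort | uniq -c | awk '{ print $2, $1 }'
}
```

If the debug output were saved to a file (say `runner.log`, a name chosen here for illustration), it could be run as `count_forbidden < runner.log`.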
sudo gitlab-runner --version
Version: 11.0.0
Git revision: 5396d320
Git branch: 11-0-stable
GO version: go1.8.7
Built: 2018-06-22T11:03:37+00:00
OS/Arch: linux/386
I have tried stopping, starting, and restarting gitlab-runner, and also restarting the VM/server. None of this works.
Sometimes updating the runner version works.
Steps to reproduce
- Start a job
- Cancel the job
- Wait for it to start the next job
- Nothing happens...
- Wait for about 1h and it starts accepting jobs again
Running the following seems to make it start again:
gitlab-runner verify --delete
gitlab-runner run
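The workaround above could be wrapped so that `verify --delete` only runs when a plain `verify` fails. This is a rough sketch, not a tested fix: `cleanup_removed_runners` and the `RUNNER_BIN` variable are names invented here, and it assumes `gitlab-runner verify` exits non-zero when it prints "FATAL: Failed to verify runners", as in the output earlier in this report.

```shell
#!/bin/sh
# RUNNER_BIN is a hypothetical override, used only so this sketch can be
# tried with a stub; by default it calls the real gitlab-runner binary.
RUNNER_BIN="${RUNNER_BIN:-gitlab-runner}"

# cleanup_removed_runners: hypothetical helper, not part of gitlab-runner.
# Runs `verify` first; only if that fails does it run `verify --delete`,
# which removes the runner entries that failed verification from the config.
cleanup_removed_runners() {
  if "$RUNNER_BIN" verify; then
    echo "all runners verified; nothing to do"
    return 0
  fi
  echo "verify failed; running verify --delete"
  "$RUNNER_BIN" verify --delete
}
```

Since `verify --delete` changes the runner configuration, it is probably worth backing up `/etc/gitlab-runner/config.toml` first. After the cleanup, the runner would be started with `gitlab-runner run` as before.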
What is the current bug behavior?
Cancelling a job stops the runner from picking up new jobs
What is the expected correct behavior?
When a job is cancelled, the runner should just pick up the next available job
Relevant logs and/or screenshots
Output of checks
This is happening on our private GitLab instance
Results of GitLab environment info
GitLab 10.8.0 (gitlab-ce@55e4a0b3)
GitLab Shell 7.1.2
GitLab Workhorse v4.2.0
GitLab API