Shell runner freezes on long-running job
Hi there, we have been using GitLab CI/CD for quite some time now, and our servers are updated every Sunday (rolling update). Last Sunday, our servers (GitLab and GitLab Runner) were updated to the latest version (11.11.1 (5a147c92)).
We have a deploy pipeline that works over scp/ssh. It takes quite some time because of how our deployment is structured (the pipeline makes a backup, extracts the artifact, ...) and because of slow hardware. Until the latest update it worked fine; it was just slow.
Since then, the job (on a shell runner) "freezes" at some point. The live output stops updating at a random point, and the raw version of the log does not update anymore either. This happens while backing up files on the target server or while extracting files from the artifact on the target server; both steps list every file they process.
The scripts/commands of the deployment step seem to continue running on the target server. Unfortunately, we get no output beyond the freeze point, so we don't know when the deployment step has finished or whether it raised an error. At this point the raw log is about 2.5 to 3 MB in size.
Another problem: the "frozen" job keeps the shell runner locked, so new pipelines do not start anymore. Even cancelling the job does not release the runner.
None of the following steps solved the issue(s):
- cancelling the job (for the locked shell runner issue)
- rerunning the pipeline
- clearing the runner cache
- restarting the server/runner (this releases the shell runner lock though)
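One idea that is not in the list above, on the unconfirmed assumption that the large volume of per-file output is related to the freeze: write the verbose listings to a file and print only a short summary to the job log. A minimal sketch; `run_quiet` is a hypothetical helper name and is not part of our pipeline or of GitLab Runner:

```shell
#!/bin/sh
# Hypothetical helper (not part of GitLab Runner): run a noisy command,
# keep its full output in a file, and print only a summary to the job log.
run_quiet() {
    log_file=$1
    shift
    # Full output (stdout and stderr) goes to the file, not to the job log.
    if "$@" >"$log_file" 2>&1; then
        status=0
    else
        status=$?
    fi
    echo "command: $*"
    echo "exit status: $status"
    echo "lines of output: $(wc -l <"$log_file" | tr -d ' ')"
    # Show only the tail so errors near the end stay visible in the job log.
    tail -n 20 "$log_file"
    # Propagate the original exit status so the job still fails on error.
    return "$status"
}
```

In a job script this could wrap the ssh call, for example `run_quiet backup.log ssh deploy@target-host ./backup.sh`, where the user, host, and script names are placeholders. The log file could then be kept as a job artifact. This would not explain the freeze; it only tests whether the amount of output matters.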
I cannot really offer steps to reproduce at the moment; I just want to ask whether others have the same problem, or whether anything (even remotely) related to this changed in the latest runner updates. If I can provide more information (logs, ...), feel free to ask.