Job timeouts still not working
Problem
CI/CD job timeouts set by the user are propagated to the Runner; the Runner cancels the job when the timeout threshold is exceeded and then sends the updated job status to Rails. In some scenarios the Rails application does not appear to handle the updated job status. As a result, users are observing that CI job timeouts are not working.
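For context, this is a sketch of how such a timeout is set on the user side; the job name and script are hypothetical, but the job-level `timeout` keyword is what the Runner is expected to honor:

```yaml
# Sketch of a job-level timeout in .gitlab-ci.yml (job name is hypothetical).
# `timeout` caps how long the job may run before it should be cancelled.
build-job:
  stage: build
  timeout: 10 minutes
  script:
    - ./build.sh
```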
Actual behavior
I have 5 jobs in a stage, a runner concurrency of 4, and this is the only runner registered to the project repository.
The 3 jobs that neither finished nor timed out all had their timeouts set in CI. Script execution started for all 3 jobs, but then they hung.
I then manually cancelled the jobs from the UI and retried them, and everything went smoothly as expected.
Expected behavior
Jobs should not hang, and should honor the configured timeout.
Environment description
Using shell executor (on AWS Lightsail with gitlab-runner latest stable) with gitlab.com.
config.toml contents
concurrent = 4
check_interval = 0

[session_server]
  session_timeout = 7200
  listen_address = "0.0.0.0:8093"
  advertise_address = "server-random.example.com:8093"

[[runners]]
  name = "runner for server-random.example.com"
  url = "https://gitlab.com"
  token = "2ZBY4-token-for-server-random.example.com-Dxz"
  executor = "shell"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
Used GitLab Runner version
Version: 13.4.1
Git revision: e95f89a0
Git branch: 13-4-stable
GO version: go1.13.8
Built: 2020-09-25T20:03:43+0000
OS/Arch: linux/amd64
I have also faced the same problem once with v13.1.1 and then twice with v13.4.1.
Weird Behaviour
I have faced this issue three times, and all three times it occurred only with those same 3 jobs. According to the job logs, each time the jobs hung during a different command's execution.
Sorry, I can't share the CI code publicly, but @steveazz I can provide you access to the repo (with the runner still set up on the server), since the 3rd time I faced this issue was only a few hours ago.
I am going to keep the server running for a few days.
In all 3 jobs that hung, 'yum' is called at some point, and if a yum process is already running, the newer process logs:
Existing lock /var/run/yum.pid: another copy is running as pid 26118.
Another app is currently holding the yum lock; waiting for it to exit...
The other application is: yum
Memory : 98 M RSS (376 MB VSZ)
Started: Fri Oct 2 15:39:47 2020 - 00:11 ago
State : Running, pid: 26118
and then waits for the other process to finish...
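As a possible stopgap while the runner-side timeout is not being honored, yum calls could be wrapped in coreutils `timeout` so a held lock cannot stall a job indefinitely. This is only a sketch under assumptions: the wrapper name, the 600-second deadline, and the package name are all hypothetical, not taken from the pipeline above.

```shell
#!/bin/bash
# Sketch: bound how long a command may wait (e.g. on the yum lock).
# coreutils `timeout` exits with status 124 when the deadline is hit.
run_with_deadline() {
  local deadline="$1"; shift
  timeout "$deadline" "$@"
  local status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${deadline}s: $*" >&2
  fi
  return "$status"
}

# Usage (hypothetical package name):
# run_with_deadline 600 yum -y install some-package
```

With this, a job blocked on the yum lock fails fast with a clear message instead of hanging until manual cancellation.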