CI/CD timeout not enforced, and minutes billed for a stuck runner
Summary
A pipeline failed after 710 minutes 7 seconds (about 11.84 hours!) with the following error:
There has been a timeout failure or the job got stuck. Check your timeout limits or try again
However, the CI/CD configuration sets a timeout of 1 hour.
These minutes nevertheless appear to have been billed, even though:
- The CI/CD timeout was set to 1 hour
- Nothing was executed because the runner got stuck
That raises a few questions:
- Why didn't the pipeline stop after 1 hour, as configured?
- Why do these minutes count towards the usage quota when the runner was stuck and didn't execute anything?
Steps to reproduce
This is probably a server-side issue with the runner. Hopefully the logs for job ID 1365539623 contain more details.
To reproduce it, the runner must fail to send the final (timeout) status update to the server. Stopping the process that sends this update may be enough to reproduce the issue.
Example Project
See job 1365539623.
Output of checks
This bug happens on GitLab.com.
Related issue
Related code in the Runner codebase
- The job starts with a jobResponse
- A context is created with the job timeout
- The build starts with the timeout-able context
- Execution propagates the timeout context
- When the build does not finish in time, it is cancelled after the wait has timed out
- The job status is sent back to GitLab
Possible solutions
Add a timeout on the server side to catch cases where the runner never sends a completion message.


