[ci] job timeout should only take `script` section into account.
Summary
Each job has a timeout (configured either per runner or per repository), intended to keep stuck and runaway jobs under control.
Now there are a few time-consuming stages that can be influenced by the user, namely:
- `before_script`
- `script`
- `after_script`
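For illustration, a minimal `.gitlab-ci.yml` containing the user-controlled sections might look like this (job name and commands are made up for the example; the `timeout` keyword shows where the per-job limit is set):

```yaml
build-job:
  timeout: 10 minutes      # per-job timeout, currently covering everything
  before_script:
    - ./prepare.sh         # user-controlled
  script:
    - make build           # user-controlled
  after_script:
    - ./cleanup.sh         # user-controlled
```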
However, there are some more steps involved that add up to the total time, e.g.:
- Docker image fetch
- the runner's `pre_clone_script`
It would be great if the timeout (at least the per-project one) only took the user-accountable stages into account.
Steps to reproduce
- set up a CI runner `myrunner`
- configure the runner with a `pre_clone_script: sleep 1000`
- configure project `FOO` to use CI/CD
- configure the project's CI timeout to be 10 minutes
- configure a job to run on `myrunner`
- trigger a pipeline for project `FOO` to be executed on runner `myrunner`
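The runner part of the setup above can be sketched in the runner's `config.toml` (fields other than `pre_clone_script` are illustrative placeholders):

```toml
[[runners]]
  name = "myrunner"
  url = "https://gitlab.example.com/"
  executor = "docker"
  # Runs before the repository is cloned; here it deliberately stalls
  # to demonstrate that it eats into the per-project job timeout.
  pre_clone_script = "sleep 1000"
```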
What is the current bug behavior?
Notice how the job will always fail due to a timeout, regardless of the actual time spent in the (user-controlled) `.gitlab-ci.yml` stages.
What is the expected correct behavior?
I would have expected the timeout to only take the actual user-defined parts into account (`before_script`, `script`, `after_script`, ...).
So even if the `pre_clone_script` takes long, or fetching a (largish) Docker image takes long, this doesn't take away time from the actual build process.
Possible fixes
Have the per-project timeout count only those stages that can be influenced by the user.
To catch stalled `pre_clone_script` runs or similar, there could be an additional (per-runner) timeout that applies only to the steps outside the user's control.
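The proposed split could work roughly as follows (a hypothetical sketch, not GitLab's actual implementation: stage names and durations are simulated, and the accounting logic is an assumption about how the two budgets would interact):

```python
# Stages whose duration counts against the per-project timeout.
USER_STAGES = {"before_script", "script", "after_script"}

def run_job(stages, project_timeout, runner_timeout):
    """Simulate a job as a list of (stage_name, duration_seconds) pairs.

    User-controlled stages are charged against project_timeout;
    everything else (image fetch, pre_clone_script, ...) is charged
    against a separate runner_timeout.
    Returns "success", "project timeout", or "runner timeout".
    """
    user_time = 0.0
    infra_time = 0.0
    for name, duration in stages:
        if name in USER_STAGES:
            user_time += duration
            if user_time > project_timeout:
                return "project timeout"
        else:
            infra_time += duration
            if infra_time > runner_timeout:
                return "runner timeout"
    return "success"
```

With this accounting, a `pre_clone_script` that sleeps 1000 seconds no longer consumes the 10-minute project budget; it can only trip the separate runner-level limit.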
/label ~"CI/CD"