Timeouts not working
Support has received a couple of tickets in which users report that their pipelines get stuck and consume all of their CI minutes. The important part is that the users had defined a timeout, but GitLab did not enforce it.
Open Issues: gitlab-com/support-forum#3402
There can be many reasons for a pipeline to get stuck, but IMO GitLab should always enforce the timeout regardless.
Things to investigate
- As suggested by @tmaczukin in #4147 (comment 186416120): for some past jobs whose duration was counted in days, trace patch/job update requests were logged with a 429 HTTP response code (Too Many Requests, which basically means that GitLab can't handle the volume of requests). The requesting client has a backoff mechanism. Could this somehow be locking the jobs? For example: after the job is canceled or times out, Runner tries to send the last update request but receives a 429. It waits for a moment and retries, again getting a 429. This repeats, with Runner making the delays longer and longer, until it stops sending requests altogether.
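The hypothesized failure mode can be illustrated with a small simulation. This is a sketch under stated assumptions, not Runner's code: `retry_with_backoff`, `RetryOutcome`, the 1-second base delay, the doubling factor and the attempt limit are all hypothetical, because the issue does not say what backoff parameters Runner actually uses. Sleeping is simulated by summing the delays, so it runs instantly. The optional `deadline` parameter shows one way a retry loop could be bounded so that it cannot outlive a timeout.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RetryOutcome:
    attempts: int     # number of requests sent
    waited: float     # total simulated seconds spent waiting between attempts
    succeeded: bool   # True if some attempt got a non-429 response


def retry_with_backoff(
    send: Callable[[], int],
    base_delay: float = 1.0,        # hypothetical first delay, in seconds
    factor: float = 2.0,            # hypothetical growth factor (plain doubling, no cap)
    max_attempts: int = 10,         # hypothetical attempt limit
    deadline: Optional[float] = None,
) -> RetryOutcome:
    """Retry `send` while it returns HTTP 429, growing the delay each time.

    If `deadline` is set, stop retrying once the next wait would push the
    total waiting time past it.
    """
    delay = base_delay
    waited = 0.0
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        if send() != 429:
            return RetryOutcome(attempts, waited, True)
        if attempts == max_attempts:
            break
        if deadline is not None and waited + delay > deadline:
            break  # give up instead of waiting past the deadline
        waited += delay  # stands in for time.sleep(delay)
        delay *= factor
    return RetryOutcome(attempts, waited, False)


always_429 = lambda: 429  # a server that never stops rate-limiting
print(retry_with_backoff(always_429))               # unbounded: waits 511 s over 10 attempts
print(retry_with_backoff(always_429, deadline=60))  # bounded: stops after 31 s of waiting
```

With plain doubling and no cap, n failed attempts accumulate 2^(n-1) - 1 seconds of waiting: 10 attempts already span 511 seconds, and 20 attempts would span 524,287 seconds, roughly six days. Whether Runner's real backoff caps the per-request delay or bounds the total retry time is one of the things to check.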