CI-CD timeout not followed and got billed for a stuck runner

Summary

A pipeline failed after 710 minutes 7 seconds (11.83 hours!) with the error:

There has been a timeout failure or the job got stuck. Check your timeout limits or try again

However, the CI-CD configuration sets the timeout to 1 hour.

It appears that these minutes still got billed, even though:

  • The CI-CD timeout was set to 1 hour
  • Nothing was executed because the runner got stuck

That raises a few questions:

  • Why didn't the pipeline stop after 1 hour, as configured?
  • Why do these minutes count towards the usage quota when the runner was stuck and didn't execute anything?

Steps to reproduce

This is probably a server-side issue with the runner. Hopefully you can find something in the logs for job ID 1365539623.

To reproduce this, the runner must fail to send the final timeout message to the server. Stopping the process that sends the timeout update may allow this to be reproduced.

Example Project

See job 1365539623.

Output of checks

This bug happens on GitLab.com.

Related issue

Related code in Runner Codebase

Job starting with jobResponse
Context creation with timeout
Build start with timeout-able context
Execution with propagation of the timeout context
When the build does not finish on time, cancel after the wait has timed out
Job status sent back to GitLab
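
The flow listed above can be sketched roughly as follows. This is a minimal illustration of a timeout-bound context in Go, not the Runner's actual code: runBuild, executeJob, the status strings, and the scaled-down durations are all hypothetical names and values chosen for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Hypothetical job statuses; these names are illustrative, not the Runner's.
const (
	statusSuccess = "success"
	statusTimeout = "failed (job execution timeout)"
)

// Demo durations, scaled down from hours to milliseconds (assumed values).
const (
	demoTimeout = 50 * time.Millisecond
	quickBuild  = 5 * time.Millisecond
	stuckBuild  = 10 * time.Second
)

// runBuild simulates a build that needs `work` to finish, but returns early
// with the context's error if ctx is cancelled or its deadline passes.
func runBuild(ctx context.Context, work time.Duration) error {
	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// executeJob creates a context bound to the job timeout, runs the build
// under it, and maps the outcome to the status that would be sent back.
func executeJob(timeout, work time.Duration) string {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	err := runBuild(ctx, work)
	if errors.Is(err, context.DeadlineExceeded) {
		return statusTimeout
	}
	return statusSuccess
}

func main() {
	fmt.Println(executeJob(demoTimeout, quickBuild)) // finishes before the deadline
	fmt.Println(executeJob(demoTimeout, stuckBuild)) // cut off by the deadline
}
```

In this sketch the timeout status only exists once executeJob returns. If the process hangs before that point, or the status update never reaches the server, the server has no way to learn that the job is over, which matches the behaviour described in this report.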

Possible solutions

Add a timeout on the server-side to catch cases where the runner never sends a completion message.
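
A server-side check along these lines might look like the sketch below. It is only an illustration under assumptions: the job struct, isOverdue, the 5-minute gracePeriod, and the start date are hypothetical and do not reflect GitLab's data model. Only the job ID, the 1 hour timeout, and the 710 minute duration come from this report.

```go
package main

import (
	"fmt"
	"time"
)

// job is a hypothetical server-side record; the field names are illustrative.
type job struct {
	ID        int64
	StartedAt time.Time
	Timeout   time.Duration // configured job timeout
	Finished  bool          // true once the runner has reported a final status
}

// gracePeriod is an assumed allowance for the runner's own timeout report
// to arrive before the server steps in.
const gracePeriod = 5 * time.Minute

// demoJob uses the job ID and 1 hour timeout from the report; the start
// time is made up for the example.
var demoJob = job{
	ID:        1365539623,
	StartedAt: time.Date(2022, time.December, 1, 8, 0, 0, 0, time.UTC),
	Timeout:   time.Hour,
}

// minutesAfterStart returns the instant m minutes after the job started.
func minutesAfterStart(j job, m int) time.Time {
	return j.StartedAt.Add(time.Duration(m) * time.Minute)
}

// isOverdue reports whether the server should fail the job itself because
// the runner has not reported completion within timeout plus grace period.
func isOverdue(j job, now time.Time) bool {
	if j.Finished {
		return false
	}
	return now.After(j.StartedAt.Add(j.Timeout + gracePeriod))
}

func main() {
	// 30 minutes in: still within the configured limit, prints false.
	fmt.Println(isOverdue(demoJob, minutesAfterStart(demoJob, 30)))
	// 710 minutes in, as in the report: long past the limit, prints true.
	fmt.Println(isOverdue(demoJob, minutesAfterStart(demoJob, 710)))
}
```

A periodic sweep that calls a check like isOverdue on running jobs would let the server mark the job as failed shortly after the 1 hour limit, regardless of whether the runner ever reports back.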

Edited Dec 13, 2022 by Kasia Misirli