Alert when deploy jobs take too long
Currently we don't have a need to watch deploys to staging and canary, since those are nearly 100% automated and we'll get alerted when the pipeline fails. In production, some Release Managers may leave the pipeline window open and off to the side. But what if jobs are stuck? Could we enable a more hands-off approach to deploys? One of the concerns is how long a job takes to complete. Today was a great example: one job took an awkwardly long time to complete, which raised concern for the Release Manager. What if, instead of watching our pipelines, we had some method of alerting us when a job exceeds an average threshold?
With pipelines soon becoming closer to fully automated, there's less of a chance that someone will be watching the pipeline, and a larger chance that a job may hang and only fail once we hit the job timeout. Waiting for that timeout to occur has the potential to introduce a backlog of deploys while we attempt to stabilize the current one. If we are alerted early enough, we can start troubleshooting the issue right away.
Would a system that alerts us when jobs run longer than normal be beneficial?
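For illustration, here's a minimal sketch of what such a check could look like, polling the GitLab jobs API and comparing a running deploy job's elapsed time against the average duration of its recent successful runs. The project ID, job name, and threshold factor are all assumptions, and the `print` stands in for whatever notification channel we'd actually use:

```python
import os
import statistics
from datetime import datetime, timezone

import requests

# Hypothetical values -- adjust for the real project and job.
GITLAB_API = "https://gitlab.com/api/v4"
PROJECT_ID = 1234                 # assumed project ID
JOB_NAME = "deploy-production"    # assumed deploy job name
THRESHOLD_FACTOR = 2.0            # alert when a run exceeds 2x the recent average

HEADERS = {"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]}


def jobs(scope, pages=1):
    """Fetch the project's jobs in the given scope (e.g. 'running', 'success')."""
    results = []
    for page in range(1, pages + 1):
        resp = requests.get(
            f"{GITLAB_API}/projects/{PROJECT_ID}/jobs",
            headers=HEADERS,
            params={"scope[]": scope, "per_page": 100, "page": page},
        )
        resp.raise_for_status()
        results += resp.json()
    return results


def elapsed_seconds(job):
    """Seconds since the job started, from its ISO 8601 started_at timestamp."""
    started = datetime.fromisoformat(job["started_at"].replace("Z", "+00:00"))
    return (datetime.now(timezone.utc) - started).total_seconds()


def check_running_deploys():
    # Average duration of recent successful runs of the same job.
    history = [
        j["duration"]
        for j in jobs("success", pages=3)
        if j["name"] == JOB_NAME and j["duration"]
    ]
    if not history:
        return
    threshold = statistics.mean(history) * THRESHOLD_FACTOR

    for job in jobs("running"):
        if job["name"] == JOB_NAME and job["started_at"]:
            if elapsed_seconds(job) > threshold:
                # Stand-in for a real notification (Slack, PagerDuty, etc.).
                print(f"Job {job['id']} has run {elapsed_seconds(job):.0f}s, "
                      f"threshold is {threshold:.0f}s: {job['web_url']}")


if __name__ == "__main__":
    check_running_deploys()
```

A fixed multiple of the mean is the simplest possible threshold; a percentile over a longer history would be less sensitive to the occasional outlier run.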