`qa-master` notified failure due to timeout but `gitlab-qa` pipeline passed
`qa-master` received a failure notification even though the `gitlab-qa` pipeline actually passed. This happened because the `schedule:package-and-qa` job timed out, which could be due to either the `omnibus-gitlab-mirror` or the `gitlab-qa` pipeline running longer than expected.
The easiest quick fix is to increase the timeout for the `schedule:package-and-qa` job, based on the average duration of successful jobs that we have in Periscope, but that would add to the overall pipeline duration.
If we keep the same timeout, I'm not sure how we can determine whether the downstream pipelines have completed and passed. We could make the notify job poll the downstream pipelines, but the result would still be uncertain because the notify job can run before the downstream pipelines have completed.
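To make the polling idea concrete, here is a minimal sketch of what the notify job could do. It is not how any of the projects currently work. `wait_for_downstream`, `fetch_status`, and the set of terminal statuses are hypothetical names and assumptions introduced for illustration; `fetch_status` stands in for whatever call would return the downstream pipeline's status. The main point is that running out of time is reported as `"timeout"` rather than being folded into `"failed"`, so the notification can tell the two apart.

```python
import time
from typing import Callable

# Assumption: these are the states in which a pipeline will no longer change.
TERMINAL_STATUSES = {"success", "failed", "canceled", "skipped"}


def wait_for_downstream(
    fetch_status: Callable[[], str],  # hypothetical: returns the downstream pipeline status
    timeout_s: float,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the downstream pipeline reaches a terminal status.

    Returns the terminal status, or "timeout" if the next poll would
    land after the deadline.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Give up before sleeping if the next poll would miss the deadline.
        if clock() + interval_s > deadline:
            return "timeout"
        sleep(interval_s)
```

With something like this, the notify job would only send a failure message for `"failed"` or `"canceled"`, and could send a separate "still running" message for `"timeout"`. It does not remove the underlying uncertainty: the polling budget still has to fit inside the notify job's own timeout.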
/cc @gl-quality/eng-prod thoughts?
References from Slack:
- https://gitlab.slack.com/archives/CNV2N29DM/p1576087352009400
- https://gitlab.slack.com/archives/CNV2N29DM/p1576737507001400
Todo:
- Pass through `TOP_UPSTREAM_SOURCE_REF` to `omnibus-gitlab-mirror`: gitlab-org/gitlab!22263 (merged)
- Pass through `TOP_UPSTREAM_SOURCE_REF` to `gitlab-qa` + notify Slack on `omnibus-gitlab-mirror` failure: gitlab-org/omnibus-gitlab!3823 (merged)
- Notify Slack on `gitlab-qa` failure: gitlab-org/gitlab-qa!361 (merged)
- Clean up notification code in `gitlab-org/gitlab`: gitlab-org/gitlab!22508 (merged)