`qa-master` notified failure due to timeout but `gitlab-qa` pipeline passed
`qa-master` received a failure notification, but the `gitlab-qa` pipeline actually passed. This is because the `schedule:package-and-qa` job timed out, which could be due to either the `omnibus-gitlab-mirror` or the `gitlab-qa` pipeline running longer than expected.
The easiest quick fix is to increase the timeout for the `schedule:package-and-qa` job, based on the average duration of successful jobs that we have in Periscope, but that would add to the overall pipeline duration.
If we were to keep the same timeout, I'm not sure how we can determine if downstream pipelines have completed and passed. We could make the notify job poll downstream pipelines, but it would still be uncertain because the notify job can happen before the downstream pipelines have completed.
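To make the polling idea concrete, here is a rough sketch, not a proposed implementation. It assumes the notify job already knows the downstream pipeline IDs and has some way to look up a pipeline's status; `fetch_status`, `wait_for_downstream`, and `all_passed` are hypothetical names, and the set of terminal statuses is an assumption.

```python
import time
from typing import Callable, Dict, Iterable

# Statuses after which a pipeline is not expected to change (assumed set).
TERMINAL_STATUSES = {"success", "failed", "canceled", "skipped"}


def wait_for_downstream(
    pipeline_ids: Iterable[int],
    fetch_status: Callable[[int], str],  # hypothetical: returns one pipeline's status
    timeout_s: float = 3600,
    interval_s: float = 30,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[int, str]:
    """Poll until every pipeline is terminal or the deadline passes.

    Returns the last status seen for each pipeline ID.
    """
    pending = set(pipeline_ids)
    statuses = {pid: "unknown" for pid in pending}
    deadline = clock() + timeout_s
    while pending:
        for pid in list(pending):
            statuses[pid] = fetch_status(pid)
            if statuses[pid] in TERMINAL_STATUSES:
                pending.discard(pid)
        # Stop when everything is finished or we ran out of time.
        if not pending or clock() >= deadline:
            break
        sleep(interval_s)
    return statuses


def all_passed(statuses: Dict[int, str]) -> bool:
    """True only if every polled pipeline ended in success."""
    return all(status == "success" for status in statuses.values())
```

Even with something like this, a pipeline that is still `running` at the deadline stays ambiguous, so the notify job would have to report "unknown" rather than pass or fail. The polling also keeps the notify job alive for the whole wait, which has the same cost as raising the timeout.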
/cc @gl-quality/eng-prod thoughts?
References from Slack:
- https://gitlab.slack.com/archives/CNV2N29DM/p1576087352009400
- https://gitlab.slack.com/archives/CNV2N29DM/p1576737507001400
Todo:
- Pass through `TOP_UPSTREAM_SOURCE_REF` to `omnibus-gitlab-mirror` gitlab-org/gitlab!22263 (merged)
- Pass through `TOP_UPSTREAM_SOURCE_REF` to `gitlab-qa` + notify Slack on `omnibus-gitlab-mirror` failure gitlab-org/omnibus-gitlab!3823 (merged)
- Notify Slack on `gitlab-qa` failure gitlab-org/gitlab-qa!361 (merged)
- Clean up notification code in `gitlab-org/gitlab` gitlab-org/gitlab!22508 (merged)