Unusually long delays when merging and with pipelines on Staging
Over the last couple of days (since around 2020-06-02 10:28:50 UTC) we've been seeing an increase in QA test failures, on Staging only, caused by very slow merges and pipelines. The delays are sometimes a few minutes longer than usual and sometimes over half an hour.
Examples of failing test jobs:
- Merging not complete - test failed after waiting 90 seconds:
  - https://ops.gitlab.net/gitlab-org/quality/staging/-/jobs/1265644
  - https://ops.gitlab.net/gitlab-org/quality/staging/-/jobs/1265291
  - Here is a merge request that was still being merged over 30 minutes after the merge was started. It eventually stopped trying to merge, and it now looks as if the merge button had never been clicked: https://staging.gitlab.com/gitlab-qa-sandbox-group/qa-test-2020-06-05-09-30-59-4b8fd10dca89cbc0/group-with-access-to-protected-branch-2326c9f36ca2ea7d/-/merge_requests/1
- Pipelines pending - test failed after waiting around 5 minutes:
  - https://ops.gitlab.net/gitlab-org/quality/staging/-/jobs/1253855
  - https://ops.gitlab.net/gitlab-org/quality/staging/-/jobs/1268898
  - Here is a pipeline that had a job stuck in pending for over 30 minutes. The test is configured to use its own runner and to run only jobs with a tag matching the unique project name, so no other pipeline should be using that runner: https://staging.gitlab.com/gitlab-qa-sandbox-group/qa-test-2020-06-05-09-30-59-4b8fd10dca89cbc0/upstream-project-6489b6d0fce215ce-703dadc5361c21c2/-/pipelines/12781457/builds
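For context on why these show up as hard failures rather than slow passes: the tests wait a bounded time (90 seconds for merges, around 5 minutes for pipelines) and then give up. Below is a rough sketch of that kind of bounded wait. It is not the actual QA framework code; `wait_until` and its parameters are hypothetical names used only for illustration.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    max_duration: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `condition` until it returns True or `max_duration` seconds pass.

    Hypothetical helper: returns True if the condition was met in time,
    False if the deadline passed first. `clock` and `sleep` are injectable
    so the behaviour can be checked without real waiting.
    """
    deadline = clock() + max_duration
    while True:
        if condition():
            return True
        if clock() >= deadline:
            # Deadline reached: the caller treats this as a test failure.
            return False
        sleep(interval)
```

With a wait like this, a merge that takes 30 minutes fails a check called with `max_duration=90` just as surely as one that never completes, so raising the timeout would only hide the slowdown rather than explain it.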
Could any past or ongoing use of Staging, or any problems with the infrastructure, explain these delays? Possibly:
- production#2231 (closed, so presumably not ongoing?)
- https://gitlab.com/gitlab-com/gl-security/engineering/-/issues/965
Edited by Mark Lapierre