Pipeline Triage Report 2020-09-07 to 2020-09-11
DRI
Please review the responsibilities and guidelines if you have not done so recently.
Above all else, please remember that the aim of pipeline triage is to identify problems and try to get them resolved (ideally) before they impact users.
You can view the Testcases project list of issues for non-quarantined test results to help identify failures and see if any were missed when reviewing pipelines.
Role | EMEA | AMER | APAC |
---|---|---|---|
Primary | @willmeek | @ebanks | - |
Secondary | - | @niskhakova | @sliaquat |
Issues carried over from last week
-
clone_push_pull_project_snippet_spec
gitlab-org/gitlab#239091 (closed) -
group_audit_logs_1_spec
gitlab-org/gitlab#243680 (comment 405012958) -
locked_artifacts_spec
gitlab-org/gitlab#240911 (closed) - ee:kubernetes https://gitlab.com/gitlab-com/gl-security/security-operations/sirt/operations/-/issues/1060
-
pipeline_status_on_operation_dashboard_spec
gitlab-org/gitlab#233300 (closed)
Summary
There were a lot of issues related to register_spec, tracked in gitlab-org/gitlab#247743 (closed), which has been fixed.
Highlights
☑
Staging A number of transient individual failures, though the last full run (3:40pm UTC https://ops.gitlab.net/gitlab-org/quality/staging/-/pipelines/260538 ) was green
Two HTTP 500s in different jobs needed retries
❌
Staging Orchestrated When the register_spec fix makes it here, it should go green, other than transient failures
✅
Pre-Production
✅
Production
❌
Nightly Last night's nightly had a lot of snippet errors, as per gitlab-org/gitlab#245081 (closed). The ticket was updated to alert those working on it
☑
Master When the merged register_spec fix gets run, it should hopefully go green, other than transient failures
Remaining Issues
Snippet issues, especially in Nightly - gitlab-org/gitlab#245081 (closed)
Geo Staging has an issue with rename_replication_spec - gitlab-org/gitlab#247128 (closed). Jennie is aware
Keep an eye on Staging for HTTP 500s; a couple of failed jobs had to be retried.
Also keep an eye on reporting failures. Mark has put a lot of good work into hardening the retries, but on Thursday there were still occasional HTTP 500s and timeouts.