Pipeline Triage Report 2020-09-07 to 2020-09-11

DRI

Please review the responsibilities and guidelines if you have not done so recently.

Above all else, please remember that the aim of pipeline triage is to identify problems and try to get them resolved (ideally) before they impact users.

You can view the Testcases project list of issues for non-quarantined test results to help identify failures and see if any were missed when reviewing pipelines.

	EMEA	AMER	APAC
Primary	@willmeek	@ebanks	-
Secondary	-	@niskhakova	@sliaquat

Issues carried over from last week

clone_push_pull_project_snippet_spec gitlab-org/gitlab#239091 (closed)
group_audit_logs_1_spec gitlab-org/gitlab#243680 (comment 405012958)
locked_artifacts_spec gitlab-org/gitlab#240911 (closed)
ee:kubernetes https://gitlab.com/gitlab-com/gl-security/security-operations/sirt/operations/-/issues/1060
pipeline_status_on_operation_dashboard_spec gitlab-org/gitlab#233300 (closed)

Summary

There were a lot of failures related to register_spec gitlab-org/gitlab#247743 (closed), which has now been fixed.

Highlights

Staging ☑

A number of transient individual failures, though the last full run (3:40pm UTC https://ops.gitlab.net/gitlab-org/quality/staging/-/pipelines/260538 ) was green ✅

Two HTTP 500s in different jobs needed a retry ♻

Staging Orchestrated ❌

Once the register_spec fix reaches this environment it should go green, other than transient failures.

Pre-Production ✅

Production ✅

Nightly ❌

Last night's nightly run had a lot of snippet errors, as per gitlab-org/gitlab#245081 (closed). The ticket was updated to alert those working on it.

Master ☑

Once the merged register_spec fix has run it should hopefully go green, other than transient failures.

Remaining Issues

Snippet issues, especially in Nightly - gitlab-org/gitlab#245081 (closed)

Geo Staging has an issue with rename_replication_spec - gitlab-org/gitlab#247128 (closed). Jennie is aware.

Keep an eye on Staging for HTTP 500s; a couple of failed jobs had to be retried.

Also keep an eye on reporting failures. Mark has put a lot of good work into hardening the retries, but on Thursday there were occasional HTTP 500s and timeouts.

Edited Sep 11, 2020 by Will Meek
