Investigate mixed environment test process to detect issues early, after DB migrations / before deployments
Summary
Customers were unable to access groups/projects from their top-level namespaces via either the web or the API. Requests returned 500 errors.
This was due to a database migration creating incompatibilities between staging and production servers. However, our mixed environment tests did not detect this incompatibility until the migration reached gprd-cny.
Ultimately, we need to discover why the mixed environment tests did not fail when executed against gstg-cny/gstg, and correct that issue.
Related Incident(s)
Originating issue(s): gitlab-com/gl-infra/production/-/issues/14470
Desired Outcome/Acceptance Criteria
- Determine if coverage is appropriate for this incident
- Determine the differences in deployment to gstg-cny/gstg vs gprd-cny/gprd to discover if the deployment order may create a coverage hole
- Determine if our mixed environment testing is itself functioning correctly
- Based on the root cause, provide an appropriate fix/fixes
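
To make the "coverage hole" question above concrete, here is a minimal sketch of how deployment order could be compared between staging and production. Everything in it is an assumption for illustration only: the step orders, the database names (`staging-db`, `production-db`), the idea that each canary fleet shares a database with its main fleet, and the `Step`/`walk` helpers are hypothetical and do not describe the actual pipeline or the root cause of the incident. The idea is to record which (code version, schema version) pairs serve traffic in production and which pairs the staging tests actually ran against.

```python
from typing import NamedTuple


class Step(NamedTuple):
    action: str  # "migrate", "deploy", or "test"
    target: str  # database name for "migrate", fleet name otherwise


def walk(steps, fleet_db):
    """Return (seen, tested) sets of (code_version, schema_version) pairs.

    fleet_db maps each fleet to the database it reads from (hypothetical).
    """
    code = {fleet: "old" for fleet in fleet_db}
    schema = {db: "old" for db in fleet_db.values()}
    seen, tested = set(), set()
    for step in steps:
        if step.action == "migrate":
            schema[step.target] = "new"
        elif step.action == "deploy":
            code[step.target] = "new"
        elif step.action == "test":
            tested.add((code[step.target], schema[fleet_db[step.target]]))
        else:
            raise ValueError(f"unknown action: {step.action}")
        # Every fleet keeps serving traffic after each step, so record its state.
        seen.update((code[f], schema[fleet_db[f]]) for f in fleet_db)
    return seen, tested


# Hypothetical orders, NOT the real pipeline.
staging_steps = [
    Step("migrate", "staging-db"),
    Step("deploy", "gstg-cny"),
    Step("test", "gstg-cny"),
    Step("deploy", "gstg"),
    Step("test", "gstg"),
]
production_steps = [
    Step("migrate", "production-db"),
    Step("deploy", "gprd-cny"),
    Step("deploy", "gprd"),
]

_, staging_tested = walk(
    staging_steps, {"gstg-cny": "staging-db", "gstg": "staging-db"}
)
production_seen, _ = walk(
    production_steps, {"gprd-cny": "production-db", "gprd": "production-db"}
)

# States that serve production traffic but were never tested in staging.
hole = production_seen - staging_tested
print(sorted(hole))
```

Under these assumed orders, the tests only ever run after the new code is deployed, so old code running against the new schema is never exercised in staging even though it serves production traffic. Replacing the step lists with the real orders would show whether such a gap exists.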
Corrective Action Issue Checklist
- Link the incident(s) this corrective action arose out of
- Give context for what problem this corrective action is trying to prevent from re-occurring
- Assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
- @ mention the QEM of the respective section (Quality DRI under the appropriate section as seen here)
Edited by Zeff Morgan