Corrective action: The cluster_scaleups SLI of the kube service (main stage) has an error rate violating SLO
Summary
Our Deployment rollingUpdate strategy was adjusted to reduce the impact of a shortage of available instances in a single GCP zone. We should narrow this change so that it applies only to C2-based node pools.
Related Incident(s)
Originating issue(s): production#7716 (closed)
Desired Outcome/Acceptance Criteria
Apply the adjusted rollingUpdate strategy only to services running on C2 instance nodes in GPRD and GSTG.
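As an illustration only, the selection described in the acceptance criteria could be sketched as follows. This issue does not record the actual strategy values or how services are pinned to node pools, so the sketch makes several assumptions. It assumes workloads select nodes through the well-known `node.kubernetes.io/instance-type` label, and it assumes the C2-specific strategy uses `maxSurge: 0` so that a rollout never needs spare C2 capacity in the zone. The helper names are hypothetical.

```python
from typing import Any, Dict

# Hypothetical C2 strategy: the issue does not record the values actually used.
# maxSurge: 0 means a rollout never schedules an extra pod, so no spare C2
# capacity is needed in the zone; old pods are replaced one at a time instead.
C2_STRATEGY: Dict[str, Any] = {
    "type": "RollingUpdate",
    "rollingUpdate": {"maxSurge": 0, "maxUnavailable": 1},
}

# Kubernetes' own default rolling update settings for a Deployment.
DEFAULT_STRATEGY: Dict[str, Any] = {
    "type": "RollingUpdate",
    "rollingUpdate": {"maxSurge": "25%", "maxUnavailable": "25%"},
}


def is_c2_instance_type(instance_type: str) -> bool:
    """Return True for GCE C2 machine types such as 'c2-standard-8'.

    C2D types ('c2d-...') are a separate machine family and are not matched.
    """
    return instance_type.startswith("c2-")


def strategy_for(node_selector: Dict[str, str]) -> Dict[str, Any]:
    """Pick a Deployment strategy from a pod's nodeSelector (hypothetical helper)."""
    instance_type = node_selector.get("node.kubernetes.io/instance-type", "")
    return C2_STRATEGY if is_c2_instance_type(instance_type) else DEFAULT_STRATEGY


if __name__ == "__main__":
    print(strategy_for({"node.kubernetes.io/instance-type": "c2-standard-16"}))
    print(strategy_for({"node.kubernetes.io/instance-type": "n1-standard-4"}))
```

Keeping the default strategy for every non-C2 pool limits the slower, one-pod-at-a-time rollout to the workloads that are actually exposed to the zonal C2 shortage.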
Associated Services
Corrective Action Issue Checklist
- Link the incident(s) this corrective action arose out of
- Give context for what problem this corrective action is trying to prevent from recurring
- Assign a severity label (this is the highest severity of the related incidents; defaults to 'severity::4')
- Assign a priority (this will default to 'Reliability::P4')