
Corrective action: The cluster_scaleups SLI of the kube service (main stage) has an error rate violating SLO

Summary

The rollingUpdate strategy for our deployments was adjusted to minimize the impact of a shortage of available instances in a single GCP zone. That change should be narrowed so that it applies only to C2-based node pools.

Related Incident(s)

Originating issue(s): production#7716

Desired Outcome/Acceptance Criteria

Apply the adjusted rollingUpdate strategy only to services running on C2 instance nodes in GPRD and GSTG.
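
The scoping rule above could be sketched roughly as follows. This is only an illustration and not the actual configuration: the function names, the environment strings, and the `maxSurge`/`maxUnavailable` values are assumptions, since this issue does not state the real settings. It relies on GCP C2 machine types being named with a `c2-` prefix (for example `c2-standard-8`).

```python
from typing import Optional

# Hypothetical values: this issue does not state the real rollingUpdate settings.
C2_ROLLING_UPDATE = {"maxSurge": "10%", "maxUnavailable": 0}

# Environments named in the acceptance criteria (string form is an assumption).
TARGET_ENVIRONMENTS = {"gprd", "gstg"}


def is_c2_machine_type(machine_type: str) -> bool:
    """Return True for GCP C2 machine types such as 'c2-standard-8'."""
    return machine_type.lower().startswith("c2-")


def rolling_update_override(machine_type: str, environment: str) -> Optional[dict]:
    """Return a Deployment strategy override, or None to keep the default.

    The override is returned only for C2 node pools in GPRD and GSTG.
    """
    if environment.lower() not in TARGET_ENVIRONMENTS:
        return None
    if not is_c2_machine_type(machine_type):
        return None
    # Copy the settings so callers cannot mutate the shared constant.
    return {"type": "RollingUpdate", "rollingUpdate": dict(C2_ROLLING_UPDATE)}
```

A service on a non-C2 node pool, or in any other environment, gets `None` here and would keep whatever default strategy it already has.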

Associated Services

Corrective Action Issue Checklist

  • Link the incident(s) this corrective action arose out of
  • Give context for what problem this corrective action is trying to prevent from re-occurring
  • Assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
  • Assign a priority (this will default to 'Reliability::P4')