Investigation followup: Changes occurring during Soft PCL
Summary
This investigation issue follows up on two incidents, both of which occurred on July 22, 2022:
- 2022-07-22 Redis Cache CPU saturated
  - caused by this planned C3 change: production#7479 (closed)
- 2022-07-22 Flappy Redis failovers impacting web performance
  - currently believed to be related to [some currently undefined] problem in a deploy which happened today.
Both of the above changes appear to be in conflict with the Soft PCL policy for release days. It appears that we aren't observing our PCL policy. Adherence to PCL should not rely on individual team member memory.
Do we have the appropriate automation in place to remind team members of PCL times and/or support team members through automated checks/gates to prevent changes which don't meet criteria?
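As a starting point for discussion, an automated gate could look something like the minimal sketch below. Everything in it is an assumption for illustration: the `PclWindow` and `ChangeRequest` types, the `check_pcl_gate` function, the window times, and the rule that a change inside a PCL window is blocked unless an exception has been approved. It does not describe existing tooling or the actual Soft PCL criteria.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PclWindow:
    name: str
    kind: str          # "soft" or "hard" (labels assumed for illustration)
    start: datetime    # inclusive, UTC
    end: datetime      # exclusive, UTC


@dataclass(frozen=True)
class ChangeRequest:
    reference: str                    # hypothetical identifier
    scheduled_at: datetime            # planned start, UTC
    exception_approved: bool = False  # assumed approval flag


def check_pcl_gate(change, windows):
    """Return (allowed, message) for a change against the known PCL windows."""
    for window in windows:
        if window.start <= change.scheduled_at < window.end:
            if change.exception_approved:
                return True, (f"{change.reference}: inside {window.kind} PCL "
                              f"'{window.name}', exception approved")
            return False, (f"{change.reference}: blocked by {window.kind} PCL "
                           f"'{window.name}', approval required")
    return True, f"{change.reference}: no PCL window active"


# Illustrative data only: the window times are invented, not taken from the incidents.
windows = [
    PclWindow("release day", "soft",
              datetime(2022, 7, 22, 0, 0, tzinfo=timezone.utc),
              datetime(2022, 7, 23, 0, 0, tzinfo=timezone.utc)),
]
change = ChangeRequest("example#1",
                       datetime(2022, 7, 22, 14, 30, tzinfo=timezone.utc))
allowed, message = check_pcl_gate(change, windows)
print(allowed, message)
```

A check like this could run when a change issue is created or scheduled, so that the reminder arrives at the point of action rather than relying on memory. Whether a soft PCL should block outright or only warn and request approval is a policy decision this sketch does not settle.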
Related Incident(s)
Special Attn: @amyphillips @mbursi