Corrective actions: Child pipelines failing after Feature Flag enabled

A recent RCA for child pipelines failing after a feature flag was enabled, conducted by ~"group::pipeline authoring", highlighted a few corrective action items for our team to take.

The Incident Review issue (with RCA, 5 Whys and Lessons Learned) can be found here: gitlab-com/gl-infra/production#2700 (closed)

Corrective Actions

  • @darbyfrey To discuss with the Infra team about getting a dedicated SRE counterpart for ~"group::continuous integration"
  • @furkanayhan to look into whether we can improve upon our Grafana dashboards and alerting given that we are aware of the error returned
  • @cheryl.li To discuss with Product what longer-term initiatives are being planned, as this was the MVC (to ensure we try to cover ways to prevent this from happening again)
  • @cheryl.li To discuss with the team about rolling out feature flags to a certain percentage, while considering excluding enterprise customers from that percentage rollout
Edited by Cheryl Li