Corrective actions: Child pipelines failing after Feature Flag enabled
A recent RCA for child pipelines failing after a feature flag was enabled, conducted by ~"group::pipeline authoring", highlighted a few corrective action items for our team to take.
The Incident Review issue (with RCA, 5 Whys, and Lessons Learned) can be found here: gitlab-com/gl-infra/production#2700 (closed)
Corrective Actions
- @darbyfrey to discuss with the Infra team about getting a dedicated SRE counterpart for ~"group::continuous integration"
- @furkanayhan to look into whether we can improve our Grafana dashboards and alerting, given that we are aware of the error returned
- @cheryl.li to discuss with Product what longer-term initiatives are being planned, as this was the MVC (to ensure we try to cover ways to prevent this from happening again)
- @cheryl.li to discuss with the team about feature flagging to a certain percentage, but consider excluding enterprise customers from that percentage rollout
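
To make the last action item concrete for the team discussion, here is a minimal sketch of a percentage rollout that skips enterprise customers. It is an illustration only and does not reflect how our feature flag tooling works: the function name `in_rollout`, the `is_enterprise` parameter, and the hash-based bucketing are all assumptions made up for this example.

```python
import hashlib


def in_rollout(actor_id: str, flag_name: str, percentage: int, is_enterprise: bool) -> bool:
    """Return True if the (hypothetical) flag should be enabled for this actor."""
    # Assumed policy: enterprise customers are never part of a partial rollout.
    # They only get the feature once the flag is fully enabled (100%).
    if is_enterprise and percentage < 100:
        return False

    # Hash the flag name together with the actor ID so that each flag
    # assigns actors to buckets independently, and the result is stable
    # across calls for the same actor.
    digest = hashlib.sha256(f"{flag_name}:{actor_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in the range 0..99

    return bucket < percentage


if __name__ == "__main__":
    # Example: a 25% rollout of a hypothetical flag.
    for actor, enterprise in [("project-1", False), ("project-2", True)]:
        print(actor, in_rollout(actor, "example_flag", 25, enterprise))
```

One design question this raises for the discussion is whether enterprise customers should be excluded only during the partial rollout (as sketched here) or should need an explicit opt-in even at 100%.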
Edited by Cheryl Li