# Stop deployment on error rate threshold
## Background

We already monitor activity on our deployments. It would be very useful to act on metrics that do not meet the desired threshold, for example by stopping the deployment rollout or even rolling back automatically. Connecting our ~"devops::monitor" insights to ~"Category:Continuous Delivery" could greatly benefit our users.

## User journey

1. User defines metrics to collect (already exists).
2. User defines the threshold for receiving alerts (already exists).
3. User defines the threshold for sending triggers to the pipeline (could be via the API).
4. Pipeline receives a trigger to stop the rollout.
5. Deploy board indicates that the rollout has halted.
6. User can see why the rollout halted (perhaps via a link to the monitoring dashboard).
7. User is offered options to roll back or continue the rollout (probably as buttons).
8. The action is recorded in an audit log.

For example, in a canary deployment a user might want to measure CPU usage, max CPU usage, and max disk usage. If the new deployment exceeds the threshold for any of these metrics, the deployment stops.

For the MVC, we will define a simple over/under threshold trigger. In the future we can explore defining a clear fail, a clear pass, and a grey area (above the minimum threshold but below the maximum threshold) that requires a manual decision, as in Spinnaker:

![image](/uploads/5b2c186dd98f50d0534614281e6a3a37/image.png)

Bonus feature: once https://gitlab.com/groups/gitlab-org/-/epics/2225 is complete, users can triage why the metrics are not satisfactory.
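The simple over/under threshold check proposed for the MVC, and the Spinnaker-style grey area suggested as a future option, could be sketched roughly as follows. This is a minimal illustration, not a GitLab implementation: the names `MetricThreshold`, `evaluate`, `rollout_decision`, `cpu_usage_percent`, and `max_disk_usage_percent` are hypothetical. The boundary handling is also an assumption: a value equal to the minimum counts as a pass, and only a value strictly above the maximum counts as a fail.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"        # stop the rollout
    MANUAL = "manual"    # grey area: halt and let the user roll back or continue


@dataclass
class MetricThreshold:
    """Hypothetical per-metric threshold config (names are illustrative only)."""
    name: str
    max_value: float                    # MVC: a single "over" threshold
    min_value: Optional[float] = None   # future: lower edge of the grey area


def evaluate(threshold: MetricThreshold, observed: float) -> Verdict:
    """Classify one observed value.

    With only max_value set, this is the MVC over/under check.
    With min_value set as well, values above min and up to max are the
    grey area. Boundary handling here is an assumption.
    """
    if threshold.min_value is None:
        return Verdict.FAIL if observed > threshold.max_value else Verdict.PASS
    if observed <= threshold.min_value:
        return Verdict.PASS
    if observed > threshold.max_value:
        return Verdict.FAIL
    return Verdict.MANUAL


def rollout_decision(thresholds: List[MetricThreshold],
                     observations: Dict[str, float]) -> Verdict:
    """Combine per-metric verdicts: any FAIL stops the rollout;
    otherwise any MANUAL halts it pending a user decision."""
    verdicts = [evaluate(t, observations[t.name]) for t in thresholds]
    if Verdict.FAIL in verdicts:
        return Verdict.FAIL
    if Verdict.MANUAL in verdicts:
        return Verdict.MANUAL
    return Verdict.PASS


# Usage with hypothetical canary metrics.
thresholds = [
    MetricThreshold("cpu_usage_percent", max_value=80.0),
    MetricThreshold("max_disk_usage_percent", max_value=90.0, min_value=75.0),
]
decision = rollout_decision(
    thresholds, {"cpu_usage_percent": 62.0, "max_disk_usage_percent": 81.0}
)
print(decision.value)  # prints "manual": disk usage is in the grey area
```

Letting any single failing metric stop the rollout, and any grey-area metric halt it until the user picks Rollback or Continue (step 7 of the user journey), is just one way to combine per-metric results; a stricter or more lenient policy would fit the proposal equally well.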