Validate releases after each deployment step
Runway checks the service's 5xx error rate only after the deployment to 100% of traffic has finished. Instead, we should validate the release after each deployment step, against SLIs such as the error ratio.
We should also remove the 10m wait between the staging and production deployments.
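A minimal sketch of the proposed per-step validation follows. It assumes the rollout can be driven through two injected callables: `shift_traffic` (set the new revision's share of traffic) and `error_ratio` (read the current error-ratio SLI). Both names, the 1% threshold, and the roll-back-to-0% behaviour are hypothetical and are not runwayctl's actual API. A real implementation would also have to wait out the metric delay before each read.

```python
from typing import Callable, Sequence

# Traffic percentages for the 0% -> 25% -> 50% -> 75% -> 100% rollout.
DEFAULT_STEPS = (25, 50, 75, 100)


def rollout_with_validation(
    shift_traffic: Callable[[int], None],  # hypothetical: route N% of traffic to the new revision
    error_ratio: Callable[[], float],      # hypothetical: current 5xx / total ratio for the service
    max_error_ratio: float = 0.01,         # hypothetical SLI threshold (1%)
    steps: Sequence[int] = DEFAULT_STEPS,
) -> bool:
    """Shift traffic step by step, checking the error-ratio SLI after each step.

    Returns True if every step passed, or False if the release was rolled back.
    """
    for percent in steps:
        shift_traffic(percent)
        # In practice, wait here until fresh metrics for this step are available.
        if error_ratio() > max_error_ratio:
            shift_traffic(0)  # send all traffic back to the previous revision
            return False
    return True


# Usage with fakes: the second reading breaches the threshold, so the rollout stops at 50%.
calls = []
readings = iter([0.0, 0.05])
print(rollout_with_validation(calls.append, lambda: next(readings)))  # False
print(calls)  # [25, 50, 0]
```

Stopping at the first breached step limits the share of users exposed to a bad release, at the cost of one metrics wait per step.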
Things to consider
- What query should we use to decide whether the deployment should continue or stop?
- There is a 3m delay until the `request_count` metric is available, plus another 1m(?) delay for the SD exporter to push the data. Waiting 4m after each deployment step would add 16m (0% -> 25%, 25% -> 50%, 50% -> 75%, 75% -> 100%) to the total deployment time. I think we could validate only the first step (0% -> 25%) and then continue the deployment without delay or monitoring.
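The delay arithmetic and the continue/stop decision above can be sketched as follows. It assumes the `request_count` data can be aggregated into counts per response-code class (e.g. `"2xx"`, `"5xx"`); that data shape, the function names, and the 1% threshold are illustrative assumptions, not the actual query or runwayctl code.

```python
from typing import Mapping

# Delays from the discussion above, in minutes.
METRIC_DELAY_MIN = 3    # until request_count is available
EXPORTER_DELAY_MIN = 1  # SD exporter push delay (still uncertain: "1m(?)")

ROLLOUT_STEPS = ("0% -> 25%", "25% -> 50%", "50% -> 75%", "75% -> 100%")


def added_wait_minutes(validated_steps: int) -> int:
    """Extra deployment time if we wait for fresh metrics after N steps."""
    return validated_steps * (METRIC_DELAY_MIN + EXPORTER_DELAY_MIN)


def error_ratio(counts_by_class: Mapping[str, int]) -> float:
    """5xx responses divided by all responses; 0.0 when there was no traffic."""
    total = sum(counts_by_class.values())
    if total == 0:
        return 0.0
    return counts_by_class.get("5xx", 0) / total


def should_continue(counts_by_class: Mapping[str, int], max_error_ratio: float = 0.01) -> bool:
    """Hypothetical gate: continue while the error ratio stays within the threshold."""
    return error_ratio(counts_by_class) <= max_error_ratio


print(added_wait_minutes(len(ROLLOUT_STEPS)))     # 16 (validate every step)
print(added_wait_minutes(1))                      # 4 (validate only 0% -> 25%)
print(should_continue({"2xx": 980, "5xx": 20}))   # False (2% > 1%)
```

Treating zero traffic as a pass is a design choice. At the 25% step, a low-traffic service may not produce enough requests for the ratio to be meaningful.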
Status 2024-07-11
- MR runwayctl!494 is under review (now merged)
Edited by Gregorius Marco