Sending an alert to Slack when merge trains on www-gitlab-com are slow
Recently, we've been dogfooding Merge Trains on the www-gitlab-com project. We've occasionally observed that the merge trains were slow due to outstanding bugs, and we didn't notice the problems until users reported them to us directly in a Slack channel.
This issue is an attempt to be proactive about such incidents. If the system detects that the merge trains on www-gitlab-com are significantly slow, we fire an alert to the Progressive Delivery group's Slack channel.
We expose a public API, GET api/v4/merge_trains, for getting a list of merge trains. Since we already persist the duration of each merge train, we can periodically poll this endpoint to check the health of the merge trains on www-gitlab-com. As the polling service, we're going to create a small script and run it every 20 minutes with a pipeline schedule.
API reference: Get all merge trains of a project
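A minimal sketch of what such a polling script could look like, in Python with only the standard library. Several things here are assumptions rather than decisions made in this issue: the project-scoped form of the endpoint (/projects/:id/merge_trains) with the scope, sort and per_page query parameters, the status and duration (in seconds) fields in the response, the URL-encoded project path, the GITLAB_TOKEN and SLACK_WEBHOOK_URL environment variables, the one-hour threshold, and the use of the median over the last 20 completed trains as the definition of "significantly slow". The alert is posted through a Slack incoming webhook.

```python
import json
import os
import statistics
import urllib.request

GITLAB_API = "https://gitlab.com/api/v4"
# Assumption: URL-encoded path of the www-gitlab-com project.
PROJECT_PATH = "gitlab-com%2Fwww-gitlab-com"
# Hypothetical threshold: alert when the median duration exceeds one hour.
SLOW_THRESHOLD_SECONDS = 60 * 60
# Hypothetical sample size: how many recent completed trains to inspect.
SAMPLE_SIZE = 20


def fetch_merge_trains(token, per_page=SAMPLE_SIZE):
    """Fetch the most recent completed merge trains of the project."""
    url = (
        f"{GITLAB_API}/projects/{PROJECT_PATH}/merge_trains"
        f"?scope=complete&sort=desc&per_page={per_page}"
    )
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def median_duration(trains):
    """Median duration in seconds of merged trains, or None if there are none."""
    durations = [
        train["duration"]
        for train in trains
        if train.get("status") == "merged" and train.get("duration") is not None
    ]
    return statistics.median(durations) if durations else None


def is_slow(trains, threshold=SLOW_THRESHOLD_SECONDS):
    """True when the median duration of merged trains exceeds the threshold."""
    median = median_duration(trains)
    return median is not None and median > threshold


def send_slack_alert(webhook_url, message):
    """Post a plain-text message to a Slack incoming webhook."""
    payload = json.dumps({"text": message}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        response.read()


def main():
    """Entry point intended to be called from the scheduled pipeline job."""
    trains = fetch_merge_trains(os.environ["GITLAB_TOKEN"])
    if is_slow(trains):
        minutes = median_duration(trains) / 60
        send_slack_alert(
            os.environ["SLACK_WEBHOOK_URL"],
            f"Merge trains on www-gitlab-com are slow: "
            f"median duration is {minutes:.0f} minutes.",
        )
```

The scheduled job would call main() on each run. Using the median rather than the mean keeps a single stuck train from triggering the alert on its own; whether that is the right signal is open for discussion.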
We can also run a script manually to collect such data points, e.g. https://docs.google.com/spreadsheets/d/1WN0eFOrVatLvI47ry3t1czom2UnUO2iNKe-rACiNEbI/edit#gid=1349733017