Sending an alert to Slack when merge trains on www-gitlab-com are slow
Summary
Recently, we've been dogfooding Merge Trains on the www-gitlab-com project. We have occasionally observed merge trains running slowly because of outstanding bugs, and we didn't notice these problems until users reported them to us directly on Slack.
This issue proposes a proactive approach to such incidents: if the system detects that merge trains on www-gitlab-com are significantly slow, it fires an alert to the ~"group::progressive delivery" Slack channel.
Proposal
We expose a public GET api/v4/projects/:id/merge_trains API that lists the merge trains of a project. Since we already persist the duration of each merge train, we can periodically poll this endpoint to check the health of merge trains on www-gitlab-com. For the polling service, we're going to write a small script and run it every 20 minutes with a pipeline schedule.
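A minimal sketch of the polling script, assuming each merge train entry returned by the API carries an `id` and a `duration` in seconds, and that the alert is delivered through a Slack incoming webhook. The one-hour threshold, the environment variable names (`GITLAB_TOKEN`, `SLACK_WEBHOOK_URL`, `GITLAB_API_URL`) and all function names are hypothetical choices for illustration, not part of the proposal.

```python
import json
import os
import urllib.parse
import urllib.request

# Hypothetical threshold: treat a merge train slower than one hour as "slow".
SLOW_THRESHOLD_SECONDS = 60 * 60
PROJECT = "gitlab-com/www-gitlab-com"


def find_slow_trains(merge_trains, threshold=SLOW_THRESHOLD_SECONDS):
    """Return the entries whose duration (seconds) exceeds the threshold.

    Entries with a missing or null "duration" are skipped (assumption).
    """
    return [
        train for train in merge_trains
        if train.get("duration") is not None and train["duration"] > threshold
    ]


def build_slack_payload(project, slow_trains):
    """Build the JSON body for a Slack incoming webhook."""
    lines = [f"Merge trains on {project} are slow:"]
    for train in slow_trains:
        lines.append(f"- merge train {train['id']}: {int(train['duration']) // 60} min")
    return {"text": "\n".join(lines)}


def fetch_completed_trains(api_url, project, token):
    """GET the most recent completed merge trains of a project (first page only)."""
    query = urllib.parse.urlencode({"scope": "complete", "sort": "desc"})
    project_id = urllib.parse.quote(project, safe="")
    url = f"{api_url}/projects/{project_id}/merge_trains?{query}"
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def post_to_slack(webhook_url, payload):
    """POST a JSON payload to a Slack incoming webhook URL."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request):
        pass


def main():
    api_url = os.environ.get("GITLAB_API_URL", "https://gitlab.com/api/v4")
    token = os.environ.get("GITLAB_TOKEN")
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
    if not token or not webhook_url:
        print("GITLAB_TOKEN and SLACK_WEBHOOK_URL must be set; skipping check.")
        return
    slow = find_slow_trains(fetch_completed_trains(api_url, PROJECT, token))
    if slow:
        post_to_slack(webhook_url, build_slack_payload(PROJECT, slow))


if __name__ == "__main__":
    main()
```

As written, every scheduled run re-reads the latest completed trains, so a single slow train could be reported several times; a real job would likely only consider trains that finished since the previous run.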
API format
| Endpoint | Params | Description |
|---|---|---|
| GET api/v4/projects/:id/merge_trains | scope, sort | Get all merge trains of a project |
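A minimal sketch of building a request against this endpoint. The accepted values for `scope` (`active`, `complete`) and `sort` (`asc`, `desc`) are assumptions based on the public GitLab API documentation and should be checked against the running version; the function name is hypothetical. The namespaced project path is URL-encoded because GitLab accepts either a numeric ID or an encoded path for `:id`.

```python
import urllib.parse
import urllib.request

# Assumed allowed values; verify against the GitLab version in use.
VALID_SCOPES = {"active", "complete"}
VALID_SORTS = {"asc", "desc"}


def merge_trains_request(api_url, project, token, scope=None, sort=None):
    """Build (but do not send) a GET request listing a project's merge trains.

    `project` may be a numeric ID or a namespaced path such as
    "gitlab-com/www-gitlab-com", which is URL-encoded into the path.
    """
    params = {}
    if scope is not None:
        if scope not in VALID_SCOPES:
            raise ValueError(f"unsupported scope: {scope!r}")
        params["scope"] = scope
    if sort is not None:
        if sort not in VALID_SORTS:
            raise ValueError(f"unsupported sort: {sort!r}")
        params["sort"] = sort
    project_id = urllib.parse.quote(str(project), safe="")
    url = f"{api_url}/projects/{project_id}/merge_trains"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
```

Sending it is then a matter of passing the request to `urllib.request.urlopen`.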
Reference
Past metrics
We can also run a manual script to collect such data points. For example: https://docs.google.com/spreadsheets/d/1WN0eFOrVatLvI47ry3t1czom2UnUO2iNKe-rACiNEbI/edit#gid=1349733017
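A minimal sketch of reducing a batch of merge train records to one data point (count, median and maximum duration), assuming each record has a `duration` in seconds as returned by the endpoint above. The choice of statistics and the function name are hypothetical; the linked spreadsheet may track different metrics.

```python
import statistics


def summarize_durations(merge_trains):
    """Reduce merge train records to a single data point.

    Records without a duration (for example, unfinished trains) are ignored.
    """
    durations = [t["duration"] for t in merge_trains if t.get("duration") is not None]
    if not durations:
        return {"count": 0, "median_s": None, "max_s": None}
    return {
        "count": len(durations),
        "median_s": statistics.median(durations),
        "max_s": max(durations),
    }
```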