Set up a Slack channel for Apdex and Error Ratio alerts on Staging
The cmbr web crawler was enabled on Staging, so Staging now has a higher level of user activity for Web and API. Now that activity is higher, the goal is to have alerts on Staging similar to those set up on Production.
Set up a Slack channel where a subset of SLO alerts will be sent for Web, Git and API - Apdex and Error Ratio (similar to #alerts-gen-svc-test)
Update Details in Grafana dashboard (example) to redirect to Production/Staging Kibana depending on environment (could be separated to another issue if it's out of scope) - #1502 (closed)
Thanks @niskhakova! This would be super cool. I'll move it to the scalability issue tracker for now so we don't lose track of it. I'll have a quick look later today to see what would be involved. I noticed a lot of work has been done, and we have an #alerts-nonprod channel, but it seems a bit too silent.
I'm guessing that #alerts-nonprod would be a start, but that would include alerts from environments other than gstg. We can see if we need to split it up further once that's working.
@reprazent thanks for taking a look and moving the issue. I wasn't sure where to create it. Using #alerts-nonprod sounds great if other environments aren't noisy. My understanding is that it would be good to have a specific alert channel for gstg so that eventually Infra or Quality on-call DRIs can monitor it closely.
@reprazent many thanks for setting this up! If I understand correctly, this channel has alerts about all SLOs. Is it possible to only send alerts for Apdex and Error Ratio for Web, Git and API services?
And also not sure if it's related to this specific issue or it will be better to create a separate one for it - the last checkbox in the issue description:
Update Details in Grafana dashboard (example) to redirect to Production/Staging Kibana depending on environment (could be separated to another issue if it's out of scope)
Please let me know if it will be better to create a new one for this:)
If I understand correctly this channel has alerts about all SLOs, is it possible to only send alerts for Apdex and Error Ratio for Web, Git and API services?
@niskhakova Yes, currently it's all alerts. I can update it to only do this for these 3.
Update Details in Grafana dashboard (example) to redirect to Production/Staging Kibana depending on environment (could be separated to another issue if it's out of scope)
Sorry, I missed that part. That's a bit of a different thing, would you mind creating a new issue and pinging me there?
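The filtering requested above (only Apdex and Error Ratio alerts for the Web, Git and API services on gstg) can be sketched roughly as follows. This is only an illustration of the selection logic, not the actual alert routing configuration: the `Alert` fields, the label values and the `goes_to_staging_channel` helper are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical label values; the real alert labels are not shown in this thread.
STAGING_SERVICES = {"web", "git", "api"}
STAGING_SLIS = {"apdex", "error_ratio"}


@dataclass(frozen=True)
class Alert:
    env: str      # environment label, e.g. "gstg"
    service: str  # service label, e.g. "web"
    sli: str      # which SLI fired, e.g. "apdex"


def goes_to_staging_channel(alert: Alert) -> bool:
    """Return True if the alert belongs in the dedicated Staging channel."""
    return (
        alert.env == "gstg"
        and alert.service in STAGING_SERVICES
        and alert.sli in STAGING_SLIS
    )


# Made-up sample alerts, for illustration only.
alerts = [
    Alert("gstg", "web", "apdex"),
    Alert("gstg", "sidekiq", "apdex"),        # service not in the subset
    Alert("other-env", "api", "error_ratio"),  # not Staging
    Alert("gstg", "git", "error_ratio"),
]

selected = [a for a in alerts if goes_to_staging_channel(a)]
```

With the sample data, only the gstg Web Apdex and gstg Git Error Ratio alerts are selected; everything else would stay out of the Staging channel.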
@reprazent many thanks for the quick turnaround on this! Will keep an eye on this channel for alerts.
@andrewn would it be helpful to create a separate issue, "Analyse activity in the new channel and tweak threshold for Staging if it's too quiet", to review the activity and make a decision after some time? If yes, what would be a good place to create such an issue?
@niskhakova yes, I think that would be helpful. I can help by providing some useful queries about which alerts are firing the most in staging, and we can use that to prioritise investigations into the worst offenders.
@reprazent sorry for another ping. I just noticed that in #feed_alerts_staging there are no screenshots like there are in #alerts-gen-svc-test, and the format is a bit different. Could you please clarify whether it would be possible to do something similar? (example in svc) @andrewn what do you think?
Nailia Iskhakova changed the description
Nailia Iskhakova marked the checklist item "Update Details in Grafana dashboard (example) to redirect to Production/Staging Kibana depending on environment (could be separated to another issue if it's out of scope) - #1502 (closed)" as completed
Nailia Iskhakova marked the checklist item "Set up a Slack channel where a subset of SLO alerts will be sent for Web, Git and API - Apdex and Error Ratio (similar to #alerts-gen-svc-test)" as completed
Nailia Iskhakova changed the description