Link GitLab-managed Prometheus Alerts and Issues
GitLab already offers the ability for Issues to be created as the result of a Prometheus Alert firing.
This is great but there are some downsides, such as:
- No de-duplication. Each time an alert fires we'll create an Issue, possibly leading to duplication.
- We do not properly track which alert caused which issue. This makes it hard to automatically close issues when alerts get resolved.
Issues that will need the ability to identify unique alerts:
- Close GitLab issue on Recovery alerts from Prometheus #13401 (comment 219919219)
- De-duplication of Prometheus alerts for Incidents #25950 (closed)
How do we define alert uniqueness?
Current behaviour: we use the gitlab_alert_id + the time the alert fired. This doesn't work because an alert can keep firing, leading to many issues for the same thing.
This is the backbone of how we can achieve the above. We have a few options:
- groupKey. This is used by Prometheus to group alerts together. We have seen groupKeys of "{}:{}", which gives me the impression it's not a good fit for uniqueness.
- gitlab_alert_id. This is an ID that we pass to Prometheus when setting up alerts. This only exists for managed Prometheus.
- alertname. This is the name/title of the metric that is failing, e.g. Memory Usage (Total).
- generatorURL. This is a "unique back-link which identifies the causing entity of this alert in the client."
We could potentially use a combination of these?
Self-hosted Prometheus?
Currently, self-hosted Prometheus instances can send alerts to GitLab and get issues created, but these alerts are not tracked in a Prometheus Alert Event, so we have no way to de-duplicate or auto-resolve them.
For self-hosted Prometheus, users set up their own alerts, so this means:
- We have no PrometheusAlert object saved in GitLab
- Therefore, we also won't receive a gitlab_alert_id for alerts that get triggered
Options:
- Maybe we should consider saving PrometheusAlertEvents even if there is no PrometheusAlert associated with them? This could be in its own model class with different validations. It would allow us to save the payload_key, which we could then use to de-dup etc.
I think if we take this option we could potentially use the following as the uniqueness keys (payload_key):
- Managed Prometheus: gitlab_alert_id + group_key + title
- Self-managed: group_key + title + generatorURL
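As a rough illustration of the two key compositions above, here is a minimal Python sketch. The function name payload_key, the managed flag, the separator character, and the SHA-256 hashing are all assumptions made for illustration; nothing here reflects how GitLab actually stores or computes the key.

```python
import hashlib


def payload_key(alert, group_key, managed):
    """Build a hypothetical uniqueness key for one alert from a webhook payload.

    `alert` is one entry of the payload's "alerts" array and `group_key`
    is the payload's top-level "groupKey". `managed` is an assumed flag
    telling us whether the alert comes from GitLab-managed Prometheus.
    """
    labels = alert.get("labels", {})
    title = labels.get("alertname", "")
    if managed:
        # Managed Prometheus: gitlab_alert_id + group_key + title
        parts = [labels.get("gitlab_alert_id", ""), group_key, title]
    else:
        # Self-managed: group_key + title + generatorURL
        parts = [group_key, title, alert.get("generatorURL", "")]
    # Join with a separator unlikely to occur in the values (assumption),
    # then hash so the stored key has a fixed length.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Two alerts that share the same groupKey still get different keys.
total = {"labels": {"alertname": "Memory Usage (Total)", "gitlab_alert_id": "15"}}
pod_avg = {"labels": {"alertname": "Memory Usage (Pod average)", "gitlab_alert_id": "17"}}
assert payload_key(total, "{}:{}", True) != payload_key(pod_avg, "{}:{}", True)
```

Because the key ignores startsAt, an alert that keeps firing maps to the same key each time, which is the property the current gitlab_alert_id + fire time approach lacks.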
Here are the Prometheus docs on the fields they offer: https://prometheus.io/docs/alerting/notifications/
@splattael @syasonik @ck3g Let me know what you think?
There is a WIP MR to resolve this here: !17477 (merged)
Here is an example group of errors I produced locally:
{
"receiver": "gitlab",
"status": "firing",
"alerts": [
{
"status": "firing",
"labels": {
"alertname": "Memory Usage (Pod average)",
"gitlab": "hook",
"gitlab_alert_id": "17"
},
"annotations": {},
"startsAt": "2019-09-18T03:52:31.071551255Z",
"endsAt": "2019-09-18T03:59:31.071551255Z",
"generatorURL": "http://prometheus-prometheus-server-646888949c-nllv2:9090/graph?g0.expr=avg+without%28job%29+%28sum+by%28job%29+%28container_memory_usage_bytes%7Bcontainer_name%21%3D%22POD%22%2Cnamespace%3D%22autodevops-deploy-19-production%22%2Cpod_name%3D~%22%5Eproduction-%28%5B%5Ec%5D.%2A%7Cc%28%5B%5Ea%5D%7Ca%28%5B%5En%5D%7Cn%28%5B%5Ea%5D%7Ca%28%5B%5Er%5D%7Cr%5B%5Ey%5D%29%29%29%29.%2A%7C%29-%28.%2A%29%22%7D%29%29+%2F+count%28avg+without%28job%29+%28container_memory_usage_bytes%7Bcontainer_name%21%3D%22POD%22%2Cnamespace%3D%22autodevops-deploy-19-production%22%2Cpod_name%3D~%22%5Eproduction-%28%5B%5Ec%5D.%2A%7Cc%28%5B%5Ea%5D%7Ca%28%5B%5En%5D%7Cn%28%5B%5Ea%5D%7Ca%28%5B%5Er%5D%7Cr%5B%5Ey%5D%29%29%29%29.%2A%7C%29-%28.%2A%29%22%7D%29%29+%2F+1024+%2F+1024+%3E+0&g0.tab=1"
},
{
"status": "firing",
"labels": {
"alertname": "Memory Usage (Total)",
"gitlab": "hook",
"gitlab_alert_id": "15"
},
"annotations": {},
"startsAt": "2019-09-18T03:55:31.071551255Z",
"endsAt": "2019-09-18T04:00:31.071551255Z",
"generatorURL": "http://prometheus-prometheus-server-646888949c-nllv2:9090/graph?g0.expr=avg+without%28job%29+%28sum+by%28job%29+%28container_memory_usage_bytes%7Bcontainer_name%21%3D%22POD%22%2Cnamespace%3D%22autodevops-deploy-19-production%22%2Cpod_name%3D~%22%5Eproduction-%28.%2A%29%22%7D%29%29+%2F+1024+%2F+1024+%2F+1024+%3E+0.0001&g0.tab=1"
}
],
"groupLabels": {},
"commonLabels": {
"gitlab": "hook"
},
"commonAnnotations": {},
"externalURL": "",
"version": "4",
"groupKey": "{}:{}",
"namespace_id": "root",
"project_id": "autodevops-deploy",
"alert": {
"receiver": "gitlab",
"status": "firing",
"alerts": [
{
"status": "firing",
"labels": {
"alertname": "Memory Usage (Pod average)",
"gitlab": "hook",
"gitlab_alert_id": "17"
},
"annotations": {},
"startsAt": "2019-09-18T03:52:31.071551255Z",
"endsAt": "2019-09-18T03:59:31.071551255Z",
"generatorURL": "http://prometheus-prometheus-server-646888949c-nllv2:9090/graph?g0.expr=avg+without%28job%29+%28sum+by%28job%29+%28container_memory_usage_bytes%7Bcontainer_name%21%3D%22POD%22%2Cnamespace%3D%22autodevops-deploy-19-production%22%2Cpod_name%3D~%22%5Eproduction-%28%5B%5Ec%5D.%2A%7Cc%28%5B%5Ea%5D%7Ca%28%5B%5En%5D%7Cn%28%5B%5Ea%5D%7Ca%28%5B%5Er%5D%7Cr%5B%5Ey%5D%29%29%29%29.%2A%7C%29-%28.%2A%29%22%7D%29%29+%2F+count%28avg+without%28job%29+%28container_memory_usage_bytes%7Bcontainer_name%21%3D%22POD%22%2Cnamespace%3D%22autodevops-deploy-19-production%22%2Cpod_name%3D~%22%5Eproduction-%28%5B%5Ec%5D.%2A%7Cc%28%5B%5Ea%5D%7Ca%28%5B%5En%5D%7Cn%28%5B%5Ea%5D%7Ca%28%5B%5Er%5D%7Cr%5B%5Ey%5D%29%29%29%29.%2A%7C%29-%28.%2A%29%22%7D%29%29+%2F+1024+%2F+1024+%3E+0&g0.tab=1"
},
{
"status": "firing",
"labels": {
"alertname": "Memory Usage (Total)",
"gitlab": "hook",
"gitlab_alert_id": "15"
},
"annotations": {},
"startsAt": "2019-09-18T03:55:31.071551255Z",
"endsAt": "2019-09-18T04:00:31.071551255Z",
"generatorURL": "http://prometheus-prometheus-server-646888949c-nllv2:9090/graph?g0.expr=avg+without%28job%29+%28sum+by%28job%29+%28container_memory_usage_bytes%7Bcontainer_name%21%3D%22POD%22%2Cnamespace%3D%22autodevops-deploy-19-production%22%2Cpod_name%3D~%22%5Eproduction-%28.%2A%29%22%7D%29%29+%2F+1024+%2F+1024+%2F+1024+%3E+0.0001&g0.tab=1"
}
],
"groupLabels": {},
"commonLabels": {
"gitlab": "hook"
},
"commonAnnotations": {},
"externalURL": "",
"version": "4",
"groupKey": "{}:{}"
}
}
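For a payload shaped like the one above, de-duplication and auto-resolution could be sketched roughly as follows. This is an in-memory toy, not GitLab code: the IssueTracker class, its process method, and simple_key are hypothetical names, and the key used is just the managed-Prometheus combination discussed earlier. The only things taken from Prometheus itself are the payload fields and the "firing" / "resolved" status values.

```python
from dataclasses import dataclass, field


def simple_key(alert, group_key):
    """Hypothetical key: gitlab_alert_id + group_key + title."""
    labels = alert.get("labels", {})
    return (labels.get("gitlab_alert_id", ""), group_key, labels.get("alertname", ""))


@dataclass
class IssueTracker:
    """In-memory stand-in for issue creation and closing (illustration only)."""
    open_issues: dict = field(default_factory=dict)  # key -> issue id
    closed: list = field(default_factory=list)       # ids of closed issues
    next_id: int = 1

    def process(self, payload, key_for=simple_key):
        group_key = payload.get("groupKey", "")
        for alert in payload.get("alerts", []):
            key = key_for(alert, group_key)
            status = alert.get("status")
            if status == "firing":
                # De-duplication: only open an issue if none is open for this key.
                if key not in self.open_issues:
                    self.open_issues[key] = self.next_id
                    self.next_id += 1
            elif status == "resolved":
                # Auto-resolution: close the issue this alert originally opened.
                issue_id = self.open_issues.pop(key, None)
                if issue_id is not None:
                    self.closed.append(issue_id)


firing = {
    "groupKey": "{}:{}",
    "alerts": [
        {"status": "firing", "labels": {"alertname": "Memory Usage (Pod average)", "gitlab_alert_id": "17"}},
        {"status": "firing", "labels": {"alertname": "Memory Usage (Total)", "gitlab_alert_id": "15"}},
    ],
}
tracker = IssueTracker()
tracker.process(firing)
tracker.process(firing)  # the same alerts firing again open no new issues
assert len(tracker.open_issues) == 2
```

A real implementation would persist the key on the alert event record rather than in a dict, but the lookup-before-create and lookup-before-close steps would stay the same.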