Incident issue is in closed state even though the incident is active, potentially causing production checks to be wrong
There are two bots, and they are fighting over the Incident:: labels in incident issues once an incident is closed.
When the label Incident::Resolved is applied by incident.io, @gitlab-bot closes the incident issue. It further expects that the incident will never go back to the Incident::Active state. However, within incident.io, it is possible to go back to an active state!
When an incident goes from Resolved => Active, the following dance ensues:
- incident.io applies Incident::Active on a closed issue
- @gitlab-bot applies Incident::Resolved on the same issue immediately after 😅
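The sequence above can be replayed with a small simulation. This is only a sketch: the `Issue` class and both functions are hypothetical stand-ins, and the `gitlab_bot_triage` logic is an assumption about what the Triage Ops rules do, not a copy of them.

```python
from dataclasses import dataclass, field

ACTIVE = "Incident::Active"
RESOLVED = "Incident::Resolved"


@dataclass
class Issue:
    """Hypothetical stand-in for a GitLab incident issue."""
    state: str = "opened"
    labels: set = field(default_factory=set)


def set_incident_label(issue, label):
    # Scoped labels are mutually exclusive, so drop any other Incident:: label.
    issue.labels = {name for name in issue.labels if not name.startswith("Incident::")}
    issue.labels.add(label)


def incident_io_sync(issue, label):
    # Assumed behaviour: incident.io mirrors its status as a label
    # and leaves the issue state (opened/closed) alone.
    set_incident_label(issue, label)


def gitlab_bot_triage(issue):
    # Assumed behaviour of the auto-close rules.
    if issue.state == "opened" and RESOLVED in issue.labels:
        issue.state = "closed"
    elif issue.state == "closed" and ACTIVE in issue.labels:
        set_incident_label(issue, RESOLVED)


issue = Issue(labels={ACTIVE})
incident_io_sync(issue, RESOLVED)  # incident resolved in incident.io
gitlab_bot_triage(issue)           # bot closes the issue
incident_io_sync(issue, ACTIVE)    # incident goes Resolved => Active
gitlab_bot_triage(issue)           # bot flips the label back
```

After the last call the issue is closed and labelled Incident::Resolved, even though the incident is active in incident.io.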
@gitlab-bot is configured using Triage Ops, and the relevant rules are defined in production/auto-close.yml.
This has the potential to cause a real problem for release/tools: our production checks code looks for an active incident that blocks deployments, and that check only considers "open" incident issues.
So, it is possible that the incident lead sets the incident as active from incident.io, such as the Investigating state, and selects the incident.io property to block deployments, but this incident will not be considered a production check blocker because the GitLab incident is closed.
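To make the gap concrete, here is a minimal sketch of a check that filters on issue state. It is not the actual production checks code; `IncidentIssue`, its fields, and `blocking_incidents` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class IncidentIssue:
    """Hypothetical view of an incident as seen from both systems."""
    iid: int
    state: str               # GitLab issue state: "opened" or "closed"
    incident_io_status: str  # status in incident.io, e.g. "Investigating"
    blocks_deployments: bool


def blocking_incidents(issues):
    # Assumed shape of the check: closed issues are never considered.
    return [i for i in issues if i.state == "opened" and i.blocks_deployments]


# An incident that was resolved (issue closed) and then re-activated in incident.io.
reactivated = IncidentIssue(
    iid=1,  # placeholder, not a real issue number
    state="closed",
    incident_io_status="Investigating",
    blocks_deployments=True,
)

blockers = blocking_incidents([reactivated])
```

Here `blockers` is empty, so a deployment would not be blocked even though the incident is active and marked as blocking.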
Take the following incident as an example:
https://app.incident.io/gitlab/incidents/518 | production#19691 (closed)
Acceptance criteria
- Only one bot should manage the Incident:: labels on GitLab incident issues
- This bot should re-open incident issues in GitLab when an incident goes from Resolved => Active state
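One way to picture the desired behaviour is a single owner that derives both the label and the issue state from the incident.io status, so the two can never disagree. The sketch below is an assumption-laden illustration: `sync_issue` is a hypothetical function, the issue is a plain dict, and treating every non-Resolved status as Incident::Active is a simplification.

```python
def sync_issue(issue: dict, incident_status: str) -> dict:
    """Return the issue with label and state derived from the incident.io status."""
    resolved = incident_status == "Resolved"
    label = "Incident::Resolved" if resolved else "Incident::Active"
    # Keep unrelated labels, replace whichever Incident:: label was there.
    other = [name for name in issue["labels"] if not name.startswith("Incident::")]
    return {
        **issue,
        "labels": other + [label],
        "state": "closed" if resolved else "opened",
    }


# A closed, resolved issue whose incident goes Resolved => Active.
closed_issue = {"state": "closed", "labels": ["Incident::Resolved", "severity::2"]}
reopened = sync_issue(closed_issue, "Investigating")
```

With this shape, `reopened` is an open issue carrying Incident::Active, and no second bot needs to touch the Incident:: labels.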