Explore using incident timelines for GitLab.com incident management
Summary
With gitlab-org/gitlab!97020 (merged) about to be deployed to production, I think we are ready to move fully to incident timelines for .com incident management.
- Add timeline event when incident is declared (woodhouse): Replaces timeline with quick action (woodhouse!198 - merged)
- Add timeline events when notable events happen like severity changes (woodhouse webhook config): We won't do this since we turned on label tracking in the timeline
- Update the default template to remove the existing timeline section with a pointer to the timeline feature: Removes the timeline section in favor of incide... (production!156 - merged)
Original description
We currently use free-form text to fill in an incident timeline on GitLab incidents using the issue template.
In mstaff#110 (comment 986621174) it was proposed that we might want to consider using the Incident timeline feature. Also see a quick demo of how it works.
The main obstacle to adopting this feature is that we often need to backfill multiple timeline events, and doing this through the UI will probably be a bit cumbersome. We also need to add a timeline event programmatically, specifically the first one, which records who declared the incident, since the issue is opened using the API (via Slack) by a bot user.
One possible solution is to add a slash command. This would make the timeline feature a feasible replacement for our free-form Timeline description in two ways: we could create a timeline item programmatically by sending the slash command when we create the issue using the API, and we could add timeline items in bulk, for example:
/incident_timeline add "@someuser updates feature flag to true" 2022-06-10 09:30
/incident_timeline add "status page updated"
/incident_timeline add "new deploy to resolve issue" 11:00
/incident_timeline add "deploy finished" 12:30
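To illustrate the bulk backfill case, here is a minimal Python sketch that turns a list of events into a single note body, one command per line. It assumes the `/incident_timeline add` syntax proposed above, which is only a proposal in this issue; the function name `build_backfill_note` is hypothetical. The resulting string could then be submitted as the body of an issue note through the API.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple


def build_backfill_note(events: Iterable[Tuple[str, Optional[datetime]]]) -> str:
    """Render (text, timestamp) pairs as proposed /incident_timeline commands.

    A timestamp of None leaves the time off, so the server-side default
    (the current time, per the proposal) would apply.
    """
    lines = []
    for text, when in events:
        # The proposed syntax wraps the text in double quotes, so a quote
        # inside the text would be ambiguous. Reject it rather than guess.
        if '"' in text:
            raise ValueError(f"event text must not contain double quotes: {text!r}")
        command = f'/incident_timeline add "{text}"'
        if when is not None:
            command += f" {when:%Y-%m-%d %H:%M}"
        lines.append(command)
    return "\n".join(lines)


note_body = build_backfill_note([
    ("status page updated", None),
    ("deploy finished", datetime(2022, 6, 10, 12, 30)),
])
```

Always emitting the full date and time when a timestamp is known keeps backfilled notes unambiguous, even if they are posted on a later day than the incident.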
I think the main things we would want to consider:
- Smart parsing of the timestamp: by default we use the current time when no timestamp is given, and the current day when only a time is given
- We sometimes fill in timelines on behalf of other people. We could put the user in the timeline text, but maybe there could be a way to override the username associated with the event
- Having a slash command is probably good enough as an API substitute. For example, when an incident is updated we execute a webhook, which in some cases could fill in the timeline automatically by posting a note containing the slash command.
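
The timestamp defaults described in the first point could be sketched as follows. This is a rough Python illustration, not an actual implementation: the grammar (quoted text, then an optional `YYYY-MM-DD` date and an optional `HH:MM` time) is inferred from the examples in this issue, and `parse_timeline_command` is a hypothetical name. It also assumes that a date given without a time falls back to the current time of day, and it ignores time zones.

```python
import re
from datetime import datetime

# Assumed grammar for the proposed command: quoted text, then an optional
# date (YYYY-MM-DD) and an optional time (HH:MM).
COMMAND_RE = re.compile(
    r'^/incident_timeline add "(?P<text>[^"]+)"'
    r"(?:\s+(?P<date>\d{4}-\d{2}-\d{2}))?"
    r"(?:\s+(?P<time>\d{2}:\d{2}))?\s*$"
)


def parse_timeline_command(line: str, now: datetime):
    """Return (text, occurred_at), filling missing parts from `now`."""
    match = COMMAND_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a timeline command: {line!r}")

    # No date given: use the current day.
    if match["date"]:
        day = datetime.strptime(match["date"], "%Y-%m-%d").date()
    else:
        day = now.date()

    # No time given: use the current time, truncated to the minute.
    if match["time"]:
        clock = datetime.strptime(match["time"], "%H:%M").time()
    else:
        clock = now.time().replace(second=0, microsecond=0)

    return match["text"], datetime.combine(day, clock)


now = datetime(2022, 6, 10, 14, 5, 30)
no_timestamp = parse_timeline_command('/incident_timeline add "status page updated"', now)
time_only = parse_timeline_command('/incident_timeline add "deploy finished" 12:30', now)
```

Passing `now` in explicitly, rather than reading the clock inside the function, makes the defaults easy to test and leaves room to substitute the note's creation time.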