Core team onboarding course - Improve the firefighting training
Many new core team members make the same recurring oversights. Improve the firefighting section of the core team member onboarding course (and the handbook if necessary) to address the following points:
- Register work hours in the OpsGenie rotation during sprints where you are a firefighter (add at least enough coverage to match the volume of hours from your commitments, spread across each day you are working during the 2 weeks of the sprint -- not just the week written in the rotation calendar)
- Snooze, don't ack -- but assign the alert to yourself, and monitor for escalation
- Always make sure the pager can interrupt you during pager time (phone in your pocket if it is on vibrate, or otherwise within earshot)
- Acknowledge the issue with the client immediately, keep them informed regularly (keep an eye on the emails coming from them), and ask them for final confirmation that everything is fixed on their side before closing the alert
- Remove urgent@opencraft.com from the recipients when replying, to avoid paging repeatedly for a single issue
- Train clients who misuse the pager -- it should only be used in the cases specified in the SLA
- Treat false positives as issues anyway -- the scope of work is to fix whatever caused the false positive (false positives are dangerous too: as their percentage increases, we become more likely to ignore alerts)
Also schedule follow-up tasks:
- One for each core team member, to review the updated section of the training, and ensure we are all on the same page
- One follow-up task for the assignee of the current task, scheduled after the next new core team member completes their first firefighting rotation, to check that the points above were correctly understood and applied without additional reminders. If not, improve the course further, and repeat the process until fixed.
Edited by Xavier Antoviaque