Separate L&R from CEOC role
Request for comments
Need
(This will likely be rather region-specific, as the CEOC rotations in each region are a fair bit different. My perspective is the EMEA rotation, but I'm curious to hear anyone's thoughts. Maybe that can inform other CEOC rotation changes as well.)
Especially during the week, incoming L&R emergencies are frequently handled by pinging an L&R expert who takes over the case entirely, or by reaching out in #support_licensing-subscription to achieve the same outcome. This currently makes the CEOC role a bit of a hybrid:
- Dispatcher who forwards L&R cases
- Technical emergency support for self-managed incidents
- (I know those two don't cover 100%, but it's close enough.)
For me personally, handling situations with concurrent pages is probably the most stressful part of being On-Call. "Dispatching" an L&R case can still take a fair bit of time, and as such increases the chance of such a situation arising. Additionally, handing these cases off like we have been lately feels a bit odd, because it's not quite the documented process. It works fine, it feels somewhat efficient… but I don't feel "off the hook" for it until I know it will be dealt with.
Some recent examples:
- https://gitlab.slack.com/archives/C4Y5DRKLK/p1689941530854639
- https://gitlab.slack.com/archives/C018C623KBJ/p1685700409743039
- https://gitlab.slack.com/archives/C4Y5DRKLK/p1689000301933769
- https://gitlab.slack.com/archives/C4Y5DRKLK/p1688550383229669
- https://gitlab.slack.com/archives/C4Y5DRKLK/p1688459404399559
Approach
I don't have a specific suggestion, but I do think we're currently lacking clarity here, and I would like to gather more thoughts on the topic to get there. I outline some of my own thoughts and ideas in the Alternatives section below.
Benefit
Added clarity will:
- Make us more efficient
- Reduce psychological stress of On-Call
- Produce better outcomes for customers
Competition / Alternatives
- First and foremost: If everyone but me thinks "The heck is he talking about", we definitely can just do nothing and keep going. In that case I can adjust my perception of things.
- Reinforce the currently documented process, train current CEOC rotation members on it, make sure we start following it more strictly again. This would mostly mean: Stop pinging individual SEs on L&R "emergencies" and close them out instead. We should still inform the L&R team – and might want to be more specific in the handbook as to how we do that step.
- Maybe we want to keep doing it how we're doing it now. In that case, we should think it through a bit more and then update the documented process accordingly.
- If we do want to formalize this, does that mean that L&R needs a rotation to ensure someone is always available for the process?
- Bigger picture: While I said "it feels somewhat efficient" above, in another sense it also feels inefficient. If all the CEOC does is forward the case to L&R, who then handle it, can we not remove that step from the process? The challenge here is that customers will still want just one address to reach out to for emergencies, license-related or not, which makes it hard to get rid of the "dispatcher" aspect.
- I would argue that the CEOC should not be a dispatcher. Dealing with a technical incident is stressful and can take very long. Having to dispatch "irrelevant" things "on the side" does not make for a great experience for anyone involved.
- Do we need a separate "Dispatcher" rotation? That seems like overkill.