Improved process needed for production TSH sessions during EMEA hours.
The Problem
A standing issue I've experienced repeatedly during my two years at GitLab is that requests for TSH shell sessions to production during EMEA hours go unnoticed until they expire, even when I ping SRE-ONCALL. I am not the only one to experience this frustration.
Today, 7 different TSH requests were either deleted or left to expire. This is not a good use of developer time, and it hampers error validation, bug hunting, and solution verification. This has a flow-on effect on all aspects of GitLab operation and customer experience.
My direct ping to SRE-ONCALL has been ignored for over an hour at the time of writing, which means even attempting to get the attention of someone who should be watching for alerts is failing.
Solutions?
Past discussions have proposed implementing pre-approvals for certain users, but there have been no updates on the progress of this proposal.
Alternatively, maybe we need to ensure there is SRE-ONCALL presence during EMEA hours so that these requests are reviewed in a timely manner.
The current situation fails to align with our value of Efficiency, and directly hampers our ability to provide Results.