Revise the token-related feature development process to make a rollout plan mandatory
As a result of Incident Review: InactiveTokensDeletionCronWorker (gitlab-com/gl-infra/production#18549 - closed), and because of the critical nature of tokens, group::authentication will update its development process to require a rollout plan for any change that affects user data for tokens (and associated bots, membership data, etc.). The rollout plan can use one or more mechanisms to ensure that mitigations are in place to prevent and limit the impact of any unknowns or bugs. Some of the considerations are:
- Use of feature flags scoped to specific groups or actions.
- Extending feature flags to workers or cron jobs so that they can support a narrowly scoped rollout.
- Considering the use of soft deletion or transitional data in the DB to evaluate what a change will look like in production, with an easy rollback path.
- Use of internal groups, such as the quality or delivery teams, which run on production systems but are still internal. This tests the feature at production level, ideally with no negative impact, while still offering a layer of user testing before the change reaches external users.
- Test coverage and implementation reviews are useful tools; however, they are limited by the perspective and domain knowledge of the author and reviewers. By considering the above approaches, we want to put the emphasis on the unknown unknowns that can only really be tested with production users.
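
To make the considerations above concrete, here is a minimal sketch of how a group-scoped feature flag and soft deletion could be combined in a cleanup job. It is illustrative only and is not the actual worker: the `Token` model, the `FeatureFlags` store, the flag name `soft_delete_inactive_tokens`, the `pending_deletion_at` column, and the 60-day inactivity threshold are all hypothetical assumptions, not taken from the codebase.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

FLAG = "soft_delete_inactive_tokens"  # hypothetical flag name


@dataclass
class Token:
    """Hypothetical token record."""
    id: int
    group_id: int
    last_used_at: datetime
    # Soft-deletion marker: set instead of removing the row, so that a
    # rollback only needs to clear this field.
    pending_deletion_at: Optional[datetime] = None


class FeatureFlags:
    """Hypothetical in-memory flag store, enabled per group."""

    def __init__(self):
        self._enabled = {}  # flag name -> set of group ids

    def enable(self, flag, group_id):
        self._enabled.setdefault(flag, set()).add(group_id)

    def enabled(self, flag, group_id):
        return group_id in self._enabled.get(flag, set())


def run_inactive_token_cleanup(tokens, flags, now, max_inactive=timedelta(days=60)):
    """Soft-delete inactive tokens, but only in groups where the flag is on.

    Returns the ids of the tokens marked during this run.
    """
    marked = []
    for token in tokens:
        if not flags.enabled(FLAG, token.group_id):
            continue  # group is outside the current rollout scope
        if token.pending_deletion_at is not None:
            continue  # already marked by an earlier run
        if now - token.last_used_at > max_inactive:
            token.pending_deletion_at = now  # mark, do not destroy
            marked.append(token.id)
    return marked


def rollback(tokens):
    """Undo the soft deletion by clearing the marker on every token."""
    for token in tokens:
        token.pending_deletion_at = None
```

In a rollout along these lines, the flag would first be enabled for an internal group only, the marked rows reviewed in production, and `rollback` run if the selection turned out to be wrong. Hard deletion would be a separate, later step.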