Sessions: Plan migration strategy
General overview
We need to figure out how to migrate users to a new sessions store without any impact on UX, while preserving the ability to fall back to the "old" strategy if something goes wrong with the changes or the new instance.
Migration Requirements
- No downtime (obviously)
- We can't log everyone out, at least not all at once. Ideally, we don't lose any sessions at all, and certainly none belonging to active users (last action less than 1 week ago).
- Partial rollout using Feature Flags, ENV vars, or a combination of both.
- We can't use typical actor-based Feature Flag logic: it's a chicken-and-egg problem, because we identify the actor by the session itself.
- Monitoring of the switch: we need Prometheus metrics ready, and possibly dedicated dashboards, to follow the process.
- Easy rollback without downtime in case the new instance or logic misbehaves.
Possible strategies + Fallback options
Option 1
This is somewhat similar to a zero-downtime DB table rename.
For new sessions, we start writing data into both Redis instances (main + session).
When an existing session is updated, we recreate it in the new instance.
For older sessions, we either migrate them or just wait out the expiry period. Either way, all "active" sessions will be present in both instances after about a week (the default session expiry period unless customized).
After that, we can swap the roles of the two instances (this could be controlled via an ENV var): we start fetching from the dedicated instance while still duplicating writes to the main one, so that a full fallback remains possible.
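The dual-write phase and the role swap described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: plain dicts stand in for the two Redis instances, and the class name `DualWriteSessionStore`, the helper `read_from_dedicated`, and the ENV var name `SESSIONS_READ_FROM_DEDICATED` are all hypothetical.

```python
class DualWriteSessionStore:
    """Writes every session to both stores; reads from the one acting as primary.

    `main` and `dedicated` are dict-like stand-ins for the main Redis
    instance and the new sessions instance (an assumption for this sketch).
    """

    def __init__(self, main, dedicated, read_from_dedicated=False):
        self.main = main
        self.dedicated = dedicated
        self.read_from_dedicated = read_from_dedicated

    def write(self, session_id, data):
        # Creating or updating a session always lands in both instances,
        # so an updated old session gets recreated in the new instance.
        self.main[session_id] = data
        self.dedicated[session_id] = data

    def read(self, session_id):
        # The role swap only changes which instance serves reads.
        primary = self.dedicated if self.read_from_dedicated else self.main
        return primary.get(session_id)


def read_from_dedicated(env):
    """Hypothetical ENV switch: "1" means reads go to the dedicated instance."""
    return env.get("SESSIONS_READ_FROM_DEDICATED", "0") == "1"
```

In a real process the switch would be read once at boot, for example `DualWriteSessionStore(main, dedicated, read_from_dedicated(os.environ))`. Because writes keep going to both instances after the swap, flipping the variable back is enough for a full fallback.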
Fallback/incident strategy
Notify on issues/exceptions with the new instance, but don't return a 500.
Fall back to the main instance when pre-fetching from the dedicated session Redis fails.
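A rough sketch of that fallback read path, under stated assumptions: the two stores are dict-like stand-ins for the Redis clients, `fetch_session` and its `notify` callback are hypothetical names, and the broad `except Exception` is a placeholder for whatever connection or timeout errors the real client raises.

```python
import logging

logger = logging.getLogger("sessions")


def fetch_session(session_id, dedicated, main, notify=logger.warning):
    """Read from the dedicated store; on failure, report it and use main.

    The error is reported through `notify` instead of being re-raised,
    so the request is served from the main instance rather than failing.
    """
    try:
        return dedicated.get(session_id)
    except Exception as exc:  # placeholder for the client's real error types
        notify("dedicated session store failed: %r", exc)
        return main.get(session_id)
```

This only works while the main instance still receives every session write, which is why the duplication continues after the roles are swapped.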
Option 2
Mentioned by Sean/Igor: make replication work for us. Set up the new instance as a replica of our main Redis instance, keep it that way for a week (the session length), then promote it to master for sessions.
Conclusion: We will go with option 1. gitlab-org/gitlab!72630 (closed)