Q3FY22 Release Group Reliability Focus
This issue is to discuss work we plan to do to help put GitLab.com in a better availability and reliability posture. It is part of gitlab-com/Product#2881 (closed)
Guidance
- PM: Understand and share the minimal must-do features/capabilities we should deliver to meet customer ARR commitments from 14.2 through 14.6
- EM: Understand and share the topmost things you could advocate for to improve the reliability of your areas. In addition to improving reliability, consider the observability needed to detect and fix reliability issues when they do occur in SaaS
Must deliver features between 14.2 and 14.6
- Re-design of Environment page so we can iterate on top of it - design issue is here
- Manual approve/deny deployments
- Clean up work from Pages migration (gitlab-org/gitlab-pages#382 (closed), gitlab-org/gitlab-pages#561 (closed), gitlab-org/gitlab#330317 (closed), gitlab-org/gitlab#291069 (closed))
Top reliability improvement issues
- Environments Tech Debt
- Decide if deployments/environments should be in separate DB - this one may spin out additional work
- Fix Environment#stop_actions doing cross-joins
- Fix orphaned ci_build_id in pages_deployments
- Revisit batch loading process to properly load associated rows
- Add ability to clean up Environments/Deployments
- Pages reliability/operation/security improvements (https://gitlab.com/gitlab-org/gitlab-pages/-/issues/574, gitlab-org/gitlab-pages#588, https://gitlab.com/gitlab-org/gitlab-pages/-/issues/490, gitlab-org/gitlab#244304, https://gitlab.com/gitlab-org/gitlab/-/issues/244301)
What can't we do if we do all of the above?
- Work that brings Environment to Viable - see gitlab-org&3293 (closed)
Recommendation
We should do everything listed under must deliver features and top reliability improvement issues in Q3. Many of the issues around environments/deployments are anchors that prevent us from making progress quickly.
Edited by Kevin Chu