Remove the tmp shared filesystem mounts on staging/canary/production
Production Change - Criticality 3 (C3)
| Change Objective | Remove the shared tmp filesystem mounts on staging, canary, and production |
|---|---|
| Change Type | Type described above |
| Services Impacted | front-end |
| Change Team Members | @jarv oncall |
| Change Severity | C3 |
| Buddy check or tested in staging | TBD |
| Schedule of the change | Date and time (with timezone) |
| Duration of the change | Time to execute the change ( including a possible rollback ) |
| Detailed steps for the change. Each step must include: | - pre-conditions for execution of the step, - execution commands for the step, - post-execution validation for the step , - rollback of the step |
Overview
This change removes 2 of the 6 shared mounts on the front-end and back-end fleet:
- KEEP /var/opt/gitlab/gitlab-ci/builds
- REMOVE /var/opt/gitlab/gitlab-rails/shared/tmp
- KEEP /var/opt/gitlab/gitlab-rails/shared/cache (see gitlab-org/gitlab#39496)
- KEEP /var/opt/gitlab/gitlab-rails/shared/artifacts
- KEEP /var/opt/gitlab/gitlab-rails/shared/lfs-objects
- KEEP /var/opt/gitlab/gitlab-rails/uploads
Note: ci_enable_live_trace is not enabled for all projects at the moment. Since we do not have a good solution for this, we will not be unmounting the builds mount.
The artifacts, lfs-objects, and uploads mounts will be removed at a later time.
These directories are centrally mounted from a single shared server, which is currently a single point of failure (SPOF). Although sharing these filesystems across front-end and back-end nodes is not necessary, we currently write to these locations because the directories are used as scratch space for object storage uploads.
We will unmount the tmp and cache directories across the fleet, 10% of nodes at a time, using the Ansible play at https://ops.gitlab.net/gitlab-com/gl-infra/deploy-tooling/blob/master/cmds/removeshare.yml. For each batch, nodes are removed from the load balancer, GitLab services are stopped, the mounts are unmounted, and the nodes are added back to the load balancer.
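The rollout order above can be sketched as a dry run. This is not the removeshare.yml play itself. It only prints the commands it would run, under several assumptions. Batches are 10% of the hosts rounded up, which may not match the play's rounding. Each node is reached over SSH, and services are stopped and started with Omnibus gitlab-ctl. Services are restarted before the node is re-added, a step the description above does not spell out. `lb-drain` and `lb-undrain` are hypothetical placeholders for however the play takes a node in and out of the load balancer. `MOUNTS_TO_REMOVE` holds only the tmp path, matching the REMOVE entry in the list above.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the rollout: split hosts into ~10% batches and print the
# per-node steps. Nothing is executed; every command is only echoed.
set -euo pipefail

# REMOVE entries from the mount list; add the cache path here if it is removed too.
MOUNTS_TO_REMOVE=(/var/opt/gitlab/gitlab-rails/shared/tmp)

# batch_size N: 10% of N hosts, rounded up (an assumption), so a batch is never empty.
batch_size() {
  echo $(( ($1 + 9) / 10 ))
}

# plan_batches HOST...: print one line per batch, hosts separated by spaces.
plan_batches() {
  local size host
  local batch=()
  size=$(batch_size "$#")
  for host in "$@"; do
    batch+=("$host")
    if (( ${#batch[@]} == size )); then
      echo "${batch[*]}"
      batch=()
    fi
  done
  # Final, possibly smaller, batch.
  if (( ${#batch[@]} > 0 )); then
    echo "${batch[*]}"
  fi
}

# node_steps HOST: print the steps for one node, in order.
node_steps() {
  local host=$1 mnt
  echo "lb-drain $host"                      # HYPOTHETICAL: take node out of the LB
  echo "ssh $host sudo gitlab-ctl stop"      # stop GitLab services
  for mnt in "${MOUNTS_TO_REMOVE[@]}"; do
    echo "ssh $host sudo umount $mnt"        # remove each shared mount
  done
  echo "ssh $host sudo gitlab-ctl start"     # ASSUMED: restart before re-adding
  echo "lb-undrain $host"                    # HYPOTHETICAL: put node back in the LB
}
```

For example, plan_batches web-{01..20} prints ten lines of two hosts each. node_steps web-01 prints the five steps for that host.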
To phase this change in slowly, we will first be removing the mounts on Staging, then canary nodes, then finally the rest of the fleet.
Staging - remove tmp and cache
- Merge https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/2294
- Ensure that no shared tmp or cache mounts from share-01-stor-gprd.c.gitlab-production.internal remain in staging
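One way to do the check above on each staging node is to read /proc/mounts and look for the shared tmp and cache mount points. The function below is a sketch under that assumption. It matches on mount point only, not on the share-01 server name. It prints any path that is still mounted and returns non-zero if it finds one. How it is fanned out to the staging fleet is left to the operator.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_no_shared_mounts: read a mount table in /proc/mounts format
# (device mountpoint fstype options dump pass) on stdin and print every
# shared tmp/cache mount point still present.
# Exit status: 0 = none left, 1 = at least one still mounted.
check_no_shared_mounts() {
  awk '
    $2 == "/var/opt/gitlab/gitlab-rails/shared/tmp" ||
    $2 == "/var/opt/gitlab/gitlab-rails/shared/cache" {
      print $2
      found = 1
    }
    END { exit (found ? 1 : 0) }
  '
}

# Example on a node:
#   check_no_shared_mounts < /proc/mounts && echo "no shared tmp/cache mounts"
```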
Friday Dec. 6: Production preparation
- Merge and apply https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/2295 so that tmp and cache are no longer force-mounted by chef runs
Friday Dec. 6: Sidekiq export
- Run the removeshare deployer command with:
  - CURRENT_DEPLOY_ENVIRONMENT=gprd-cny
  - GITLAB_ROLE=gprd-base-be-sidekiq-export
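Each deployer step is parameterised by the same two environment variables. The way the removeshare deployer command is actually launched is not described here. The sketch below therefore uses a hypothetical REMOVESHARE_CMD variable (a single command name) in its place. By default, REMOVESHARE_CMD points at a dry-run function that only prints the environment it received. The idea illustrated is that both values must be supplied for every run. A missing value should fail fast rather than run against an unintended environment or role.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Dry-run stand-in: prints the two variables the deployer command would be given.
removeshare_dry_run() {
  echo "CURRENT_DEPLOY_ENVIRONMENT=${CURRENT_DEPLOY_ENVIRONMENT} GITLAB_ROLE=${GITLAB_ROLE}"
}

# run_removeshare ENV ROLE: run the HYPOTHETICAL $REMOVESHARE_CMD for one
# environment/role pair, refusing to run if either value is empty.
run_removeshare() {
  local deploy_env=${1:-} role=${2:-}
  if [[ -z $deploy_env || -z $role ]]; then
    echo "usage: run_removeshare ENV ROLE" >&2
    return 2
  fi
  # The variables are set only for this one command, not in the caller's shell.
  CURRENT_DEPLOY_ENVIRONMENT=$deploy_env GITLAB_ROLE=$role \
    "${REMOVESHARE_CMD:-removeshare_dry_run}"
}

# Example (Friday Dec. 6, Sidekiq export):
#   run_removeshare gprd-cny gprd-base-be-sidekiq-export
```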
Tuesday Dec. 10: Canary
- Run the removeshare deployer command with:
  - CURRENT_DEPLOY_ENVIRONMENT=gprd-cny
  - GITLAB_ROLE=fe-api-cny / GITLAB_ROLE=fe-web-cny / GITLAB_ROLE=fe-git-cny
Tuesday Dec. 17: Production backend
- Run the removeshare deployer command with:
  - CURRENT_DEPLOY_ENVIRONMENT=gprd
  - GITLAB_ROLE=base-be
Friday Dec. 20: Production frontend
- Run the removeshare deployer command with:
  - CURRENT_DEPLOY_ENVIRONMENT=gprd
  - GITLAB_ROLE=base-fe-git
- Run the removeshare deployer command with:
  - CURRENT_DEPLOY_ENVIRONMENT=gprd
  - GITLAB_ROLE=base-fe-api
- Run the removeshare deployer command with:
  - CURRENT_DEPLOY_ENVIRONMENT=gprd
  - GITLAB_ROLE=base-fe-web
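The production frontend step runs the same command three times, once per role. The following sketch runs the roles in the listed order and stops at the first failure. That way, later roles are not touched while an earlier one may be in a bad state. REMOVESHARE_CMD is a hypothetical placeholder for the actual deployer invocation and defaults to a dry run here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Dry-run stand-in for the deployer command (HYPOTHETICAL; override via REMOVESHARE_CMD).
removeshare_dry_run() {
  echo "removeshare ${CURRENT_DEPLOY_ENVIRONMENT} ${GITLAB_ROLE}"
}

# run_roles_in_order ENV ROLE...: run the command for each role in turn and stop
# at the first failure, returning that role's exit status.
run_roles_in_order() {
  local deploy_env=$1 role rc
  shift
  for role in "$@"; do
    rc=0
    CURRENT_DEPLOY_ENVIRONMENT=$deploy_env GITLAB_ROLE=$role \
      "${REMOVESHARE_CMD:-removeshare_dry_run}" || rc=$?
    if (( rc != 0 )); then
      echo "removeshare failed for $deploy_env/$role (exit $rc); stopping" >&2
      return "$rc"
    fi
  done
}

# Friday Dec. 20, production frontend, in the order listed above:
#   run_roles_in_order gprd base-fe-git base-fe-api base-fe-web
```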
Rollback
- Revert https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/2295
- Apply chef across the front-end and back-end fleet:
knife ssh 'roles:gprd-fe' 'sudo chef-client'
knife ssh 'roles:gprd-be' 'sudo chef-client'