Remove the tmp shared filesystem mounts on staging/canary/production

Production Change - Criticality 3 C3

Change Objective: Removes the shared storage mounts on staging
Change Type: Type described above
Services Impacted: front-end
Change Team Members: @jarv oncall
Change Severity: C3
Buddy check or tested in staging: TBD
Schedule of the change: Date and time (with timezone)
Duration of the change: Time to execute the change (including a possible rollback)
Detailed steps for the change: Each step must include pre-conditions for execution of the step, execution commands for the step, post-execution validation for the step, and rollback of the step

Overview

This change removes 2 of the 6 shared mounts we have on the front-end and back-end fleet:

  • KEEP /var/opt/gitlab/gitlab-ci/builds
  • REMOVE /var/opt/gitlab/gitlab-rails/shared/tmp
  • KEEP /var/opt/gitlab/gitlab-rails/shared/cache gitlab-org/gitlab#39496 (closed)
  • KEEP /var/opt/gitlab/gitlab-rails/shared/artifacts
  • KEEP /var/opt/gitlab/gitlab-rails/shared/lfs-objects
  • KEEP /var/opt/gitlab/gitlab-rails/uploads
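
As a post-execution validation aid, the KEEP/REMOVE list above could be turned into a small check that runs on a node. This is only a minimal Python sketch and not part of the deploy tooling: the names `EXPECTED_MOUNTS` and `unexpected_mount_states` are hypothetical, and the table simply mirrors the list above.

```python
import os

# Mirrors the KEEP/REMOVE list above (an assumption of this sketch):
# True  = the path should still be a mount point after the change,
# False = the path should no longer be a mount point.
EXPECTED_MOUNTS = {
    "/var/opt/gitlab/gitlab-ci/builds": True,
    "/var/opt/gitlab/gitlab-rails/shared/tmp": False,
    "/var/opt/gitlab/gitlab-rails/shared/cache": True,
    "/var/opt/gitlab/gitlab-rails/shared/artifacts": True,
    "/var/opt/gitlab/gitlab-rails/shared/lfs-objects": True,
    "/var/opt/gitlab/gitlab-rails/uploads": True,
}


def unexpected_mount_states(expected, is_mount=os.path.ismount):
    """Return the paths whose mount state differs from what is expected.

    `is_mount` is injectable so the check can be exercised without real mounts.
    """
    return sorted(
        path
        for path, should_be_mounted in expected.items()
        if is_mount(path) != should_be_mounted
    )
```

Calling `unexpected_mount_states(EXPECTED_MOUNTS)` on a node returns an empty list when every path is in the expected state, so a non-empty result could be used to fail the validation step.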

Note: ci_enable_live_trace is not enabled for all projects at the moment. Since we do not have a good solution for this, we will not be unmounting the builds mount.

The artifacts, lfs-objects and uploads mounts will be removed at a later time.

These directories are centrally mounted from a single shared server, which is currently a single point of failure (SPOF). Sharing these filesystems across front-end and back-end nodes is not necessary, but we still write to these locations because the directories are used as scratch space for object storage uploads.

We will unmount the tmp and cache directories across the fleet, 10% of the nodes at a time, using the following Ansible play:

https://ops.gitlab.net/gitlab-com/gl-infra/deploy-tooling/blob/master/cmds/removeshare.yml

For each batch, nodes are removed from the load balancer (LB), GitLab services are stopped, the mounts are unmounted, and the instances are added back to the LB.
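
The batching described above can be illustrated with a short Python sketch. It is not the Ansible play itself: `batches` and `remove_share` are hypothetical helpers, and the per-host steps (remove from LB, stop services, unmount, add back to LB) are passed in as plain callables rather than real commands.

```python
from typing import Callable, Iterator, List, Sequence


def batches(hosts: Sequence[str], percent: int = 10) -> Iterator[List[str]]:
    """Yield consecutive slices of `hosts`, each about `percent` of the fleet."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # Integer ceiling so that every batch contains at least one host.
    size = max(1, (len(hosts) * percent + 99) // 100)
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


def remove_share(
    hosts: Sequence[str],
    steps: Sequence[Callable[[str], None]],
    percent: int = 10,
) -> None:
    """Run every step on one batch before touching the next batch."""
    for batch in batches(hosts, percent):
        for step in steps:
            for host in batch:
                step(host)
```

With 25 hosts and the default of 10%, this yields eight batches of three hosts and a final batch of one, so at most three nodes are out of the load balancer at any time.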

To phase this change in gradually, we will remove the mounts on staging first, then on the canary nodes, and finally on the rest of the fleet.

Staging - remove tmp and cache

Friday Dec. 6: Production preparation

Friday Dec. 6: Sidekiq export

  • Run the removeshare deployer command for
    • CURRENT_DEPLOY_ENVIRONMENT=gprd-cny
    • GITLAB_ROLE=gprd-base-be-sidekiq-export

Tuesday Dec. 10: Canary

  • Run the removeshare deployer command for
    • CURRENT_DEPLOY_ENVIRONMENT=gprd-cny
    • GITLAB_ROLE=fe-api-cny / GITLAB_ROLE=fe-web-cny / GITLAB_ROLE=fe-git-cny

Tuesday Dec. 17: Production backend

  • Run the removeshare deployer command for
    • CURRENT_DEPLOY_ENVIRONMENT=gprd
    • GITLAB_ROLE=base-be

Friday Dec. 20: Production frontend

  • Run the removeshare deployer command for
    • CURRENT_DEPLOY_ENVIRONMENT=gprd
    • GITLAB_ROLE=base-fe-git
  • Run the removeshare deployer command for
    • CURRENT_DEPLOY_ENVIRONMENT=gprd
    • GITLAB_ROLE=base-fe-api
  • Run the removeshare deployer command for
    • CURRENT_DEPLOY_ENVIRONMENT=gprd
    • GITLAB_ROLE=base-fe-web
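
For reference, the schedule above can be written down as data, which makes it easy to see how many deployer runs are involved. This is a hedged sketch only: `Stage`, `PLAN` and `deployer_variables` are hypothetical names, and the code only produces the variable pairs listed above without invoking the deployer.

```python
from typing import Dict, List, NamedTuple


class Stage(NamedTuple):
    when: str
    name: str
    environment: str
    roles: List[str]


# Values copied from the schedule above; one deployer run per role.
PLAN = [
    Stage("Friday Dec. 6", "Sidekiq export", "gprd-cny",
          ["gprd-base-be-sidekiq-export"]),
    Stage("Tuesday Dec. 10", "Canary", "gprd-cny",
          ["fe-api-cny", "fe-web-cny", "fe-git-cny"]),
    Stage("Tuesday Dec. 17", "Production backend", "gprd",
          ["base-be"]),
    Stage("Friday Dec. 20", "Production frontend", "gprd",
          ["base-fe-git", "base-fe-api", "base-fe-web"]),
]


def deployer_variables(stage: Stage) -> List[Dict[str, str]]:
    """Return one set of deployer variables per role in the stage."""
    return [
        {"CURRENT_DEPLOY_ENVIRONMENT": stage.environment, "GITLAB_ROLE": role}
        for role in stage.roles
    ]
```

Iterating over `PLAN` and calling `deployer_variables` for each stage gives eight variable sets in total, one for each removeshare run listed in the schedule.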

Rollback

knife ssh 'roles:gprd-fe' 'sudo chef-client'
knife ssh 'roles:gprd-be' 'sudo chef-client'
Edited by John Jarvis