Enable puma fleet-wide on gitlab.com
C3
Production Change - Criticality 3
Change Objective | Describe the objective of the change |
---|---|
Change Type | ConfigurationChange |
Services Impacted | rails |
Change Team Members | Name of the engineers involved in the change |
Change Severity | C3 |
Change Reviewer or tested in staging | A colleague who will review the change or evidence the change was tested on staging environment |
Dry-run output | If the change is done through a script, it is mandatory to have a dry-run capability in the script, run the change in dry-run mode and output the result |
Due Date | Date and time in UTC timezone for the execution of the change, if possible add the local timezone of the engineer executing the change |
Time tracking | To estimate and record times associated with changes (including a possible rollback) |
Detailed steps for the change
Staging validation
Note: Staging is already running puma, so we will verify there that we can switch to unicorn and back to puma.
- Stop chef on all nodes where unicorn/puma is running
  knife ssh 'roles:gstg-base-fe-api OR roles:gstg-base-fe-web OR roles:gstg-base-fe-git' 'sudo service chef-client stop'
- Switch staging to unicorn (this step causes a short interruption on staging, ~2 minutes)
- Merge the role update to switch to puma
- Execute a rolling pipeline with haproxy drains to switch nodes to puma
  - /chatops run deploycmd chefclient base_fe_web --no-check
  - /chatops run deploycmd chefclient base_fe_api --no-check
  - /chatops run deploycmd chefclient base_fe_git --no-check
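The knife query above is reused with the gprd prefix in the production and rollback steps. As a minimal sketch, the query could be built from the environment name so that every step targets the same set of roles. The helper names `fe_role_query` and `chef_status_cmd` are hypothetical, and the sketch assumes `service chef-client status` reports the service state on these nodes. The second helper only prints the command so it can be reviewed before anyone runs it.

```shell
# Build the knife search query for the front-end fleets of one environment.
# $1 is the role prefix used in this plan: "gstg" (staging) or "gprd" (production).
fe_role_query() {
  printf 'roles:%s-base-fe-api OR roles:%s-base-fe-web OR roles:%s-base-fe-git' \
    "$1" "$1" "$1"
}

# Print (do not run) a knife command that asks each node for the chef-client state.
# Assumption: `service chef-client status` is available on these nodes.
chef_status_cmd() {
  printf "knife ssh '%s' 'sudo service chef-client status'" "$(fe_role_query "$1")"
}
```

For example, `chef_status_cmd gstg` prints a command that can be pasted into a shell after the stop step to check that chef-client is no longer running.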
Production monitoring
It will be extremely important to monitor the fleet before and after this change for any latency degradation
Dashboards
- web apdex: https://dashboards.gitlab.net/d/web-main/web-overview?orgId=1&%3ForgId=1&from=now-1h&to=now
- api apdex: https://dashboards.gitlab.net/d/api-main/api-overview?orgId=1&from=now-1h&to=now
- git apdex: https://dashboards.gitlab.net/d/git-main/git-overview?orgId=1&from=now-1h&to=now
Logs
- 60th and 90th percentile for workhorse durations by type https://log.gprd.gitlab.net/goto/3049cb32319e264dd408c60aab96c2f4
- 60th and 90th percentile for readiness https://log.gprd.gitlab.net/goto/238bb54006600e49c8848c9a756ba712
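Since the goal is to spot latency degradation between the before and after readings, a small helper can make the comparison explicit. This is a sketch only: the function names `pct_change` and `latency_ok` are hypothetical, and the 10% default threshold is an illustrative assumption rather than a value taken from this change plan. The inputs are the duration_ms percentiles read manually from the logs.

```shell
# Percentage change from a pre-change value to a post-change value.
# Usage: pct_change PRE_MS POST_MS
pct_change() {
  awk -v pre="$1" -v post="$2" \
    'BEGIN { if (pre <= 0) exit 2; printf "%.1f", (post - pre) / pre * 100 }'
}

# Succeeds (exit 0) when the post-change value is at most MAX_PCT percent above
# the pre-change value, fails (exit 1) otherwise, exit 2 on a non-positive baseline.
# Usage: latency_ok PRE_MS POST_MS [MAX_PCT]
# The default MAX_PCT of 10 is an assumed threshold, not one from this plan.
latency_ok() {
  awk -v pre="$1" -v post="$2" -v max="${3:-10}" \
    'BEGIN { if (pre <= 0) exit 2; exit !((post - pre) / pre * 100 <= max) }'
}
```

With a precheck reading of 200 ms and a postcheck reading of 230 ms, `pct_change 200 230` prints `15.0` and `latency_ok 200 230` fails, which would be a prompt to look at the dashboards before continuing.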
Production apply
- Precheck: Confirm that all services are meeting our SLOs on dashboards
- Precheck: From logs, record workhorse 95th and 60th percentile latency duration_ms for api/web/git here:
- Stop chef on all nodes where unicorn/puma is running
  knife ssh 'roles:gprd-base-fe-api OR roles:gprd-base-fe-web OR roles:gprd-base-fe-git' 'sudo service chef-client stop'
- Merge the role update to switch to puma https://ops.gitlab.net/gitlab-cookbooks/chef-repo/-/merge_requests/2688
- Execute a rolling pipeline with haproxy drains to switch nodes to puma
  - /chatops run deploycmd chefclient base_fe_web --production --no-check
  - /chatops run deploycmd chefclient base_fe_api --production --no-check
  - /chatops run deploycmd chefclient base_fe_git --production --no-check
  - web https://ops.gitlab.net/gitlab-com/gl-infra/deployer/-/jobs/926420
  - api https://ops.gitlab.net/gitlab-com/gl-infra/deployer/-/jobs/926422
  - git https://ops.gitlab.net/gitlab-com/gl-infra/deployer/-/jobs/926424
- Postcheck: Confirm that all services are meeting our SLOs on dashboards
- Postcheck: From logs, record workhorse 95th and 60th percentile latency duration_ms for api/web/git here:
  - api
  - web
  - git
- Remove node overrides on api, web, and git fleet that had puma enabled on individual nodes
Rollback steps
- Ensure chef is stopped on all nodes
  knife ssh 'roles:gprd-base-fe-api OR roles:gprd-base-fe-web OR roles:gprd-base-fe-git' 'sudo service chef-client stop'
- Revert the MR that disabled unicorn and enabled puma https://ops.gitlab.net/gitlab-cookbooks/chef-repo/-/merge_requests/2688
- Execute a rolling pipeline with haproxy drains to switch nodes to unicorn
  - /chatops run deploycmd chefclient base_fe_web --production --no-check
  - /chatops run deploycmd chefclient base_fe_api --production --no-check
  - /chatops run deploycmd chefclient base_fe_git --production --no-check
Changes checklist
- Detailed steps and rollback steps have been filled in prior to commencing work
- Person on-call has been informed prior to the change being rolled out