Create memory-bound sidekiq shard Kubernetes deployment
Production Change - Criticality 3 C3
| Change Objective | Describe the objective of the change |
|---|---|
| Change Type | ConfigurationChange|HotFix|DeploymentNewFeature|Operation |
| Services Impacted | sidekiq / k8s |
| Change Team Members | @jarv |
| Change Criticality | C3 |
| Change Reviewer or tested in staging | A colleague who will review the change or evidence the change was tested on staging environment |
| Dry-run output | If the change is done through a script, it is mandatory to have a dry-run capability in the script, run the change in dry-run mode and output the result |
| Due Date | Date and time in UTC timezone for the execution of the change, if possible add the local timezone of the engineer executing the change |
| Time tracking | To estimate and record times associated with changes ( including a possible rollback ) |
Overview
This change migrates production Sidekiq to sidekiq-cluster and selects queues through the memory-bound queue group instead of naming the project-export queue explicitly in the k8s configuration. The same change has already been applied on preprod and staging, where project export has been verified.
Detailed steps for the change
- Create the node pool in production for memory-bound sidekiq workloads: https://ops.gitlab.net/gitlab-com/gitlab-com-infrastructure/-/merge_requests/1688
- Switch to sidekiq-cluster using the sidekiq-queue selector: gitlab-com/gl-infra/k8s-workloads/gitlab-com!207 (merged)
- Apply the change in production
- Remove the extraneous pool from terraform and clean up tf state
  - Remove the legacy node pool
  - Update terraform state and import the newly created pool

        tf state rm module.gitlab-gke.google_container_node_pool.node_pool[1]
        tf state rm module.gitlab-gke.google_container_node_pool.node_pool[2]
        tf import module.gitlab-gke.google_container_node_pool.node_pool[1] gitlab-production/us-east1/gprd-gitlab-gke/sidekiq-memory-bound-0

- Ensure a clean terraform run
- Update the relabel in monitoring for the new deployment name
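The state clean-up above could be wrapped in a small script with a dry-run switch, which is what the "Dry-run output" row asks for when a change is scripted. This is only a sketch under assumptions: the `run` helper, the function names, and the `TF_BIN` and `DRY_RUN` variables are hypothetical, and it assumes the `tf` wrapper forwards arguments unchanged to `terraform` so that `plan -detailed-exitcode` behaves as it does upstream. The resource addresses and the import ID are copied from the steps. Quoting the addresses keeps shells such as zsh from treating `[1]` as a glob pattern.

```shell
#!/usr/bin/env bash
# Sketch only: dry-run wrapper around the state commands listed in the steps.
set -eu

# Assumption: `tf` is the terraform wrapper used in the steps; override via TF_BIN.
TF_BIN="${TF_BIN:-tf}"
# Dry-run is the default; set DRY_RUN=0 to really execute the commands.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "DRY-RUN: $*"
  else
    "$@"
  fi
}

# Drop both legacy pool entries from state, then import the new pool at index 1.
reconcile_node_pool_state() {
  addr='module.gitlab-gke.google_container_node_pool.node_pool'
  run "$TF_BIN" state rm "${addr}[1]"
  run "$TF_BIN" state rm "${addr}[2]"
  run "$TF_BIN" import "${addr}[1]" \
    'gitlab-production/us-east1/gprd-gitlab-gke/sidekiq-memory-bound-0'
}

# With -detailed-exitcode, terraform plan exits 0 when there is nothing to
# change and 2 when changes are pending, so `set -e` stops the script unless
# the plan is clean.
verify_clean_plan() {
  run "$TF_BIN" plan -detailed-exitcode
}

reconcile_node_pool_state
verify_clean_plan
```

Run as-is, the script only prints the four commands, and that output can be pasted into the change issue. Running it with `DRY_RUN=0` executes them in order and aborts at the first failure.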
Rollback steps
- Revert and apply gitlab-com/gl-infra/k8s-workloads/gitlab-com!207 (merged)
Changes checklist
- Detailed steps and rollback steps have been filled prior to commencing work
- SRE on-call has been informed prior to change being rolled out
- There are currently no open issues labeled as ServiceMonitoring with severities of ~S1 or ~S2
Edited by John Jarvis