2026-01-08: Rollout Ruby 3.3 to Production (#20969)

# Production Change

### Change Summary

Following up on the test rollout in staging environments, the plan is to roll out Ruby 3.3 to production. Auto-deploys will need to be paused for the duration of the change.

### Change Details

1. **Services Impacted** - GitLab Rails, Sidekiq, and any other service that uses Ruby
2. **Change Technician** - `@jennykim-gitlab` `@dat.tang.gitlab`
3. **Change Reviewer** - `@skarbek`
4. **Scheduled Date and Time (UTC in format YYYY-MM-DD HH:MM)** - 2026-01-08 13:00
5. **Time tracking** - 480 minutes
6. **Downtime Component** - none

### Set Maintenance Mode in GitLab

There is no need to set Maintenance Mode.

## Preparation

> [!note]
>
> The following checklists must be done in advance, before setting the label ~"change::scheduled"

### Change Reviewer checklist

~8276990 ~8276981 ~8276978 ~8276976:

* [ ] Check if the following applies:
  * The **scheduled day and time** of execution of the change is appropriate.
  * The [change plan](#detailed-steps-for-the-change) is technically accurate.
  * The change plan includes **estimated timing values** based on previous testing.
  * The change plan includes a viable [rollback plan](#rollback).
  * The specified [metrics/monitoring dashboards](#key-metrics-to-observe) provide sufficient visibility for the change.

~8276978 ~8276976:

* [x] Check if the following applies:
  * The complexity of the plan is appropriate for the corresponding risk of the change (i.e. the plan contains clear details).
  * The change plan includes success measures for all steps/milestones during the execution.
  * The change adequately minimizes risk within the environment/service.
  * The performance implications of executing the change are well-understood and documented.
  * The specified metrics/monitoring dashboards provide sufficient visibility for the change.
    * If not, is it possible (or necessary) to make changes to observability platforms for added visibility?
  * The change has a primary and secondary SRE with knowledge of the details available during the change window.
  * The change window has been agreed with Release Managers in advance of the change. If the change is planned for APAC hours, this issue has an agreed pre-change approval.
  * The labels ~"blocks deployments" and/or ~"blocks feature-flags" are applied as necessary.

### Change Technician checklist

* [x] The [Change Criticality](https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/change-management/#change-criticalities) has been set appropriately and requirements have been reviewed.
* [x] The [change plan](#detailed-steps-for-the-change) is technically accurate.
* [x] The [rollback plan](#rollback) is technically accurate and detailed enough to be executed by anyone with access.
* [x] This Change Issue is linked to the appropriate Issue and/or Epic.
* [x] Change has been tested in staging and results noted in a comment on this issue.
* [x] A test rollout to Canary was done: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/20918
* [ ] ~~A dry-run has been conducted and results noted in a comment on this issue.~~
* [x] The change execution window respects the [Production Change Lock periods](https://about.gitlab.com/handbook/engineering/infrastructure/change-management/#production-change-lock-pcl).
* [x] Once all boxes above are checked, mark the change request as scheduled: `/label ~"change::scheduled"`
* [x] For ~8276976 and ~8276978 change issues, the change event is added to the [GitLab Production](https://calendar.google.com/calendar/embed?src=gitlab.com_si2ach70eb1j65cnu040m3alq0%40group.calendar.google.com) calendar by the [change-scheduler bot](https://gitlab.com/gitlab-com/gl-infra/ops-team/toolkit/change-scheduler). It is scheduled to run every 2 hours.
* [ ] For ~8276976 change issues, a Senior Infrastructure Manager has provided approval with the ~14866676 label on the issue.
* [x] For ~8276978 change issues, an Infrastructure Manager provided approval with the ~14866676 label on the issue.
* [x] For ~8276976 and ~8276978 changes, mention `@gitlab-org/saas-platforms/inframanagers` in this issue to provide visibility to all infrastructure managers.
* [x] For ~8276976, ~8276978, or ~"blocks deployments" change issues, confirm with Release managers that the change does not overlap or hinder any release process. (In the `#production` channel, mention `@release-managers` and this issue and await their acknowledgment.)
  * [x] Mention that no PDM should be run on the day of the CR

## Detailed steps for the change

### Pre-execution steps

> [!note]
>
> The following steps should be done right at the scheduled time of the change request. The [preparation steps](#preparation) are listed above.

* [x] Make sure all tasks in the [Change Technician checklist](#change-technician-checklist) are done
* [x] For ~8276976 and ~8276978 change issues, the SRE on-call has been informed prior to the change being rolled out.
  * [x] The SRE on-call provided approval with the ~25771657 label on the issue.
* [x] For ~8276976, ~8276978, or ~"blocks deployments" change issues, Release managers have been informed prior to the change being rolled out. (In the `#production` channel, mention `@release-managers` and this issue and await their acknowledgment.)
  * [x] Mention that no PDM should be run on the day of the CR
* [x] There are currently no [active incidents](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/?sort=created_date&state=opened&label_name%5B%5D=Incident%3A%3AActive&or%5Blabel_name%5D%5B%5D=severity%3A%3A1&or%5Blabel_name%5D%5B%5D=severity%3A%3A2&first_page_size=20) that are ~3760139 or ~3760140
* [ ] If the change involves doing maintenance on a database host, an appropriate silence targeting the host(s) should be added for the duration of the change.

### Change Steps - steps to take to execute the change

* [x] No PDM should be run on the day of the CR
* [x] Note the last deployment package that was successfully deployed to production in the [Rollback](#rollback) section.
* [x] Open an MR to update README with the new Ruby version: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/218123

_Estimated Time to Complete (mins)_ - 480 minutes (8 hours)

- [x] Set label ~"change::in-progress" `/label ~change::in-progress`

#### Pause auto deploys

- [x] Pause auto deploys: `/chatops run auto_deploy pause`

#### Prepare a Ruby 3.3 package

- [x] Inform the `@sre-on-call` and `@release-managers` in `#production`, and the engineers in the `#f_ruby3` channel in Slack, that we are starting to build a Ruby 3.3 package to deploy to production.
- [x] Set `USE_NEXT_RUBY_VERSION_IN_AUTODEPLOY` to `true` in the following projects:
  - [x] Omnibus-gitlab: https://dev.gitlab.org/gitlab/omnibus-gitlab/-/settings/ci_cd
  - [x] CNG: https://dev.gitlab.org/gitlab/charts/components/images/-/settings/ci_cd
- [x] Merge the MR to update README with the new Ruby version: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/218123

#### Start a deployment pipeline

- [x] Trigger a deployment pipeline by running the "MANUAL auto-deploy pick&tag" inactive manual scheduled pipeline: https://ops.gitlab.net/gitlab-org/release/tools/-/pipeline_schedules/
- [x] Make sure a new auto deploy pipeline has started.
  * [x] https://dev.gitlab.org/gitlab/omnibus-gitlab/-/commits/18.8.202601081342+fb807874e7c.5c7075ed575
- [x] Verify that the Omnibus package contains Ruby 3.3.
  - [x] Find out the image reference of the `Docker` job in the Omnibus packager pipeline.
    To find it, search for the string `Copying image` in the job logs (under the collapsible section `docker-push-staging`).

    ```
    ╰─ docker run -it dev.gitlab.org:5005/gitlab/omnibus-gitlab/gitlab-ee:18.8.202601081342-fb807874e7c.5c7075ed575-arm64 ruby --version
    ruby 3.3.10 (2025-10-23 revision 343ea05002) [aarch64-linux]
    ```

- [x] Verify that the CNG image contains Ruby 3.3.
  * [x] Run the following command locally to check the bundled Ruby version - `docker run -it dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-webservice-ee:<IMAGE_REFERENCE> -- ruby --version`

    ```
    ╰─ docker run -it dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-webservice-ee:18-8-202601081342-fb807874e7c -- ruby --version
    WARNING: image platform (linux/amd64) does not match the expected platform (linux/arm64)
    Begin parsing .erb templates from /srv/gitlab/config
    Begin parsing .tpl templates from /srv/gitlab/config
    ruby 3.3.10 (2025-10-23 revision 343ea05002) [x86_64-linux]
    ```

#### Deploy the Ruby 3.3 Package

- [x] Start a new deployment pipeline for the above package with `/chatops run auto_deploy pipeline 18.8.202601081342-fb807874e7c.5c7075ed575`
  * [x] https://ops.gitlab.net/gitlab-org/release/tools/-/pipelines/5368534
- [x] Ping monitoring engineers and `@release-managers` on `#f_ruby3` channel in Slack when package is deployed to `gstg-cny`.
- [ ] Check [monitoring](#monitoring)
- [ ] Make sure Quality smoke and reliable pipelines on gstg-cny have passed. If there are failures, ask the Quality on-call to have a look to determine if the failures are related to the Ruby 3.3 rollout.
- [ ] Let deployment continue up to `gprd-cny`.
- [ ] Ping monitoring engineers and `@release-managers` on `#f_ruby3` channel in Slack when package is deployed to `gprd-cny`.
- [ ] Check [monitoring](#monitoring)
- [ ] Make sure Quality smoke and reliable pipelines on gprd-cny have passed.
  If there are failures, ask the Quality on-call to have a look to determine if the failures are related to the Ruby 3.3 rollout.
- [ ] Promote the package to `gstg` once the monitoring engineers give the green light.
- [ ] Keep an eye out for any `gprd` deployment jobs and cancel them.
- [ ] Ping monitoring engineers and `@release-managers` on `#f_ruby3` channel in Slack when package is deployed to `gstg`.
- [ ] Check [monitoring](#monitoring)

##### Deploy to Production

We will manually deploy to the zonal clusters first, then to the regional cluster. Allow bake time between clusters to account for monitoring.

- [ ] Set `MANUAL_GPRD_DEPLOY` to `true` in https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/settings/ci_cd
- [ ] Cancel the `notify_success:gprd` job so that a successful deploy is not accidentally announced during the baking time of the manual jobs
- [ ] Restart the previously cancelled `gprd` deployment job(s) to start deployment to production

###### Zonal cluster b

- [ ] Manually run `gprd-us-east1-b:auto-deploy`
- [ ] Ping monitoring engineers and `@release-managers` on `#f_ruby3` channel in Slack when package is deployed to the zonal cluster `gprd-us-east1-b`.
- [ ] Check [monitoring](#monitoring). Remember to set the zone to `b`
- [ ] Bake for 30 minutes, or until engineers give the green light.

###### Zonal cluster c

- [ ] Manually run `gprd-us-east1-c:auto-deploy`
- [ ] Ping monitoring engineers and `@release-managers` on `#f_ruby3` channel in Slack when package is deployed to the zonal cluster `gprd-us-east1-c`.
- [ ] Check [monitoring](#monitoring). Remember to set the zone to `c`
- [ ] Bake for 30 minutes, or until engineers give the green light.

###### Zonal cluster d

- [ ] Manually run `gprd-us-east1-d:auto-deploy`
- [ ] Ping monitoring engineers on `#f_ruby3` channel in Slack when package is deployed to the zonal cluster `gprd-us-east1-d`.
- [ ] Check [monitoring](#monitoring).
  Remember to set the zone to `d`
- [ ] Bake for 15 minutes, or until engineers give the green light.

###### Regional cluster

- [ ] Manually run `gprd:auto-deploy`
- [ ] Inform the `@sre-on-call` and `@release-managers` in `#production` on Slack and the monitoring engineers in the `#f_ruby3` channel in Slack when the package is deployed to `gprd`.
- [ ] Check [monitoring](#monitoring)

###### Post-deploy

- [ ] Restart the previously cancelled `notify_success:gprd` job to announce the successful deployment to `gprd` in the Slack `#announcements` channel
- [ ] Remove `MANUAL_GPRD_DEPLOY` in https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/settings/ci_cd
- [ ] Do not execute post-deploy migrations for the rest of the day (EMEA and AMER) beyond this point.

#### Unpause auto deploys

- [ ] `/chatops run auto_deploy unpause`
- [ ] Set label ~"change::complete" `/label ~change::complete`

## Rollback

### Last Ruby 3.2 package that was successfully deployed to production

18.8.202601080636-1333cdaf7f7.aa9ef7f5e4d

### Rollback steps - steps to be taken in the event of a need to rollback this change

_Estimated Time to Complete (mins)_ - 60 minutes

#### Rollback production-canary only

If you have not promoted to production and need to roll back production-canary, follow these steps:

* [ ] Notify `@sre-on-call`, `@release-managers` in `#production` that Production Canary will be drained.
* [ ] `/chatops run canary --disable --production`
* [ ] Follow the steps in [Make sure that the next auto deploy package will be built with Ruby 3.2](#make-sure-that-the-next-auto-deploy-package-will-be-built-with-ruby-32)

#### Rollback production and staging

If we need to roll back production and staging, follow the steps in https://gitlab.com/gitlab-org/release/docs/-/blob/master/runbooks/rollback-a-deployment.md to roll back to a Ruby 3.2 package.
The steps are reproduced here as well:

* [ ] `/chatops run rollback check gprd`
* [ ] Notify `@sre-on-call`, `@release-managers` in `#production` that a rollback is about to be started. Make sure they know that Canary will also be drained.
* [ ] `/chatops run canary --disable --production`
* [ ] `/chatops run deploy <PACKAGE NAME> gprd --rollback`
* [ ] `/chatops run rollback check gstg`
* [ ] Notify `@sre-on-call`, `@release-managers` in `#staging` that a rollback is about to be started. Make sure they know that Canary will also be drained.
* [ ] `/chatops run canary --disable --staging`
* [ ] `/chatops run deploy <PACKAGE NAME> gstg --rollback`

#### Make sure that the next auto deploy package will be built with Ruby 3.2

* [x] Set `USE_NEXT_RUBY_VERSION_IN_AUTODEPLOY` to `false` in https://dev.gitlab.org/gitlab/omnibus-gitlab/-/settings/ci_cd.
* [x] Set `USE_NEXT_RUBY_VERSION_IN_AUTODEPLOY` to `false` in https://dev.gitlab.org/gitlab/charts/components/images/-/settings/ci_cd.
* [x] If you had already unpaused auto-deploys, cancel any auto-deploy pipelines whose packages were built before you changed the `USE_NEXT_RUBY_VERSION_IN_AUTODEPLOY` variable to `false`.
* [x] Revert the MR that updated README: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/218205
* [x] Set label ~"change::aborted" `/label ~change::aborted`

## Monitoring

### Key metrics to observe

* Dashboards/metrics:
  * Monitor the following dashboards for an unhealthy dip in service health for the environment/cluster that is being rolled out.
    * [Deployment health](https://dashboards.gitlab.net/d/delivery-deployment_health/delivery-deployment-health?orgId=1), configurable with environment, stage, and type/service
    * [Kubernetes compute resource/cluster health](https://dashboards.gitlab.net/d/kubernetes-resources-cluster/kubernetes-compute-resources-cluster?orgId=1&refresh=5m), configurable with clusters
    * [Kubernetes compute resource/pods health](https://dashboards.gitlab.net/d/kubernetes-resources-namespace/kubernetes-compute-resources-namespace-pods?orgId=1&refresh=5m), configurable with clusters and namespace
    * [Kubernetes networking](https://dashboards.gitlab.net/d/kubernetes-cluster-total/kubernetes-networking-cluster?orgId=1&refresh=5m), configurable with clusters
  * Per-service dashboards (change `env` and `stage` to toggle between `gstg`/`gprd` and `main`/`cny`):
    * `api` ([overview](https://dashboards.gitlab.net/d/api-main/api-overview?orgId=1&from=now-1h&to=now), [containers](https://dashboards.gitlab.net/d/api-kube-containers/api3a-kube-containers-detail?from=now-1h&to=now&var-PROMETHEUS_DS=mimir-gitlab-gprd&var-environment=gprd&var-stage=cny&orgId=1))
    * `web` ([overview](https://dashboards.gitlab.net/d/web-main/web-overview?orgId=1&from=now-1h&to=now), [containers](https://dashboards.gitlab.net/d/web-kube-containers/web3a-kube-containers-detail?from=now-1h&to=now&var-PROMETHEUS_DS=mimir-gitlab-gprd&var-environment=gprd&var-stage=cny&orgId=1))
    * `websockets` ([overview](https://dashboards.gitlab.net/d/websockets-main/websockets-overview?orgId=1&from=now-1h&to=now), [containers](https://dashboards.gitlab.net/d/websockets-kube-containers/websockets3a-kube-containers-detail?from=now-1h&to=now&var-PROMETHEUS_DS=mimir-gitlab-gprd&var-environment=gprd&var-stage=cny&orgId=1))
    * `git` ([overview](https://dashboards.gitlab.net/d/git-main/git-overview?orgId=1&from=now-1h&to=now), [containers](https://dashboards.gitlab.net/d/git-kube-containers/git3a-kube-containers-detail?from=now-1h&to=now&var-PROMETHEUS_DS=mimir-gitlab-gprd&var-environment=gprd&var-stage=cny&orgId=1))
    * `sidekiq` ([overview](https://dashboards.gitlab.net/d/sidekiq-main/sidekiq-overview?orgId=1&from=now-1h&to=now), [containers](https://dashboards.gitlab.net/d/sidekiq-kube-containers/sidekiq3a-kube-containers-detail?from=now-1h&to=now&var-PROMETHEUS_DS=Global&var-environment=gprd&var-stage=main&orgId=1))
  * Kibana - Puma (edit `json.type` to filter by service, `json.stage` for `cny` vs `main`)
    * [Production 5xx responses](https://log.gprd.gitlab.net/goto/e0d9a290-b8c9-11ed-85ed-e7557b0a598c)
    * [Staging 5xx responses](https://nonprod-log.gitlab.net/goto/040ba510-b8ca-11ed-9af2-6131f0ee4ce6)
  * Kibana - Sidekiq (edit `json.shard` to switch between job types)
    * [Failed production jobs](https://log.gprd.gitlab.net/goto/89320700-b813-11ed-9f43-e3784d7fe3ca)
    * [Failed staging jobs](https://nonprod-log.gitlab.net/goto/e2744200-b814-11ed-9af2-6131f0ee4ce6)
  * Sentry
    * [Production overview](https://sentry.gitlab.net/gitlab/gitlabcom/dashboard/?statsPeriod=1h)
    * [Staging overview](https://sentry.gitlab.net/gitlab/staginggitlabcom/dashboard/?statsPeriod=1h)
* QA runs can be observed via Slack:
  * `#announcements` - Besides QA messages, multiple messages are sent to this channel to account for the different deployments.
  * QA Slack channels - There is a channel per environment, for example, a failure on gstg and gstg-cny will be posted in `#qa-staging`, a failure on gprd-cny and gprd will be posted in `#qa-production`, etc.
* Dealing with deploy failures: https://gitlab.com/gitlab-org/release/docs/-/blob/master/general/deploy/failures.md
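
The per-service container dashboards above encode the environment and stage as `var-environment` and `var-stage` query parameters. As a convenience while moving between `cny` and `main` during the rollout, the toggle can be sketched in Python as below. This is only an illustration: the helper name `with_env_and_stage` is hypothetical, and the `var-PROMETHEUS_DS` datasource parameter is left untouched, so it may also need changing when switching to `gstg` (the correct value is not given in this issue).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_env_and_stage(url: str, environment: str, stage: str) -> str:
    """Return `url` with var-environment and var-stage replaced.

    Hypothetical helper for illustration; all other query
    parameters (including var-PROMETHEUS_DS) are kept unchanged.
    """
    parts = urlsplit(url)
    # dict() keeps the original parameter order; these URLs have no repeated keys.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params["var-environment"] = environment
    params["var-stage"] = stage
    return urlunsplit(parts._replace(query=urlencode(params)))


# Containers dashboard for `web`, as linked above (gprd / cny).
WEB_CONTAINERS = (
    "https://dashboards.gitlab.net/d/web-kube-containers/web3a-kube-containers-detail"
    "?from=now-1h&to=now&var-PROMETHEUS_DS=mimir-gitlab-gprd"
    "&var-environment=gprd&var-stage=cny&orgId=1"
)

# Same dashboard, switched from the canary stage to the main stage.
print(with_env_and_stage(WEB_CONTAINERS, "gprd", "main"))
```

The same call works for the `api`, `websockets`, `git`, and `sidekiq` container links, since they share the same parameter names.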
