2026-01-22: Rollout Ruby 3.3 to Production
# Production Change
### Change Summary
> [!note]
>
> This is the second attempt, after the first one was aborted: https://gitlab.com/gitlab-com/gl-infra/production/-/work_items/20969
Following up on the test rollout in the staging environments, the plan is to roll out Ruby 3.3 to production.
Auto-deploys will need to be paused for the duration of the change.
### Change Details
1. **Services Impacted** - GitLab Rails, Sidekiq, and any other service that uses Ruby
2. **Change Technician** - `@jennykim-gitlab` `@dat.tang.gitlab`
3. **Change Reviewer** - `@skarbek`
4. **Scheduled Date and Time (UTC in format YYYY-MM-DD HH:MM)** - 2026-01-22 12:00
5. **Time tracking** - 480 minutes
6. **Downtime Component** - none
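For reference, the end of the change window follows from the scheduled start time and the time-tracking estimate listed above. The snippet below is only a small arithmetic sketch; the helper name `change_window` is hypothetical and not part of any GitLab tooling.

```python
from datetime import datetime, timedelta, timezone


def change_window(start_utc: str, duration_minutes: int) -> tuple[datetime, datetime]:
    """Return (start, end) of the change window as timezone-aware UTC datetimes.

    `start_utc` uses the same format as the change details: YYYY-MM-DD HH:MM.
    """
    start = datetime.strptime(start_utc, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
    return start, start + timedelta(minutes=duration_minutes)


# Values taken from the Change Details list: start 2026-01-22 12:00 UTC, 480 minutes.
start, end = change_window("2026-01-22 12:00", 480)
print(f"Window: {start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M} UTC")
# prints "Window: 2026-01-22 12:00 to 2026-01-22 20:00 UTC"
```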
### Set Maintenance Mode in GitLab
No need for setting Maintenance mode.
## Preparation
> [!note]
>
> The following checklists must be done in advance, before setting the label ~"change::scheduled"
### Change Reviewer checklist
~8276990 ~8276981 ~8276978 ~8276976:
* [x] Check if the following applies:
  * The **scheduled day and time** of execution of the change is appropriate.
  * The [change plan](#detailed-steps-for-the-change) is technically accurate.
  * The change plan includes **estimated timing values** based on previous testing.
  * The change plan includes a viable [rollback plan](#rollback).
  * The specified [metrics/monitoring dashboards](#key-metrics-to-observe) provide sufficient visibility for the change.
~8276978 ~8276976:
* [x] Check if the following applies:
  * The complexity of the plan is appropriate for the corresponding risk of the change. (i.e. the plan contains clear details).
  * The change plan includes success measures for all steps/milestones during the execution.
  * The change adequately minimizes risk within the environment/service.
  * The performance implications of executing the change are well-understood and documented.
  * The specified metrics/monitoring dashboards provide sufficient visibility for the change.
    * If not, is it possible (or necessary) to make changes to observability platforms for added visibility?
  * The change has a primary and secondary SRE with knowledge of the details available during the change window.
  * The change window has been agreed with Release Managers in advance of the change. If the change is planned for APAC hours, this issue has an agreed pre-change approval.
  * The labels ~"blocks deployments" and/or ~"blocks feature-flags" are applied as necessary.
### Change Technician checklist
* [x] The [Change Criticality](https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/change-management/#change-criticalities) has been set appropriately and requirements have been reviewed.
* [x] The [change plan](#detailed-steps-for-the-change) is technically accurate.
* [x] The [rollback plan](#rollback) is technically accurate and detailed enough to be executed by anyone with access.
* [x] This Change Issue is linked to the appropriate Issue and/or Epic
* [x] Change has been tested in staging and results noted in a comment on this issue.
* [x] A Test rollout to Canary was done https://gitlab.com/gitlab-com/gl-infra/production/-/issues/20918
* [ ] ~~A dry-run has been conducted and results noted in a comment on this issue.~~
* [x] The change execution window respects the [Production Change Lock periods](https://about.gitlab.com/handbook/engineering/infrastructure/change-management/#production-change-lock-pcl).
* [x] Once all boxes above are checked, mark the change request as scheduled: `/label ~"change::scheduled"`
* [x] For ~8276976 and ~8276978 change issues, the change event is added to the [GitLab Production](https://calendar.google.com/calendar/embed?src=gitlab.com_si2ach70eb1j65cnu040m3alq0%40group.calendar.google.com) calendar by the [change-scheduler bot](https://gitlab.com/gitlab-com/gl-infra/ops-team/toolkit/change-scheduler). It is scheduled to run every 2 hours.
* [ ] ~~For ~8276976 change issues, a Senior Infrastructure Manager has provided approval with the ~14866676 label on the issue.~~
* [x] For ~8276978 change issues, an Infrastructure Manager provided approval with the ~14866676 label on the issue.
* [x] For ~8276976 and ~8276978 changes, mention `@gitlab-org/saas-platforms/inframanagers` in this issue to provide visibility to all infrastructure managers.
* [x] For ~8276976, ~8276978, or ~"blocks deployments" change issues, confirm with Release managers that the change does not overlap or hinder any release process (In [`#production`](https://gitlab.enterprise.slack.com/archives/C101F3796) channel, mention `@release-managers` and this issue and await their acknowledgment.)
* [x] Mention that no PDM should be run on the day of the CR
## Detailed steps for the change
### Pre-execution steps
> [!note]
>
> The following steps should be done right at the scheduled time of the change request. The [preparation steps](#preparation) are listed above.
* [x] Make sure all tasks in [Change Technician checklist](#change-technician-checklist) are done
* [x] For ~8276976 and ~8276978 change issues, the SRE on-call has been informed prior to change being rolled out.
* [x] The SRE on-call provided approval with the ~25771657 label on the issue.
* [x] For ~8276976, ~8276978, or ~"blocks deployments" change issues, Release managers have been informed prior to change being rolled out. (In [`#production`](https://gitlab.enterprise.slack.com/archives/C101F3796) channel, mention `@release-managers` and this issue and await their acknowledgment.)
* [x] Mention that no PDM should be run on the day of the CR
* [x] There are currently no [active incidents](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/?sort=created_date&state=opened&label_name%5B%5D=Incident%3A%3AActive&or%5Blabel_name%5D%5B%5D=severity%3A%3A1&or%5Blabel_name%5D%5B%5D=severity%3A%3A2&first_page_size=20) that are ~3760139 or ~3760140
* [ ] If the change involves doing maintenance on a database host, an appropriate silence targeting the host(s) should be added for the duration of the change.
### Change Steps - steps to take to execute the change
* [x] No PDM should be run on the day of the CR
* [x] Note the last deployment package that was successfully deployed to production in the [Rollback](#rollback) section.
* [x] Open a MR to update README with the new ruby version: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/218245
_Estimated Time to Complete (mins)_ - 480 minutes (8 hours)
- [x] Set label ~"change::in-progress" `/label ~change::in-progress`
#### Pause auto deploys
- [x] Pause auto deploys: `/chatops run auto_deploy pause`
- [x] Pause auto build: `/chatops run auto_build pause`
#### Prepare a Ruby 3.3 package
- [x] Inform the `@sre-on-call` and `@release-managers` in [`#production`](https://gitlab.enterprise.slack.com/archives/C101F3796), and the engineers in the `#f_ruby3` channel in Slack, that we are starting to build a Ruby 3.3 package to deploy to production.
- [x] Inform [`#engineering-fyi`](https://gitlab.enterprise.slack.com/archives/CJWA4E9UG) :
```
We are about to roll out Ruby 3.3 to Production (GitLab.com). You can follow the progress in the Change Request <link>
```
- [x] Set `USE_NEXT_RUBY_VERSION_IN_AUTODEPLOY` to `true` in the following projects:
- [x] Omnibus-gitlab: https://dev.gitlab.org/gitlab/omnibus-gitlab/-/settings/ci_cd
- [x] CNG: https://dev.gitlab.org/gitlab/charts/components/images/-/settings/ci_cd
- [x] Merge MR to update README with the new ruby version: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/218245
#### Start a deployment pipeline
- [x] Trigger a build pipeline by running the "auto_build: Create new package" inactive manual scheduled pipeline: https://ops.gitlab.net/gitlab-org/release/tools/-/pipeline_schedules/
- [x] Make sure new build pipelines have started.
* [x] Omnibus Pipeline URL: https://dev.gitlab.org/gitlab/omnibus-gitlab/-/pipelines/420528
* [x] CNG Pipeline URL: https://dev.gitlab.org/gitlab/charts/components/images/-/pipelines/420529
- [x] Verify that the Omnibus package contains Ruby 3.3.
* [x] Find the image reference in the `Docker` job of the Omnibus packager pipeline by searching the job log for the string `Copying image` (under the collapsible section `docker-push-staging`), then run `docker run -it dev.gitlab.org:5005/gitlab/omnibus-gitlab/gitlab-ee:<image-tag> ruby --version`:
```
╰─ docker run -it dev.gitlab.org:5005/gitlab/omnibus-gitlab/gitlab-ee:18.9.202601221307-3640a66f21b.6453831a365-arm64 ruby --version
Trying to pull dev.gitlab.org:5005/gitlab/omnibus-gitlab/gitlab-ee:18.9.202601221307-3640a66f21b.6453831a365-arm64...
Getting image source signatures
...
Writing manifest to image destination
ruby 3.3.10 (2025-10-23 revision 343ea05002) [aarch64-linux]
```
- [x] Verify that the CNG image contains Ruby 3.3.
* [x] Run the following command locally to check the bundled Ruby version - `docker run -it dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-webservice-ee:<IMAGE_REFERENCE> -- ruby --version`
```
╰─ docker run -it dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-webservice-ee:18-9-202601221307-3640a66f21b -- ruby --version
...
Writing manifest to image destination
WARNING: image platform (linux/amd64) does not match the expected platform (linux/arm64)
Begin parsing .erb templates from /srv/gitlab/config
Begin parsing .tpl templates from /srv/gitlab/config
ruby 3.3.10 (2025-10-23 revision 343ea05002) [x86_64-linux]
```
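The two verification steps above come down to reading the `ruby X.Y.Z` line out of the `docker run` output, which also contains pull logs and warnings. As a rough sketch, assuming the output has been captured as text, the check could look like this. The helper names `ruby_major_minor` and `is_expected_ruby` are hypothetical and only for illustration.

```python
import re

# Matches the line printed by `ruby --version`, e.g.
# "ruby 3.3.10 (2025-10-23 revision 343ea05002) [aarch64-linux]".
# MULTILINE lets the pattern match even when pull logs precede it.
RUBY_VERSION_RE = re.compile(r"^ruby (\d+)\.(\d+)\.(\d+)", re.MULTILINE)


def ruby_major_minor(output: str) -> tuple[int, int]:
    """Extract (major, minor) from captured `ruby --version` output."""
    match = RUBY_VERSION_RE.search(output)
    if match is None:
        raise ValueError("no `ruby X.Y.Z` line found in output")
    return int(match.group(1)), int(match.group(2))


def is_expected_ruby(output: str, expected: tuple[int, int] = (3, 3)) -> bool:
    """True when the image reports the expected Ruby major.minor version."""
    return ruby_major_minor(output) == expected


# Sample text copied from the CNG verification output in this issue.
sample = """Writing manifest to image destination
WARNING: image platform (linux/amd64) does not match the expected platform (linux/arm64)
Begin parsing .erb templates from /srv/gitlab/config
Begin parsing .tpl templates from /srv/gitlab/config
ruby 3.3.10 (2025-10-23 revision 343ea05002) [x86_64-linux]
"""
print(is_expected_ruby(sample))  # prints "True"
```

Comparing only major and minor keeps the check valid if the patch level changes between builds.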
#### Deploy the Ruby 3.3 Package
- [x] Start a new deployment pipeline for the above package with `/chatops run auto_deploy pipeline 18.9.202601221307-3640a66f21b.6453831a365`
* [x] Deployment pipeline URL: https://ops.gitlab.net/gitlab-org/release/tools/-/pipelines/5411920
- [x] Ping monitoring engineers and `@release-managers` on `#f_ruby3` channel in Slack when package is deployed to `gstg-cny`.
- [x] Check [monitoring](#monitoring)
- [x] Make sure Quality smoke and reliable pipelines on gstg-cny have passed. If there are failures, ask the Quality on-call to have a look to determine if the failures are related to the Ruby 3.3 rollout.
- [x] Let deployment continue up to `gprd-cny`.
- [x] Ping monitoring engineers and `@release-managers` on `#f_ruby3` channel in Slack when package is deployed to `gprd-cny`.
- [x] Check [monitoring](#monitoring)
- [x] Make sure Quality smoke and reliable pipelines on gprd-cny have passed. If there are failures, ask the Quality on-call to have a look to determine if the failures are related to the Ruby 3.3 rollout.
- [x] Promote the package to `gstg` once the monitoring engineers give green light.
- [x] Keep an eye out for any `gprd` deployment jobs and cancel them.
- [x] Ping monitoring engineers and `@release-managers` on `#f_ruby3` channel in Slack when package is deployed to `gstg`.
- [x] Check [monitoring](#monitoring)
##### Deploy to Production
We will manually deploy to each zonal cluster first, then to the regional cluster. Allow bake time between clusters for monitoring.
- [x] Set `MANUAL_GPRD_DEPLOY` to `true` in https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/settings/ci_cd
- [x] Cancel the `notify_success:gprd` job to avoid accidentally announcing a successful deploy during the bake time of the manual jobs
- [x] Restart the previously cancelled `gprd` deployment job(s) to start deployment to production
###### Zonal cluster b
- [x] Manually run `gprd-us-east1-b:auto-deploy`
- [x] Ping monitoring engineers and `@release-managers` on `#f_ruby3` channel in Slack when package is deployed to the zonal cluster `gprd-us-east1-b`.
- [x] Check [monitoring](#monitoring). Remember to set the zone to `b`
- [x] Bake for 30 minutes, or until engineers give green light.
###### Zonal cluster c
- [x] Manually run `gprd-us-east1-c:auto-deploy`
- [x] Ping monitoring engineers and `@release-managers` on `#f_ruby3` channel in Slack when package is deployed to the zonal cluster `gprd-us-east1-c`.
- [x] Check [monitoring](#monitoring). Remember to set the zone to `c`
- [x] Bake for 30 minutes, or until engineers give green light.
###### Zonal cluster d
- [x] Manually run `gprd-us-east1-d:auto-deploy`
- [x] Ping monitoring engineers on `#f_ruby3` channel in Slack when package is deployed to the zonal cluster `gprd-us-east1-d`.
- [x] Check [monitoring](#monitoring). Remember to set the zone to `d`
- [x] Bake for 15 minutes, or until engineers give green light.
###### Regional cluster
- [x] Manually run `gprd:auto-deploy`
- [x] Inform the `@sre-on-call` and `@release-managers` in [`#production`](https://gitlab.enterprise.slack.com/archives/C101F3796) on Slack and the monitoring engineers in the `#f_ruby3` channel in Slack when the package is deployed to `gprd`.
- [x] Check [monitoring](#monitoring)
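The manual production order and bake times above can be summarised as data, which makes it easy to sanity-check how much of the change window the production stage needs. This is only a planning sketch: the per-job deploy duration is a hypothetical input (this issue does not state it), and the listed bake times are nominal since each bake may end whenever engineers give the green light.

```python
# (job name, nominal bake minutes after the job finishes), copied from the checklist.
# The regional cluster has no bake time listed, so it is recorded as 0 here.
ROLLOUT_PLAN = [
    ("gprd-us-east1-b:auto-deploy", 30),
    ("gprd-us-east1-c:auto-deploy", 30),
    ("gprd-us-east1-d:auto-deploy", 15),
    ("gprd:auto-deploy", 0),
]


def minimum_bake_minutes(plan):
    """Sum of the nominal bake times across all clusters."""
    return sum(bake for _, bake in plan)


def fits_window(plan, deploy_minutes_per_job, window_minutes):
    """Check whether deploys plus bakes fit in the given number of minutes.

    `deploy_minutes_per_job` is an assumed, uniform duration per deploy job.
    """
    total = minimum_bake_minutes(plan) + deploy_minutes_per_job * len(plan)
    return total <= window_minutes


print(minimum_bake_minutes(ROLLOUT_PLAN))  # prints "75"
```

Note that the 480-minute estimate for the whole change also covers building the package and the staging and canary stages, so only part of it is available for this stage.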
###### Post-deploy
- [x] Restart the previously cancelled `notify_success:gprd` job to notify the successful deployment to `gprd` on Slack `#announcements` channel
- [x] Remove `MANUAL_GPRD_DEPLOY` in https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/settings/ci_cd
- [x] Do not execute post-deploy migrations for the rest of the day (EMEA and AMER) beyond this point.
#### Unpause auto deploys
- [x] `/chatops run auto_deploy unpause`
- [x] `/chatops run auto_build unpause`
- [x] Set label ~"change::complete" `/label ~change::complete`
## Rollback
### Last Ruby 3.2 package that was successfully deployed to production
18.9.202601220506-8c59070ce5b.927dc39fcfc
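Before rolling back, it can help to confirm that the package noted above is older than the Ruby 3.3 package being deployed. The sketch below assumes the package name has the shape `MAJOR.MINOR.TIMESTAMP-SHA.SHA`, inferred only from the two package names in this issue. Reading the first hash as the GitLab commit and the second as the Omnibus commit is also an assumption, and `parse_package` is a hypothetical helper.

```python
import re
from datetime import datetime
from typing import NamedTuple


class AutoDeployPackage(NamedTuple):
    major: int
    minor: int
    built_at: datetime
    gitlab_sha: str   # assumed meaning of the first hash
    omnibus_sha: str  # assumed meaning of the second hash


# Assumed layout: 18.9.202601220506-8c59070ce5b.927dc39fcfc
PACKAGE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d{12})-([0-9a-f]+)\.([0-9a-f]+)$")


def parse_package(name: str) -> AutoDeployPackage:
    """Split an auto-deploy package name into its components."""
    m = PACKAGE_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised package name: {name!r}")
    built_at = datetime.strptime(m[3], "%Y%m%d%H%M")
    return AutoDeployPackage(int(m[1]), int(m[2]), built_at, m[4], m[5])


rollback_target = parse_package("18.9.202601220506-8c59070ce5b.927dc39fcfc")
ruby33_package = parse_package("18.9.202601221307-3640a66f21b.6453831a365")
print(rollback_target.built_at < ruby33_package.built_at)  # prints "True"
```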
### Rollback steps - steps to be taken in the event of a need to rollback this change
_Estimated Time to Complete (mins)_ - 60 minutes
- [ ] Inform [`#engineering-fyi`](https://gitlab.enterprise.slack.com/archives/CJWA4E9UG) :
```
The Ruby 3.3 rollout was aborted due to <reason>. It will be rescheduled for the near future.
```
#### Rollback production-canary only
If you have not promoted to production and need to roll back production-canary, follow these steps:
* [ ] Notify `@sre-on-call`, `@release-managers` in [`#production`](https://gitlab.enterprise.slack.com/archives/C101F3796) that Production Canary will be drained.
* [ ] `/chatops run canary --disable --production`
* [ ] Follow the steps in [Make sure that the next auto deploy package will be built with Ruby 3.2](#make-sure-that-the-next-auto-deploy-package-will-be-built-with-ruby-32)
#### Rollback production and staging
If we need to roll back production and staging, follow the steps in https://gitlab.com/gitlab-org/release/docs/-/blob/master/runbooks/rollback-a-deployment.md to roll back to a Ruby 3.2 package. The steps are reproduced here as well:
* [ ] `/chatops run rollback check gprd`
* [ ] Notify `@sre-on-call`, `@release-managers` in [`#production`](https://gitlab.enterprise.slack.com/archives/C101F3796) that a rollback is about to be started. Make sure they know that Canary will also be drained.
* [ ] `/chatops run canary --disable --production`
* [ ] `/chatops run deploy <PACKAGE NAME> gprd --rollback`
* [ ] `/chatops run rollback check gstg`
* [ ] Notify `@sre-on-call`, `@release-managers` in `#staging` that a rollback is about to be started. Make sure they know that Canary will also be drained.
* [ ] `/chatops run canary --disable --staging`
* [ ] `/chatops run deploy <PACKAGE NAME> gstg --rollback`
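To avoid typos when filling in `<PACKAGE NAME>` under pressure, the ChatOps commands above can be rendered ahead of time. The following is a minimal sketch that only formats the command strings listed in this section. It does not talk to Slack or ChatOps, the notification steps remain manual, and `rollback_commands` is a hypothetical helper name.

```python
# Canary flag per environment, as used in the steps above.
CANARY_FLAG = {"gprd": "--production", "gstg": "--staging"}


def rollback_commands(package: str, env: str) -> list[str]:
    """Render the ChatOps rollback commands for one environment, in order."""
    if env not in CANARY_FLAG:
        raise ValueError(f"unknown environment: {env!r}")
    return [
        f"/chatops run rollback check {env}",
        f"/chatops run canary --disable {CANARY_FLAG[env]}",
        f"/chatops run deploy {package} {env} --rollback",
    ]


# Package taken from the "Last Ruby 3.2 package" section of this issue.
last_ruby32_package = "18.9.202601220506-8c59070ce5b.927dc39fcfc"
for env in ("gprd", "gstg"):
    for command in rollback_commands(last_ruby32_package, env):
        print(command)
```

Production is rendered before staging to match the order of the steps above.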
#### Make sure that the next auto deploy package will be built with Ruby 3.2
* [ ] Set `USE_NEXT_RUBY_VERSION_IN_AUTODEPLOY` to `false` in https://dev.gitlab.org/gitlab/omnibus-gitlab/-/settings/ci_cd.
* [ ] Set `USE_NEXT_RUBY_VERSION_IN_AUTODEPLOY` to `false` in https://dev.gitlab.org/gitlab/charts/components/images/-/settings/ci_cd.
* [ ] If you had already unpaused auto-deploys, cancel any auto-deploy pipelines whose packages were built before you changed the `USE_NEXT_RUBY_VERSION_IN_AUTODEPLOY` variable to `false`.
* [ ] Revert MR to update README
* [ ] Set label ~"change::aborted" `/label ~change::aborted`
## Monitoring
### Key metrics to observe
* Dashboards/metrics:
* Monitor the following dashboards for an unhealthy dip in service health in the environment/cluster that is being rolled out.
* [Deployment health](https://dashboards.gitlab.net/d/delivery-deployment_health/delivery-deployment-health?orgId=1), configurable with environment, stage, and type/service
* [Kubernetes compute resource/cluster health](https://dashboards.gitlab.net/d/kubernetes-resources-cluster/kubernetes-compute-resources-cluster?orgId=1&refresh=5m), configurable with clusters
* [Kubernetes compute resource/pods health](https://dashboards.gitlab.net/d/kubernetes-resources-namespace/kubernetes-compute-resources-namespace-pods?orgId=1&refresh=5m), configurable with clusters and namespace
* [Kubernetes networking](https://dashboards.gitlab.net/d/kubernetes-cluster-total/kubernetes-networking-cluster?orgId=1&refresh=5m), configurable with clusters
* Per-service dashboards (change `env` and `stage` to toggle between `gstg`/`gprd` and `main`/`cny`):
* `api` ([overview](https://dashboards.gitlab.net/d/api-main/api-overview?orgId=1&from=now-1h&to=now), [containers](https://dashboards.gitlab.net/d/api-kube-containers/api3a-kube-containers-detail?from=now-1h&to=now&var-PROMETHEUS_DS=mimir-gitlab-gprd&var-environment=gprd&var-stage=cny&orgId=1))
* `web` ([overview](https://dashboards.gitlab.net/d/web-main/web-overview?orgId=1&from=now-1h&to=now), [containers](https://dashboards.gitlab.net/d/web-kube-containers/web3a-kube-containers-detail?from=now-1h&to=now&var-PROMETHEUS_DS=mimir-gitlab-gprd&var-environment=gprd&var-stage=cny&orgId=1))
* `websockets` ([overview](https://dashboards.gitlab.net/d/websockets-main/websockets-overview?orgId=1&from=now-1h&to=now), [containers](https://dashboards.gitlab.net/d/websockets-kube-containers/websockets3a-kube-containers-detail?from=now-1h&to=now&var-PROMETHEUS_DS=mimir-gitlab-gprd&var-environment=gprd&var-stage=cny&orgId=1))
* `git` ([overview](https://dashboards.gitlab.net/d/git-main/git-overview?orgId=1&from=now-1h&to=now), [containers](https://dashboards.gitlab.net/d/git-kube-containers/git3a-kube-containers-detail?from=now-1h&to=now&var-PROMETHEUS_DS=mimir-gitlab-gprd&var-environment=gprd&var-stage=cny&orgId=1))
* `sidekiq` ([overview](https://dashboards.gitlab.net/d/sidekiq-main/sidekiq-overview?orgId=1&from=now-1h&to=now), [containers](https://dashboards.gitlab.net/d/sidekiq-kube-containers/sidekiq3a-kube-containers-detail?from=now-1h&to=now&var-PROMETHEUS_DS=Global&var-environment=gprd&var-stage=main&orgId=1))
* Kibana - Puma (edit `json.type` to filter by service, `json.stage` for `cny` vs `main`)
* [Production 5xx responses](https://log.gprd.gitlab.net/goto/e0d9a290-b8c9-11ed-85ed-e7557b0a598c)
* [Staging 5xx responses](https://nonprod-log.gitlab.net/goto/040ba510-b8ca-11ed-9af2-6131f0ee4ce6)
* Kibana - Sidekiq (edit `json.shard` to switch between job types)
* [Failed production jobs](https://log.gprd.gitlab.net/goto/89320700-b813-11ed-9f43-e3784d7fe3ca)
* [Failed staging jobs](https://nonprod-log.gitlab.net/goto/e2744200-b814-11ed-9af2-6131f0ee4ce6)
* Sentry
* [Production overview](https://sentry.gitlab.net/gitlab/gitlabcom/dashboard/?statsPeriod=1h)
* [Staging overview](https://sentry.gitlab.net/gitlab/staginggitlabcom/dashboard/?statsPeriod=1h)
* QA runs can be observed via Slack:
* `#announcements` - Besides QA messages, multiple messages are sent to this channel to account for the different deployments.
* QA Slack channels - There is a channel per environment. For example, a failure on gstg or gstg-cny is posted in `#qa-staging`, and a failure on gprd-cny or gprd is posted in `#qa-production`.
* Dealing with deploy failures: https://gitlab.com/gitlab-org/release/docs/-/blob/master/general/deploy/failures.md
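The per-service container dashboard links above are pre-filtered with `var-environment` and `var-stage` query parameters. As a convenience sketch, those two parameters can be rewritten to switch between `gstg`/`gprd` and `main`/`cny` without editing the URL by hand. The helper `with_env_and_stage` is hypothetical. It leaves every other parameter untouched, including `var-PROMETHEUS_DS`, which may still need to be changed manually when switching environments.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_env_and_stage(url: str, environment: str, stage: str) -> str:
    """Return `url` with its var-environment and var-stage parameters replaced."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))  # keeps the original parameter order
    params["var-environment"] = environment
    params["var-stage"] = stage
    return urlunsplit(parts._replace(query=urlencode(params)))


# Container dashboard link for `api`, copied from the list above.
api_containers = (
    "https://dashboards.gitlab.net/d/api-kube-containers/api3a-kube-containers-detail"
    "?from=now-1h&to=now&var-PROMETHEUS_DS=mimir-gitlab-gprd"
    "&var-environment=gprd&var-stage=cny&orgId=1"
)
print(with_env_and_stage(api_containers, "gprd", "main"))
```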