2026-01-08: Roll out Ruby 3.3 to Production
# Production Change
### Change Summary
Following the test rollout in the staging environments, the plan is to roll out Ruby 3.3 to production.
Auto-deploys will need to be paused for the duration of the change.
### Change Details
1. **Services Impacted** - GitLab Rails, Sidekiq, and any other service that uses Ruby
2. **Change Technician** - `@jennykim-gitlab` `@dat.tang.gitlab`
3. **Change Reviewer** - `@skarbek`
4. **Scheduled Date and Time (UTC in format YYYY-MM-DD HH:MM)** - 2026-01-08 13:00
5. **Time tracking** - 480 minutes
6. **Downtime Component** - none
### Set Maintenance Mode in GitLab
No need for setting Maintenance mode.
## Preparation
> [!note]
>
> The following checklists must be done in advance, before setting the label ~"change::scheduled"
### Change Reviewer checklist
~8276990 ~8276981 ~8276978 ~8276976:
* [ ] Check if the following applies:
* The **scheduled day and time** of execution of the change is appropriate.
* The [change plan](#detailed-steps-for-the-change) is technically accurate.
* The change plan includes **estimated timing values** based on previous testing.
* The change plan includes a viable [rollback plan](#rollback).
* The specified [metrics/monitoring dashboards](#key-metrics-to-observe) provide sufficient visibility for the change.
~8276978 ~8276976:
* [x] Check if the following applies:
* The complexity of the plan is appropriate for the corresponding risk of the change. (i.e. the plan contains clear details).
* The change plan includes success measures for all steps/milestones during the execution.
* The change adequately minimizes risk within the environment/service.
* The performance implications of executing the change are well-understood and documented.
* The specified metrics/monitoring dashboards provide sufficient visibility for the change.
* If not, is it possible (or necessary) to make changes to observability platforms for added visibility?
* The change has a primary and secondary SRE with knowledge of the details available during the change window.
* The change window has been agreed with Release Managers in advance of the change. If the change is planned for APAC hours, this issue has an agreed pre-change approval.
* The labels ~"blocks deployments" and/or ~"blocks feature-flags" are applied as necessary.
### Change Technician checklist
* [x] The [Change Criticality](https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/change-management/#change-criticalities) has been set appropriately and requirements have been reviewed.
* [x] The [change plan](#detailed-steps-for-the-change) is technically accurate.
* [x] The [rollback plan](#rollback) is technically accurate and detailed enough to be executed by anyone with access.
* [x] This Change Issue is linked to the appropriate Issue and/or Epic
* [x] Change has been tested in staging and results noted in a comment on this issue.
* [x] A Test rollout to Canary was done https://gitlab.com/gitlab-com/gl-infra/production/-/issues/20918
* [ ] ~~A dry-run has been conducted and results noted in a comment on this issue.~~
* [x] The change execution window respects the [Production Change Lock periods](https://about.gitlab.com/handbook/engineering/infrastructure/change-management/#production-change-lock-pcl).
* [x] Once all boxes above are checked, mark the change request as scheduled: `/label ~"change::scheduled"`
* [x] For ~8276976 and ~8276978 change issues, the change event is added to the [GitLab Production](https://calendar.google.com/calendar/embed?src=gitlab.com_si2ach70eb1j65cnu040m3alq0%40group.calendar.google.com) calendar by the [change-scheduler bot](https://gitlab.com/gitlab-com/gl-infra/ops-team/toolkit/change-scheduler). It is scheduled to run every 2 hours.
* [ ] For ~8276976 change issues, a Senior Infrastructure Manager has provided approval with the ~14866676 label on the issue.
* [x] For ~8276978 change issues, an Infrastructure Manager provided approval with the ~14866676 label on the issue.
* [x] For ~8276976 and ~8276978 changes, mention `@gitlab-org/saas-platforms/inframanagers` in this issue to provide visibility to all infrastructure managers.
* [x] For ~8276976, ~8276978, or ~"blocks deployments" change issues, confirm with Release managers that the change does not overlap or hinder any release process (In `#production` channel, mention `@release-managers` and this issue and await their acknowledgment.)
* [x] Mention that no post-deploy migrations (PDM) should be run on the day of the CR
## Detailed steps for the change
### Pre-execution steps
> [!note]
>
> The following steps should be done right at the scheduled time of the change request. The [preparation steps](#preparation) are listed above.
* [x] Make sure all tasks in [Change Technician checklist](#change-technician-checklist) are done
* [x] For ~8276976 and ~8276978 change issues, the SRE on-call has been informed prior to change being rolled out.
* [x] The SRE on-call provided approval with the ~25771657 label on the issue.
* [x] For ~8276976, ~8276978, or ~"blocks deployments" change issues, Release managers have been informed prior to change being rolled out. (In `#production` channel, mention `@release-managers` and this issue and await their acknowledgment.)
* [x] Mention that no PDM should be run on the day of the CR
* [x] There are currently no [active incidents](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/?sort=created_date&state=opened&label_name%5B%5D=Incident%3A%3AActive&or%5Blabel_name%5D%5B%5D=severity%3A%3A1&or%5Blabel_name%5D%5B%5D=severity%3A%3A2&first_page_size=20) that are ~3760139 or ~3760140
* [ ] If the change involves doing maintenance on a database host, an appropriate silence targeting the host(s) should be added for the duration of the change.
### Change Steps - steps to take to execute the change
* [x] No PDM should be run on the day of the CR
* [x] Note the last deployment package that was successfully deployed to production in the [Rollback](#rollback) section.
* [x] Open an MR to update the README with the new Ruby version: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/218123
_Estimated Time to Complete (mins)_ - 480 minutes (8 hours)
- [x] Set label ~"change::in-progress" `/label ~change::in-progress`
#### Pause auto deploys
- [x] Pause auto deploys: `/chatops run auto_deploy pause`
#### Prepare a Ruby 3.3 package
- [x] Inform the `@sre-on-call` and `@release-managers` in `#production`, and the engineers in the `#f_ruby3` channel in Slack, that we are starting to build a Ruby 3.3 package to deploy to production.
- [x] Set `USE_NEXT_RUBY_VERSION_IN_AUTODEPLOY` to `true` in the following projects:
- [x] Omnibus-gitlab: https://dev.gitlab.org/gitlab/omnibus-gitlab/-/settings/ci_cd
- [x] CNG: https://dev.gitlab.org/gitlab/charts/components/images/-/settings/ci_cd
- [x] Merge MR to update README with the new ruby version https://gitlab.com/gitlab-org/gitlab/-/merge_requests/218123
#### Start a deployment pipeline
- [x] Trigger a deployment pipeline by running the "MANUAL auto-deploy pick&tag" inactive manual scheduled pipeline: https://ops.gitlab.net/gitlab-org/release/tools/-/pipeline_schedules/
- [x] Make sure a new auto deploy pipeline has started.
* [x] https://dev.gitlab.org/gitlab/omnibus-gitlab/-/commits/18.8.202601081342+fb807874e7c.5c7075ed575
- [x] Verify that the Omnibus package contains Ruby 3.3.
- [x] Find out the image reference of the `Docker` job in the Omnibus packager pipeline. You can search for the string `Copying image` in the logs for that (under the collapsible section `docker-push-staging`)
```
$ docker run -it dev.gitlab.org:5005/gitlab/omnibus-gitlab/gitlab-ee:18.8.202601081342-fb807874e7c.5c7075ed575-arm64 ruby --version
ruby 3.3.10 (2025-10-23 revision 343ea05002) [aarch64-linux]
```
- [x] Verify that the CNG image contains Ruby 3.3.
* [x] Run the following command locally to check the bundled Ruby version - `docker run -it dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-webservice-ee:<IMAGE_REFERENCE> -- ruby --version`
```
$ docker run -it dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-webservice-ee:18-8-202601081342-fb807874e7c -- ruby --version
WARNING: image platform (linux/amd64) does not match the expected platform (linux/arm64)
Begin parsing .erb templates from /srv/gitlab/config
Begin parsing .tpl templates from /srv/gitlab/config
ruby 3.3.10 (2025-10-23 revision 343ea05002) [x86_64-linux]
```
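
The two manual checks above boil down to finding the `ruby X.Y.Z` banner in the command output and comparing its series with the expected one. A minimal sketch of that comparison, assuming the output has already been captured as text; the function names are hypothetical and not part of any release tooling:

```python
import re

# Matches the banner printed by `ruby --version`, for example
# "ruby 3.3.10 (2025-10-23 revision 343ea05002) [aarch64-linux]".
# MULTILINE lets the banner appear after other lines, as in the CNG output.
RUBY_BANNER = re.compile(r"^ruby (\d+)\.(\d+)\.(\d+)\b", re.MULTILINE)


def ruby_series(output: str) -> str:
    """Return "MAJOR.MINOR" from the first Ruby banner found in the output."""
    match = RUBY_BANNER.search(output)
    if match is None:
        raise ValueError("no `ruby X.Y.Z` banner found in output")
    return f"{match.group(1)}.{match.group(2)}"


def is_expected_ruby(output: str, expected_series: str = "3.3") -> bool:
    """True when the image reports a Ruby from the expected series."""
    return ruby_series(output) == expected_series


# The CNG image prints extra lines before the banner; they are skipped.
cng_output = """WARNING: image platform (linux/amd64) does not match the expected platform (linux/arm64)
Begin parsing .erb templates from /srv/gitlab/config
Begin parsing .tpl templates from /srv/gitlab/config
ruby 3.3.10 (2025-10-23 revision 343ea05002) [x86_64-linux]
"""
print(is_expected_ruby(cng_output))  # True
```

The check compares only the major and minor components, so any 3.3.x patch release passes.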
#### Deploy the Ruby 3.3 Package
- [x] Start a new deployment pipeline for the above package with `/chatops run auto_deploy pipeline 18.8.202601081342-fb807874e7c.5c7075ed575`
* [x] https://ops.gitlab.net/gitlab-org/release/tools/-/pipelines/5368534
- [x] Ping monitoring engineers and `@release-managers` on `#f_ruby3` channel in Slack when package is deployed to `gstg-cny`.
- [ ] Check [monitoring](#monitoring)
- [ ] Make sure Quality smoke and reliable pipelines on gstg-cny have passed. If there are failures, ask the Quality on-call to have a look to determine if the failures are related to the Ruby 3.3 rollout.
- [ ] Let the deployment continue up to `gprd-cny`.
- [ ] Ping monitoring engineers and `@release-managers` on `#f_ruby3` channel in Slack when package is deployed to `gprd-cny`.
- [ ] Check [monitoring](#monitoring)
- [ ] Make sure Quality smoke and reliable pipelines on gprd-cny have passed. If there are failures, ask the Quality on-call to have a look to determine if the failures are related to the Ruby 3.3 rollout.
- [ ] Promote the package to `gstg` once the monitoring engineers give the green light.
- [ ] Keep an eye out for any `gprd` deployment jobs and cancel them.
- [ ] Ping monitoring engineers and `@release-managers` on `#f_ruby3` channel in Slack when package is deployed to `gstg`.
- [ ] Check [monitoring](#monitoring)
##### Deploy to Production
We will manually deploy to the zonal clusters one at a time, then to the regional cluster. Each cluster is followed by a bake period to allow time for monitoring before moving on to the next one.
- [ ] Set `MANUAL_GPRD_DEPLOY` to `true` in https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/settings/ci_cd
- [ ] Cancel `notify_success:gprd` job to not accidentally announce a successful deploy during baking time of manual jobs
- [ ] Restart the previously cancelled `gprd` deployment job(s) to start deployment to production
###### Zonal cluster b
- [ ] Manually run `gprd-us-east1-b:auto-deploy`
- [ ] Ping monitoring engineers and `@release-managers` on `#f_ruby3` channel in Slack when package is deployed to the zonal cluster `gprd-us-east1-b`.
- [ ] Check [monitoring](#monitoring). Remember to set the zone to `b`
- [ ] Bake for 30 minutes, or until engineers give the green light.
###### Zonal cluster c
- [ ] Manually run `gprd-us-east1-c:auto-deploy`
- [ ] Ping monitoring engineers and `@release-managers` on `#f_ruby3` channel in Slack when package is deployed to the zonal cluster `gprd-us-east1-c`.
- [ ] Check [monitoring](#monitoring). Remember to set the zone to `c`
- [ ] Bake for 30 minutes, or until engineers give the green light.
###### Zonal cluster d
- [ ] Manually run `gprd-us-east1-d:auto-deploy`
- [ ] Ping monitoring engineers on `#f_ruby3` channel in Slack when package is deployed to the zonal cluster `gprd-us-east1-d`.
- [ ] Check [monitoring](#monitoring). Remember to set the zone to `d`
- [ ] Bake for 15 minutes, or until engineers give the green light.
###### Regional cluster
- [ ] Manually run `gprd:auto-deploy`
- [ ] Inform the `@sre-on-call` and `@release-managers` in `#production` on Slack and the monitoring engineers in the `#f_ruby3` channel in Slack when the package is deployed to `gprd`.
- [ ] Check [monitoring](#monitoring)
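
For planning the change window, the zonal-then-regional order above and its nominal bake times can be laid out as a simple timeline. This is a rough sketch only: `deploy_minutes` is a hypothetical per-job duration that this plan does not state, and the bake times are nominal because engineers may give the green light at a different point.

```python
from datetime import datetime, timedelta

# (manual job, nominal bake minutes after it), in the order used by this plan.
# The regional job has no bake time listed, so it is given 0 here.
STEPS = [
    ("gprd-us-east1-b:auto-deploy", 30),
    ("gprd-us-east1-c:auto-deploy", 30),
    ("gprd-us-east1-d:auto-deploy", 15),
    ("gprd:auto-deploy", 0),
]


def planned_start_times(first_start: datetime, deploy_minutes: int):
    """Return [(job, planned start)], assuming every job takes deploy_minutes."""
    schedule = []
    current = first_start
    for job, bake_minutes in STEPS:
        schedule.append((job, current))
        # The next job starts once this one has deployed and baked.
        current += timedelta(minutes=deploy_minutes + bake_minutes)
    return schedule


# Hypothetical inputs: first zonal job at 15:00 UTC, 20 minutes per job.
for job, start in planned_start_times(datetime(2026, 1, 8, 15, 0), 20):
    print(f"{start:%H:%M}  {job}")
```

Baking alone accounts for 75 minutes (30 + 30 + 15), which is a useful floor when checking that the production stage fits in the remaining window.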
###### Post-deploy
- [ ] Restart the previously cancelled `notify_success:gprd` job to notify the successful deployment to `gprd` on Slack `#announcements` channel
- [ ] Remove `MANUAL_GPRD_DEPLOY` in https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/settings/ci_cd
- [ ] Do not execute post-deploy migrations for the rest of the day (EMEA and AMER) beyond this point.
#### Unpause auto deploys
- [ ] `/chatops run auto_deploy unpause`
- [ ] Set label ~"change::complete" `/label ~change::complete`
## Rollback
### Last Ruby 3.2 package that was successfully deployed to production
18.8.202601080636-1333cdaf7f7.aa9ef7f5e4d
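
Before relying on this package as the rollback target, it can help to confirm that it predates the Ruby 3.3 package. A minimal sketch, assuming the third dot-separated field of an auto-deploy package name is a `YYYYMMDDHHMM` timestamp; that reading is inferred from the package names in this issue rather than from a documented format, and the function names are hypothetical:

```python
import re
from datetime import datetime

# MAJOR.MINOR.TIMESTAMP, then "-" or "+", then two dot-separated commit refs.
# The layout is an assumption based on the package names quoted in this issue.
PACKAGE = re.compile(r"^(\d+)\.(\d+)\.(\d{12})[-+]([0-9a-f]+)\.([0-9a-f]+)$")


def package_timestamp(package: str) -> datetime:
    """Parse the assumed YYYYMMDDHHMM field out of a package name."""
    match = PACKAGE.match(package)
    if match is None:
        raise ValueError(f"unrecognised package name: {package!r}")
    return datetime.strptime(match.group(3), "%Y%m%d%H%M")


def is_older(rollback_package: str, deployed_package: str) -> bool:
    """True when the rollback target carries an earlier timestamp."""
    return package_timestamp(rollback_package) < package_timestamp(deployed_package)


ruby_32_package = "18.8.202601080636-1333cdaf7f7.aa9ef7f5e4d"
ruby_33_package = "18.8.202601081342-fb807874e7c.5c7075ed575"
print(is_older(ruby_32_package, ruby_33_package))  # True
```

This only compares timestamps; it does not confirm which Ruby version a package was built with.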
### Rollback steps - steps to be taken in the event of a need to rollback this change
_Estimated Time to Complete (mins)_ - 60 minutes
#### Rollback production-canary only
If you have not promoted to production and need to roll back production-canary, follow these steps:
* [ ] Notify `@sre-on-call`, `@release-managers` in `#production` that Production Canary will be drained.
* [ ] `/chatops run canary --disable --production`
* [ ] Follow the steps in [Make sure that the next auto deploy package will be built with Ruby 3.2](#make-sure-that-the-next-auto-deploy-package-will-be-built-with-ruby-32)
#### Rollback production and staging
If we need to roll back production and staging, follow the steps in https://gitlab.com/gitlab-org/release/docs/-/blob/master/runbooks/rollback-a-deployment.md to roll back to a Ruby 3.2 package. The steps are reproduced here as well:
* [ ] `/chatops run rollback check gprd`
* [ ] Notify `@sre-on-call`, `@release-managers` in `#production` that a rollback is about to be started. Make sure they know that Canary will also be drained.
* [ ] `/chatops run canary --disable --production`
* [ ] `/chatops run deploy <PACKAGE NAME> gprd --rollback`
* [ ] `/chatops run rollback check gstg`
* [ ] Notify `@sre-on-call`, `@release-managers` in `#staging` that a rollback is about to be started. Make sure they know that Canary will also be drained.
* [ ] `/chatops run canary --disable --staging`
* [ ] `/chatops run deploy <PACKAGE NAME> gstg --rollback`
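
To avoid typos when filling in `<PACKAGE NAME>` under pressure, the ChatOps lines above can be rendered ahead of time. The sketch below only builds the command strings listed in this section; it does not run anything, the notification steps stay manual, and the helper name is hypothetical:

```python
# Canary flag used for each environment in the rollback steps of this plan.
CANARY_FLAGS = {"gprd": "--production", "gstg": "--staging"}


def rollback_commands(package: str, environment: str) -> list[str]:
    """Render the ChatOps rollback commands for one environment, in order."""
    canary_flag = CANARY_FLAGS[environment]  # KeyError for unknown environments
    return [
        f"/chatops run rollback check {environment}",
        f"/chatops run canary --disable {canary_flag}",
        f"/chatops run deploy {package} {environment} --rollback",
    ]


# Production first, then staging, matching the order of the checklist.
ruby_32_package = "18.8.202601080636-1333cdaf7f7.aa9ef7f5e4d"
for environment in ("gprd", "gstg"):
    for command in rollback_commands(ruby_32_package, environment):
        print(command)
```

Pasting the rendered lines into Slack one at a time keeps the checklist order and leaves room for the notifications between steps.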
#### Make sure that the next auto deploy package will be built with Ruby 3.2
* [x] Set `USE_NEXT_RUBY_VERSION_IN_AUTODEPLOY` to `false` in https://dev.gitlab.org/gitlab/omnibus-gitlab/-/settings/ci_cd.
* [x] Set `USE_NEXT_RUBY_VERSION_IN_AUTODEPLOY` to `false` in https://dev.gitlab.org/gitlab/charts/components/images/-/settings/ci_cd.
* [x] If you had already unpaused auto-deploys, cancel any auto-deploy pipelines whose packages were built before you changed the `USE_NEXT_RUBY_VERSION_IN_AUTODEPLOY` variable to `false`.
* [x] Revert the MR that updated the README: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/218205
* [x] Set label ~"change::aborted" `/label ~change::aborted`
## Monitoring
### Key metrics to observe
* Dashboards/metrics:
* Monitor the following dashboards for an unhealthy dip in service health for the environment/cluster that is being rolled out.
* [Deployment health](https://dashboards.gitlab.net/d/delivery-deployment_health/delivery-deployment-health?orgId=1), configurable with environment, stage, and type/service
* [Kubernetes compute resource/cluster health](https://dashboards.gitlab.net/d/kubernetes-resources-cluster/kubernetes-compute-resources-cluster?orgId=1&refresh=5m), configurable with clusters
* [Kubernetes compute resource/pods health](https://dashboards.gitlab.net/d/kubernetes-resources-namespace/kubernetes-compute-resources-namespace-pods?orgId=1&refresh=5m), configurable with clusters and namespace
* [Kubernetes networking](https://dashboards.gitlab.net/d/kubernetes-cluster-total/kubernetes-networking-cluster?orgId=1&refresh=5m), configurable with clusters
* Per-service dashboards (change `env` and `stage` to toggle between `gstg`/`gprd` and `main`/`cny`):
* `api` ([overview](https://dashboards.gitlab.net/d/api-main/api-overview?orgId=1&from=now-1h&to=now), [containers](https://dashboards.gitlab.net/d/api-kube-containers/api3a-kube-containers-detail?from=now-1h&to=now&var-PROMETHEUS_DS=mimir-gitlab-gprd&var-environment=gprd&var-stage=cny&orgId=1))
* `web` ([overview](https://dashboards.gitlab.net/d/web-main/web-overview?orgId=1&from=now-1h&to=now), [containers](https://dashboards.gitlab.net/d/web-kube-containers/web3a-kube-containers-detail?from=now-1h&to=now&var-PROMETHEUS_DS=mimir-gitlab-gprd&var-environment=gprd&var-stage=cny&orgId=1))
* `websockets` ([overview](https://dashboards.gitlab.net/d/websockets-main/websockets-overview?orgId=1&from=now-1h&to=now), [containers](https://dashboards.gitlab.net/d/websockets-kube-containers/websockets3a-kube-containers-detail?from=now-1h&to=now&var-PROMETHEUS_DS=mimir-gitlab-gprd&var-environment=gprd&var-stage=cny&orgId=1))
* `git` ([overview](https://dashboards.gitlab.net/d/git-main/git-overview?orgId=1&from=now-1h&to=now), [containers](https://dashboards.gitlab.net/d/git-kube-containers/git3a-kube-containers-detail?from=now-1h&to=now&var-PROMETHEUS_DS=mimir-gitlab-gprd&var-environment=gprd&var-stage=cny&orgId=1))
* `sidekiq` ([overview](https://dashboards.gitlab.net/d/sidekiq-main/sidekiq-overview?orgId=1&from=now-1h&to=now), [containers](https://dashboards.gitlab.net/d/sidekiq-kube-containers/sidekiq3a-kube-containers-detail?from=now-1h&to=now&var-PROMETHEUS_DS=Global&var-environment=gprd&var-stage=main&orgId=1))
* Kibana - Puma (edit `json.type` to filter by service, `json.stage` for `cny` vs `main`)
* [Production 5xx responses](https://log.gprd.gitlab.net/goto/e0d9a290-b8c9-11ed-85ed-e7557b0a598c)
* [Staging 5xx responses](https://nonprod-log.gitlab.net/goto/040ba510-b8ca-11ed-9af2-6131f0ee4ce6)
* Kibana - Sidekiq (edit `json.shard` to switch between job types)
* [Failed production jobs](https://log.gprd.gitlab.net/goto/89320700-b813-11ed-9f43-e3784d7fe3ca)
* [Failed staging jobs](https://nonprod-log.gitlab.net/goto/e2744200-b814-11ed-9af2-6131f0ee4ce6)
* Sentry
* [Production overview](https://sentry.gitlab.net/gitlab/gitlabcom/dashboard/?statsPeriod=1h)
* [Staging overview](https://sentry.gitlab.net/gitlab/staginggitlabcom/dashboard/?statsPeriod=1h)
* QA runs can be observed via Slack:
* `#announcements` - Besides QA messages, multiple messages are sent to this channel to account for the different deployments.
* QA slack channels - There is a channel per environment, for example, a failure on gstg and gstg-cny will be posted in `#qa-staging`, a failure on gprd-cny and gprd will be posted in `#qa-production`, etc.
* Dealing with deploy failures: https://gitlab.com/gitlab-org/release/docs/-/blob/master/general/deploy/failures.md
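
Switching the per-service dashboard links above between environments and stages means editing their query string. A small sketch of that rewrite, assuming the `var-environment` and `var-stage` parameters seen in the container links apply to the dashboard being opened; the data source parameter (`var-PROMETHEUS_DS`) may also need changing and is left untouched here. The helper name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_env_and_stage(url: str, environment: str, stage: str) -> str:
    """Return the URL with var-environment and var-stage set to the given values."""
    parts = urlsplit(url)
    # dict keeps insertion order, so existing parameters stay in place
    # and missing ones are appended at the end.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params["var-environment"] = environment
    params["var-stage"] = stage
    return urlunsplit(parts._replace(query=urlencode(params)))


api_containers = (
    "https://dashboards.gitlab.net/d/api-kube-containers/api3a-kube-containers-detail"
    "?from=now-1h&to=now&var-PROMETHEUS_DS=mimir-gitlab-gprd"
    "&var-environment=gprd&var-stage=cny&orgId=1"
)
print(with_env_and_stage(api_containers, "gprd", "main"))
```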