[GPRD] Increase pgbouncer pool sizes to reduce saturation and connection wait times
# Production Change
### Change Summary
Equivalent `GSTG` change at https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7378
This change increases the pgbouncer pool sizes, as discussed at https://gitlab.com/gitlab-com/gl-infra/capacity-planning/-/issues/32
During the [CI Decomposition phase 4](https://gitlab.com/groups/gitlab-org/-/epics/6160#phase-4-separate-write-connections-for-ci-and-main-still-going-to-the-same-primary-host) we split both the sync and async pools between the `main` and `ci` pgbouncers to avoid saturating the `patroni-main` writer node. While the pools were split we faced [some incidents due to sidekiq pool saturation](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6880), during which we increased the pools to reduce the application impact, but we were still [limited by the writer node's resource capacity](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15628#note_1000164201), so we kept tracking the saturation risk for both the sync and async pools at https://gitlab.com/gitlab-com/gl-infra/capacity-planning/-/issues/32
Now that [Phase 7: Promotion of the CI database](https://gitlab.com/groups/gitlab-org/-/epics/7791) is finished, we have two writer nodes, one for `patroni-main` and one for `patroni-ci`. We therefore plan to roll this CR out gradually, increasing the pgbouncer pool sizes to reduce pgbouncer saturation while still leaving headroom on the Patroni writer nodes for unexpected spikes.
**The [Current pool sizes](https://thanos-query.ops.gitlab.net/graph?g0.expr=min(pgbouncer_databases_pool_size%7Bname%3D~%22gitlabhq_production.*%22%2C%20env%3D%22gprd%22%2Cstage%3D%22main%22%2Ctype%3D~%22pgbouncer.*%22%7D)%20by%20(name%2Ctype)&g0.tab=1&g0.stacked=0&g0.range_input=2d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D) are:**
- {name="gitlabhq_production", type="pgbouncer"} = 27
- {name="gitlabhq_production", type="pgbouncer-ci"} = 27
- {name="gitlabhq_production_sidekiq", type="pgbouncer"} = 30
- {name="gitlabhq_production_sidekiq", type="pgbouncer-ci"} = 22
**The target of the pool resizing is to find values that satisfy:**
- Connection Saturation per Pool < 80% - Metric at: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7381#evaluate-next-pool-increase
- Total Connection Wait Time < 10 seconds - Metric at: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7381#evaluate-next-pool-increase
**Different satisfactory values can be agreed on at every iteration/round, as per https://gitlab.com/gitlab-com/gl-infra/capacity-planning/-/issues/32#note_1018539403**
### Change Details
1. **Services Impacted** - ~"Service::Pgbouncer" ~"Service::API" ~"Service::Web" ~"Service::Postgres" ~Database
1. **Change Technician** - @rhenchen.gitlab
1. **Change Reviewer** - @ayufan @DylanGriffith @Finotto
1. **Time tracking** - Multiple Weeks - following agreement at https://gitlab.com/gitlab-com/gl-infra/capacity-planning/-/issues/32
1. **Downtime Component** - None
## Detailed steps for the change
### Pre-Change Steps - steps to be completed before execution of the change
*Estimated Time to Complete (mins)* - 10 minutes
1. [x] Confirm that the `gstg` CR was executed successfully - https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7378
1. [x] Check that all MRs are rebased
1. [x] Confirm which host is the Patroni Writer for both clusters
- Main Cluster Primary Host:
- CI Cluster Primary Host:
1. [x] \[optional\] clone https://gitlab.com/rhenchen.gitlab/rhenchen/-/tree/main/scripts and get familiar with the `ssh_cluster_regex.sh` script
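For readers who skip cloning the repo, the helper's behavior can be sketched as follows. This is a minimal, hypothetical sketch, not the actual script from the linked repo; in particular, using `knife node list` as the host inventory source is an assumption:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of ssh_cluster_regex.sh: run a command over SSH on every
# host whose name matches a regex. The real script in the linked repo may differ.
set -euo pipefail

# filter_hosts REGEX -- reads hostnames on stdin, prints only the matching ones
filter_hosts() { grep -E "$1"; }

run_on_cluster() {
  local regex="$1" cmd="$2"
  # Assumption: an inventory command such as `knife node list` prints one
  # hostname per line; SSH_CLUSTER_INVENTORY overrides it for dry runs.
  local inventory="${SSH_CLUSTER_INVENTORY:-$(knife node list)}"
  while IFS= read -r host; do
    echo "=== ${host} ==="
    ssh -o BatchMode=yes "$host" "$cmd"
  done < <(filter_hosts "$regex" <<<"$inventory")
}

# Invoked as, e.g.:
#   ssh_cluster_regex.sh "(pgbouncer-0|pgbouncer-ci|pgbouncer-sidekiq).*gprd" "sudo chef-client"
if [[ $# -ge 2 ]]; then
  run_on_cluster "$1" "$2"
fi
```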
### Change Steps - steps to take to execute the change
*Estimated Time to Complete (mins)* - 15 minutes (each round)
1. [x] **1st Round (11/07/2022)**
1. [x] During quiet working hours (\~00:00 UTC)
1. [x] Get green light from `@sre-oncall` and `@release-managers` at `#production` Slack channel
1. [x] Set label ~"change::in-progress" `/label ~change::in-progress`
1. [x] Increase pool size limit of Sidekiq PGBouncer pools by 50% -> `Main = 45` and `CI = 33`, as decided at https://gitlab.com/gitlab-com/gl-infra/capacity-planning/-/issues/32#note_1017357561
- MR: https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/2101
1. [x] Re-run chef in all PGBouncer nodes
- Execute: `ssh_cluster_regex.sh "(pgbouncer-0|pgbouncer-ci|pgbouncer-sidekiq).*gprd" "sudo chef-client"`
1. [x] Confirm the pool sizing in all PGBouncer nodes
- Execute: `ssh_cluster_regex.sh "(pgbouncer-0|pgbouncer-ci|pgbouncer-sidekiq).*gprd" "sudo pgb-console -c \"SHOW DATABASES;\""` (check `pool_size`)
1. [x] Set label change complete ~"change::complete" `/label ~change::complete `
1. [x] Monitor the [key metrics](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7381#monitoring) for 1 week.
1. [x] Discuss the need for a further increase, a decrease, or a rollback to the previous stage.
1. [x] **2nd Round (27/07/2022)**
1. [x] During quiet working hours (after \~22:00 UTC)
1. [x] Get green light from `@sre-oncall` and `@release-managers` at `#production` Slack channel
1. [x] #7519+
1. [x] Increase pool size limit of SYNC (non-sidekiq) pools by 50% -> `Main = 40` and `CI = 40`, as decided at {+ URL +}
- MR: https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/2108
1. [x] Re-run chef in all PGBouncer nodes
- Execute: `ssh_cluster_regex.sh "(pgbouncer-0|pgbouncer-ci|pgbouncer-sidekiq).*gprd" "sudo chef-client"`
1. [x] Confirm the pool sizing in all PGBouncer nodes
- Execute: `ssh_cluster_regex.sh "(pgbouncer-0|pgbouncer-ci|pgbouncer-sidekiq).*gprd" "sudo pgb-console -c \"SHOW DATABASES;\""` (check `pool_size`)
1. [x] Set label ~"change::complete" `/label ~change::complete`
1. [x] Monitor the [key metrics](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7381#monitoring) for 1 week.
1. [x] Discuss the need for a further increase, a decrease, or a rollback to the previous stage.
1. [x] **3rd Round (not necessary)**
## Rollback
### Rollback steps - steps to be taken in the event of a need to rollback this change
*Estimated Time to Complete (mins)* - 15 minutes
1. [ ] Revert MR of the LAST applied round
- MR: 3rd round - {+ TODO +}
- MR: 2nd round - https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/2108
- MR: 1st round - https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/2101
1. [ ] Re-run chef in all PGBouncer nodes
- Execute: `ssh_cluster_regex.sh "(pgbouncer-0|pgbouncer-ci|pgbouncer-sidekiq).*gprd" "sudo chef-client"`
1. [ ] Confirm the pool sizing in all PGBouncer nodes
- Execute: `ssh_cluster_regex.sh "(pgbouncer-0|pgbouncer-ci|pgbouncer-sidekiq).*gprd" "sudo pgb-console -c \"SHOW DATABASES;\""` (check `pool_size`)
1. [ ] Set label ~"change::aborted" `/label ~change::aborted`
## Monitoring
### Key metrics to observe
<!--
* Describe which dashboards and which specific metrics we should be monitoring related to this change using the format below.
-->
#### Rollback Thresholds
- Metric: Leader nodes CPU Load (processes per core)
- Location: [node_load1](https://thanos-query.ops.gitlab.net/graph?g0.expr=avg_over_time(node_load1%7Benv%3D%22gprd%22%2Ctype%3D~%22patroni%7Cpatroni-ci%22%7D%5B10m%5D)%20%2F%20instance%3Anode_cpus%3Acount%20and%20on%20(fqdn)%20pg_replication_is_replica%3D%3D0&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D)
- What changes to this metric should prompt a rollback: `CPU Load Avg > 0.7` (per core) for 15 minutes or more;
- Metric: Leader nodes CPU Usage (% of all CPUs)
- Location: [node_cpu_utilization](https://thanos-query.ops.gitlab.net/graph?g0.expr=avg_over_time(instance%3Anode_cpu_utilization%3Aratio%7Benv%3D%22gprd%22%2Ctype%3D~%22patroni%7Cpatroni-ci%22%7D%5B10m%5D)%20and%20on%20(fqdn)%20pg_replication_is_replica%3D%3D0&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D)
- What changes to this metric should prompt a rollback: avg `CPU utilization > 70%` for 15 minutes or more;
- Metric: Leader nodes Memory Thrashing (Swap in/out)
- Location: [node_vmstat_pswpin](https://thanos-query.ops.gitlab.net/graph?g0.expr=(rate(node_vmstat_pswpin%7Benv%3D%22gprd%22%2Ctype%3D~%22patroni%7Cpatroni-ci%22%7D%5B10m%5D)%20*%204096)%20%0Aand%20on%20(fqdn)%20pg_replication_is_replica%3D%3D0&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D) , [node_vmstat_pswpout](https://thanos-query.ops.gitlab.net/graph?g0.expr=rate(node_vmstat_pswpout%7Benv%3D%22gprd%22%2Ctype%3D~%22patroni%7Cpatroni-ci%22%7D%5B10m%5D)%20*%204096%0Aand%20on%20(fqdn)%20pg_replication_is_replica%3D%3D0&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D)
- What changes to this metric should prompt a rollback: Spikes of `Swapping activity > 0` for 5 minutes or more;
- Metric: Leader nodes I/O wait
- Location: [node_disk_read_time_seconds_total](https://thanos-query.ops.gitlab.net/graph?g0.expr=rate(node_disk_read_time_seconds_total%7Benv%3D%22gprd%22%2Ctype%3D~%22patroni%7Cpatroni-ci%22%2C%20device%3D%22sdb%22%7D%5B1m%5D)%20%2F%20rate(node_disk_reads_completed_total%7Benv%3D%22gprd%22%2Ctype%3D~%22patroni%7Cpatroni-ci%22%2C%20device!~%22dm.*%22%7D%5B1m%5D)%20%3E%200%0Aand%20on%20(fqdn)%20pg_replication_is_replica%3D%3D0&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D) , [node_disk_write_time_seconds_total](https://thanos-query.ops.gitlab.net/graph?g0.expr=rate(node_disk_write_time_seconds_total%7Benv%3D%22gprd%22%2Ctype%3D~%22patroni%7Cpatroni-ci%22%2C%20device%3D%22sdb%22%7D%5B1m%5D)%20%2F%20rate(node_disk_writes_completed_total%7Benv%3D%22gprd%22%2Ctype%3D~%22patroni%7Cpatroni-ci%22%2C%20device!~%22dm.*%22%7D%5B1m%5D)%20%3E%200%0Aand%20on%20(fqdn)%20pg_replication_is_replica%3D%3D0&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D)
- What changes to this metric should prompt a rollback: avg `I/O wait > 10ms (or 0.01s)` for 2 minutes or more, _but only if caused by an intense I/O activity_;
- Metric: Leader nodes I/O Throughput in MB/s
- Location: [/dev/sdb node_disk_read_bytes_total](https://thanos-query.ops.gitlab.net/graph?g0.expr=rate(node_disk_read_bytes_total%7Benv%3D%22gprd%22%2Ctype%3D~%22patroni%7Cpatroni-ci%22%2C%20device%3D%22sdb%22%7D%5B1m%5D)%20and%20on%20(fqdn)%20pg_replication_is_replica%3D%3D0&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D), [/dev/sdb node_disk_written_bytes_total](https://thanos-query.ops.gitlab.net/graph?g0.expr=rate(node_disk_written_bytes_total%7Benv%3D%22gprd%22%2Ctype%3D~%22patroni%7Cpatroni-ci%22%2C%20device%3D%22sdb%22%7D%5B1m%5D)%20and%20on%20(fqdn)%20pg_replication_is_replica%3D%3D0&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D)
- What changes to this metric should prompt a rollback: `I/O Throughput > 840 MB/s` (70% of the 1,200 MB/s limit*) for 15 minutes or more;
- Metric: Leader nodes IOPS
- Location: [/dev/sdb node_disk_reads_completed_total](https://thanos-query.ops.gitlab.net/graph?g0.expr=rate(node_disk_reads_completed_total%7Benv%3D%22gprd%22%2Ctype%3D~%22patroni%7Cpatroni-ci%22%2C%20device%3D%22sdb%22%7D%5B1m%5D)%20and%20on%20(fqdn)%20pg_replication_is_replica%3D%3D0&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D) , [/dev/sdb node_disk_writes_completed_total](https://thanos-query.ops.gitlab.net/graph?g0.expr=rate(node_disk_writes_completed_total%7Benv%3D%22gprd%22%2Ctype%3D~%22patroni%7Cpatroni-ci%22%2C%20device%3D%22sdb%22%7D%5B1m%5D)%20and%20on%20(fqdn)%20pg_replication_is_replica%3D%3D0&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D)
- What changes to this metric should prompt a rollback: I/O operations per second `IOPS > 70,000` (70% of the 100,000 IOPS limit*) for 15 minutes or more;
- Metric: Writer nodes Network throughput
- Location: [node_network_receive_bytes_total](https://thanos-query.ops.gitlab.net/graph?g0.expr=rate(node_network_receive_bytes_total%7Benv%3D%22gprd%22%2Ctype%3D~%22patroni%7Cpatroni-ci%22%2C%20device!%3D%22lo%22%7D%5B1m%5D)%20and%20on%20(fqdn)%20pg_replication_is_replica%3D%3D0&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D) , [node_network_transmit_bytes_total](https://thanos-query.ops.gitlab.net/graph?g0.expr=rate(node_network_transmit_bytes_total%7Benv%3D%22gprd%22%2Ctype%3D~%22patroni%7Cpatroni-ci%22%2C%20device!%3D%22lo%22%7D%5B1m%5D)%20and%20on%20(fqdn)%20pg_replication_is_replica%3D%3D0&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D)
- What changes to this metric should prompt a rollback: Sustained `Network Throughput > 22.4 Gbps (2.8 GB/s)`, 70% of the [VM limit](https://cloud.google.com/compute/docs/general-purpose-machines#n1_machines) of `32 Gbps (4 GB/s)`*, for 15 minutes or more;
_* Network and Storage I/O performance limits in `gprd` are based on `SSD (performance) persistent disk` of 28 TBs and `n1-highmem-96` VM with 96 vCPUs, where the I/O bottleneck is the 96vCPU [N1 machine type limits for pd-performance](https://cloud.google.com/compute/docs/disks/performance#machine-type-disk-limits) and not the [block device limits](https://cloud.google.com/compute/docs/disks/performance#type_comparison)_
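For reference, the first rollback-threshold query above (leader-node CPU load per core), transcribed from the linked Thanos expression:

```promql
# Average 10m load per core, restricted to the current primaries
# (pg_replication_is_replica == 0 selects the writer of each cluster).
avg_over_time(node_load1{env="gprd", type=~"patroni|patroni-ci"}[10m])
  / instance:node_cpus:count
and on (fqdn) pg_replication_is_replica == 0
```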
#### Evaluate next pool increase
- Metric: Connection Saturation per Pool
- Location: [Main cluster](https://thanos.gitlab.net/graph?g0.expr=clamp_min(clamp_max(sum%20by%20(database%2C%20env%2C%20environment%2C%20shard%2C%20stage%2C%20type)%20(%0A%20%20pgbouncer_pools_server_active_connections%7Btype%3D%22pgbouncer%22%2C%20environment%3D%22gprd%22%2C%20user%3D%22gitlab%22%2C%20database!%3D%22pgbouncer%22%7D%20%2B%0A%20%20pgbouncer_pools_server_testing_connections%7Btype%3D%22pgbouncer%22%2C%20environment%3D%22gprd%22%2C%20user%3D%22gitlab%22%2C%20database!%3D%22pgbouncer%22%7D%20%2B%0A%20%20pgbouncer_pools_server_used_connections%7Btype%3D%22pgbouncer%22%2C%20environment%3D%22gprd%22%2C%20user%3D%22gitlab%22%2C%20database!%3D%22pgbouncer%22%7D%20%2B%0A%20%20pgbouncer_pools_server_login_connections%7Btype%3D%22pgbouncer%22%2C%20environment%3D%22gprd%22%2C%20user%3D%22gitlab%22%2C%20database!%3D%22pgbouncer%22%7D%0A)%0A%2F%0Asum%20by%20(database%2C%20env%2C%20environment%2C%20shard%2C%20stage%2C%20type)%20(%0A%20%20label_replace(%0A%20%20%20%20pgbouncer_databases_pool_size%7Btype%3D%22pgbouncer%22%2C%20environment%3D%22gprd%22%7D%2C%0A%20%20%20%20%22database%22%2C%20%22gitlabhq_production_sidekiq%22%2C%20%22name%22%2C%20%22gitlabhq_production_sidekiq%22%0A%20%20)%0A)%0A%2C1)%2C0)&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D) , [CI 
cluster](https://thanos.gitlab.net/graph?g0.expr=clamp_min(clamp_max(sum%20by%20(database%2C%20env%2C%20environment%2C%20shard%2C%20stage%2C%20type)%20(%0A%20%20pgbouncer_pools_server_active_connections%7Btype%3D%22pgbouncer-ci%22%2C%20environment%3D%22gprd%22%2C%20user%3D%22gitlab%22%2C%20database!%3D%22pgbouncer%22%7D%20%2B%0A%20%20pgbouncer_pools_server_testing_connections%7Btype%3D%22pgbouncer-ci%22%2C%20environment%3D%22gprd%22%2C%20user%3D%22gitlab%22%2C%20database!%3D%22pgbouncer%22%7D%20%2B%0A%20%20pgbouncer_pools_server_used_connections%7Btype%3D%22pgbouncer-ci%22%2C%20environment%3D%22gprd%22%2C%20user%3D%22gitlab%22%2C%20database!%3D%22pgbouncer%22%7D%20%2B%0A%20%20pgbouncer_pools_server_login_connections%7Btype%3D%22pgbouncer-ci%22%2C%20environment%3D%22gprd%22%2C%20user%3D%22gitlab%22%2C%20database!%3D%22pgbouncer%22%7D%0A)%0A%2F%0Asum%20by%20(database%2C%20env%2C%20environment%2C%20shard%2C%20stage%2C%20type)%20(%0A%20%20label_replace(%0A%20%20%20%20pgbouncer_databases_pool_size%7Btype%3D%22pgbouncer-ci%22%2C%20environment%3D%22gprd%22%7D%2C%0A%20%20%20%20%22database%22%2C%20%22gitlabhq_production_sidekiq%22%2C%20%22name%22%2C%20%22gitlabhq_production_sidekiq%22%0A%20%20)%0A)%0A%2C1)%2C0)&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D)
- What changes to this metric should prompt a further increase of the pool size: spikes of `pool saturation > 80% (0.8)` for more than 10 minutes;
- Metric: Total Connection Wait Time
- Location: [Main cluster](https://thanos.gitlab.net/graph?g0.expr=sum%20by%20(database%2C%20environment%2C%20type)%20(rate(pgbouncer_stats_client_wait_seconds_total%7Btype%3D%22pgbouncer%22%2C%20environment%3D%22gprd%22%2C%20database!%3D%22pgbouncer%22%7D%5B1m%5D)%20%2F%20on()%20group_left()%20(vector((time()%20%3C%20bool%201588233600)%20*%201000000)%20%3D%3D%201000000%20or%20vector(1)))&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D) , [CI cluster](https://thanos.gitlab.net/graph?g0.expr=sum%20by%20(database%2C%20environment%2C%20type)%20(rate(pgbouncer_stats_client_wait_seconds_total%7Btype%3D%22pgbouncer-ci%22%2C%20environment%3D%22gprd%22%2C%20database!%3D%22pgbouncer%22%7D%5B1m%5D)%20%2F%20on()%20group_left()%20(vector((time()%20%3C%20bool%201588233600)%20*%201000000)%20%3D%3D%201000000%20or%20vector(1)))&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D)
- What changes to this metric should prompt a further increase of the pool size: spikes of `connection wait time > 10 seconds` at any moment;
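For reference, the two queries encoded in the links above, written out readably (transcribed from the linked Thanos expressions; the main cluster is shown, swap `type="pgbouncer-ci"` for the CI cluster):

```promql
# Connection saturation per pool: in-use server connections / configured
# pool size, clamped to [0, 1].
clamp_min(clamp_max(
  sum by (database, env, environment, shard, stage, type) (
      pgbouncer_pools_server_active_connections{type="pgbouncer", environment="gprd", user="gitlab", database!="pgbouncer"}
    + pgbouncer_pools_server_testing_connections{type="pgbouncer", environment="gprd", user="gitlab", database!="pgbouncer"}
    + pgbouncer_pools_server_used_connections{type="pgbouncer", environment="gprd", user="gitlab", database!="pgbouncer"}
    + pgbouncer_pools_server_login_connections{type="pgbouncer", environment="gprd", user="gitlab", database!="pgbouncer"}
  )
  /
  sum by (database, env, environment, shard, stage, type) (
    label_replace(
      pgbouncer_databases_pool_size{type="pgbouncer", environment="gprd"},
      "database", "gitlabhq_production_sidekiq", "name", "gitlabhq_production_sidekiq"
    )
  )
, 1), 0)

# Total connection wait time per database (the linked query additionally
# rescales older samples that were recorded in microseconds):
sum by (database, environment, type) (
  rate(pgbouncer_stats_client_wait_seconds_total{type="pgbouncer", environment="gprd", database!="pgbouncer"}[1m])
)
```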
## Change Reviewer checklist
<!--
To be filled out by the reviewer.
-->
~C4 ~C3 ~C2 ~C1:
- [x] Check if the following applies:
- The **scheduled day and time** of execution of the change is appropriate.
- The [change plan](#detailed-steps-for-the-change) is technically accurate.
- The change plan includes **estimated timing values** based on previous testing.
- The change plan includes a viable [rollback plan](#rollback).
- The specified [metrics/monitoring dashboards](#key-metrics-to-observe) provide sufficient visibility for the change.
~C2 ~C1:
- [x] Check if the following applies:
- The complexity of the plan is appropriate for the corresponding risk of the change. (i.e. the plan contains clear details).
- The change plan includes success measures for all steps/milestones during the execution.
- The change adequately minimizes risk within the environment/service.
- The performance implications of executing the change are well-understood and documented.
- The specified metrics/monitoring dashboards provide sufficient visibility for the change.
- If not, is it possible (or necessary) to make changes to observability platforms for added visibility?
- The change has a primary and secondary SRE with knowledge of the details available during the change window.
- The labels ~"blocks deployments" and/or ~"blocks feature-flags" are applied as necessary
## Change Technician checklist
<!--
To find out who is on-call, in #production channel run: /chatops run oncall production.
-->
- [x] Check if all items below are complete:
- The [change plan](#detailed-steps-for-the-change) is technically accurate.
- This Change Issue is linked to the appropriate Issue and/or Epic
- Change has been tested in staging and results noted in a comment on this issue.
- A dry-run has been conducted and results noted in a comment on this issue.
- For ~C1 and ~C2 change issues, the change event is added to the [GitLab Production](https://calendar.google.com/calendar/embed?src=gitlab.com_si2ach70eb1j65cnu040m3alq0%40group.calendar.google.com) calendar.
- For ~C1 and ~C2 change issues, the SRE on-call has been informed prior to change being rolled out. (In #production channel, mention `@sre-oncall` and this issue and await their acknowledgement.)
- Release managers have been informed (If needed! Cases include DB change) prior to change being rolled out. (In #production channel, mention `@release-managers` and this issue and await their acknowledgment.)
- There are currently no [active incidents](https://gitlab.com/gitlab-com/gl-infra/production/-/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=Incident%3A%3AActive) that are ~severity::1 or ~severity::2
- If the change involves doing maintenance on a database host, an appropriate silence targeting the host(s) should be added for the duration of the change.