2023-04-17: [GSTG][main db] Provision a new PG14 cluster
<!--
Please review https://about.gitlab.com/handbook/engineering/infrastructure/change-management/ for the most recent information on our change plans and execution policies.
-->
# Production Change
### Change Summary
Provisions a new PG14 cluster via Terraform. This was previously attempted in https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/5439, which was reverted in https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/5453 because it did not follow the Change Management process.
Reference: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16419.
### Change Details
1. **Services Impacted** - {+ List services +}
1. **Change Technician** - `@anganga`
1. **Change Reviewer** - @ahmadsherif
1. **Time tracking** - 30 minutes
1. **Downtime Component** - none
## Detailed steps for the change
### Change Steps - steps to take to execute the change
*Estimated Time to Complete (mins)* - {+Estimated Time to Complete in Minutes+}
- [ ] Set label ~"change::in-progress" `/label ~change::in-progress`
- [ ] Merge the following MRs
    - [ ] https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/3196
    - [ ] https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/5469
- [ ] Set label ~"change::complete" `/label ~change::complete`
## Rollback
### Rollback steps - steps to be taken in the event of a need to rollback this change
*Estimated Time to Complete (mins)* - 10 minutes
- [ ] Revert https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/5469
- [ ] Set label ~"change::aborted" `/label ~change::aborted`
## Monitoring
### Key metrics to observe
<!--
* Describe which dashboards and which specific metrics we should be monitoring related to this change using the format below.
-->
- Metric: patroni Service Error Ratio and pgbouncer SLI Error Ratio
- Location: https://dashboards.gitlab.net/goto/RKW3kRPVz?orgId=1
- What changes to this metric should prompt a rollback: elevated error rates for pgbouncer
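
The rollback trigger above ("elevated error rates for pgbouncer") can be made concrete by comparing the observed error ratio against a pre-change baseline. The following is a minimal sketch only: the function name `should_rollback`, the `factor` multiplier, and the `floor` value are hypothetical and are not taken from this change plan or from any GitLab tooling. The ratios would be read manually from the dashboard linked above.

```python
def should_rollback(baseline_ratio: float, current_ratio: float,
                    factor: float = 2.0, floor: float = 0.001) -> bool:
    """Return True when the current error ratio counts as "elevated".

    All thresholds here are illustrative assumptions:
    - factor: how many times above the baseline counts as elevated
    - floor:  ratios at or below this are ignored as noise
    """
    for ratio in (baseline_ratio, current_ratio):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("error ratios must be within [0, 1]")
    # Elevated means: above the noise floor AND well above the baseline.
    return current_ratio > floor and current_ratio > baseline_ratio * factor


if __name__ == "__main__":
    # Hypothetical readings: baseline taken before merging, current taken after.
    print(should_rollback(baseline_ratio=0.001, current_ratio=0.005))
```

Using a relative comparison plus a noise floor avoids triggering a rollback on tiny fluctuations while still catching a clear regression; the actual values should be agreed with the Change Reviewer before execution.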
## Change Reviewer checklist
<!--
To be filled out by the reviewer.
-->
~C4 ~C3 ~C2 ~C1:
- [ ] Check if the following applies:
    - The **scheduled day and time** of execution of the change is appropriate.
    - The [change plan](#detailed-steps-for-the-change) is technically accurate.
    - The change plan includes **estimated timing values** based on previous testing.
    - The change plan includes a viable [rollback plan](#rollback).
    - The specified [metrics/monitoring dashboards](#key-metrics-to-observe) provide sufficient visibility for the change.
~C2 ~C1:
- [ ] Check if the following applies:
    - The complexity of the plan is appropriate for the corresponding risk of the change. (i.e. the plan contains clear details).
    - The change plan includes success measures for all steps/milestones during the execution.
    - The change adequately minimizes risk within the environment/service.
    - The performance implications of executing the change are well-understood and documented.
    - The specified metrics/monitoring dashboards provide sufficient visibility for the change.
        - If not, is it possible (or necessary) to make changes to observability platforms for added visibility?
    - The change has a primary and secondary SRE with knowledge of the details available during the change window.
    - The labels ~"blocks deployments" and/or ~"blocks feature-flags" are applied as necessary
## Change Technician checklist
<!--
To find out who is on-call, in #production channel run: /chatops run oncall production.
-->
- [ ] Check if all items below are complete:
    - The [change plan](#detailed-steps-for-the-change) is technically accurate.
    - This Change Issue is linked to the appropriate Issue and/or Epic
    - Change has been tested in staging and results noted in a comment on this issue.
    - A dry-run has been conducted and results noted in a comment on this issue.
    - The change execution window respects the [Production Change Lock periods](https://about.gitlab.com/handbook/engineering/infrastructure/change-management/#production-change-lock-pcl).
    - For ~C1 and ~C2 change issues, the change event is added to the [GitLab Production](https://calendar.google.com/calendar/embed?src=gitlab.com_si2ach70eb1j65cnu040m3alq0%40group.calendar.google.com) calendar.
    - For ~C1 and ~C2 change issues, the SRE on-call has been informed prior to change being rolled out. (In #production channel, mention `@sre-oncall` and this issue and await their acknowledgement.)
    - For ~C1 and ~C2 change issues, the SRE on-call provided approval with the ~eoc_approved label on the issue.
    - For ~C1 and ~C2 change issues, the Infrastructure Manager provided approval with the ~manager_approved label on the issue.
    - Release managers have been informed (If needed! Cases include DB change) prior to change being rolled out. (In #production channel, mention `@release-managers` and this issue and await their acknowledgment.)
    - There are currently no [active incidents](https://gitlab.com/gitlab-com/gl-infra/production/-/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=Incident%3A%3AActive) that are ~severity::1 or ~severity::2
    - If the change involves doing maintenance on a database host, an appropriate silence targeting the host(s) should be added for the duration of the change.