Update int.gprd.gitlab.net SSL certificate in production
<!--
Please review https://about.gitlab.com/handbook/engineering/infrastructure/change-management/ for the most recent information on our change plans and execution policies.
-->
# Production Change
### Change Summary
The internal interface (`int.gprd.gitlab.net`) on the GPRD HAProxy nodes uses an SSL certificate that expires on May 14th. This change replaces it with the renewed certificate.
### Change Details
1. **Services Impacted** - GPRD HAProxy
1. **Change Technician** - @cmcfarland
1. **Change Criticality** - ~C3
1. **Change Type** - ~"change::unscheduled"
1. **Change Reviewer** - @nhoppe1
1. **Due Date** - {+ Date and time (in UTC) for the execution of the change +}
1. **Time tracking** - {+ 55 +}
1. **Downtime Component** - {+ N/A +}
## Detailed steps for the change
### Pre-Change Steps - steps to be completed before execution of the change
*Estimated Time to Complete (mins)* - {+20+}
- [x] Create a backup of the existing key and certificate: `./bin/gkms-vault-show frontend-loadbalancer gprd | grep internal > internal.backup`
- [x] Create JSON-ified versions of the new chained certificate: `awk 'NF {sub(/\r/, ""); printf "%s\\n",$0;}' \*.gprd.gitlab.net.chained.crt > \*.gprd.gitlab.net.json.chained.crt`
- [x] Verify the certificate expiration time and date: `knife ssh roles:gprd-base-lb-fe "echo -n | openssl s_client -showcerts -servername int.gprd.gitlab.net -connect localhost:443 2>/dev/null | openssl x509 -inform pem -noout -text | grep 'Not After'"`
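
The JSON-ifying step above can be sanity-checked with a round trip before anything is pasted into the vault. The sketch below wraps the same `awk` command in a function and adds a reverse conversion. The function names `jsonify_pem` and `unjsonify_pem` are hypothetical helpers and not part of any existing tooling, and the reverse step assumes the certificate text contains no backslashes other than the inserted `\n` sequences.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Collapse a PEM file into a single line: drop blank lines, strip carriage
# returns, and join the remaining lines with a literal backslash-n sequence.
jsonify_pem() {
    awk 'NF {sub(/\r/, ""); printf "%s\\n",$0;}' "$1"
}

# Reverse the conversion: expand each literal "\n" back into a real newline.
# %b tells printf to interpret backslash escapes in its argument.
unjsonify_pem() {
    printf '%b' "$(cat "$1")"
}
```

Comparing the output of `unjsonify_pem` against the original file (for example with `diff`) should show no differences apart from removed blank lines and carriage returns.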
### Change Steps - steps to take to execute the change
*Estimated Time to Complete (mins)* - {+35+}
- [x] Set label ~change::in-progress on this issue
- [x] Replace the certificate in the `frontend-loadbalancer gprd` gkms vault. The key for the cert is `internal_crt`.
- [x] Verify the private key matches in the `frontend-loadbalancer gprd` vault. The key for the private key is `internal_key`.
- [x] Run chef locally on a single front end node: `ssh fe-01-lb-gprd.c.gitlab-production.internal "sudo chef-client"`
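
One way to check that a private key matches a certificate, as the verification step above requires, is to compare the public key embedded in each. The following is a minimal sketch, assuming both values have been saved locally as plain PEM files (not the JSON-ified form) and that the leaf certificate comes first in the chained file. The function name `cert_matches_key` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# Usage: cert_matches_key <certificate.pem> <private-key.pem>
# Exit status: 0 when the public keys match, 1 when they differ,
#              2 when either file cannot be parsed.
cert_matches_key() {
    # Public key embedded in the (first) certificate of the file.
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
    # Public key derived from the private key.
    key_pub=$(openssl pkey -in "$2" -pubout) || return 2
    [ "$cert_pub" = "$key_pub" ]
}
```

Comparing public keys rather than RSA moduli keeps the check independent of the key type.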
### Post-Change Steps - steps to take to verify the change
*Estimated Time to Complete (mins)* - {+5+}
- [x] Verify the certificate expiration time and date: `knife ssh roles:gprd-base-lb-fe "echo -n | openssl s_client -showcerts -servername int.gprd.gitlab.net -connect localhost:443 2>/dev/null | openssl x509 -inform pem -noout -text | grep 'Not After'"`
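
Reading the `Not After` line by eye works, but the check can also be expressed as a pass/fail test. The sketch below uses `openssl x509 -checkend`, which exits non-zero when a certificate expires within the given number of seconds. The function name `cert_valid_for_days` is a hypothetical helper, and the 14-day value in the usage note is an assumption borrowed from the alert threshold in the monitoring query.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# Reads a PEM certificate on stdin and succeeds when it is still valid
# the given number of days from now.
# Usage: cert_valid_for_days <days> < certificate.pem
cert_valid_for_days() {
    openssl x509 -noout -checkend "$(( $1 * 86400 ))" >/dev/null
}
```

Because the function reads from stdin, it can take the place of the final `openssl x509 ... | grep 'Not After'` stage of the command above, for example `... | cert_valid_for_days 14 && echo ok`.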
## Rollback
### Rollback steps - steps to be taken in the event of a need to rollback this change
*Estimated Time to Complete (mins)* - {+15+}
- [ ] Edit the `frontend-loadbalancer gprd` gkms vault and replace the `internal_crt` and `internal_key` values with the old certificate and key saved in `internal.backup`.
- [ ] Force a chef run on the front end nodes.
## Monitoring
### Key metrics to observe
<!--
* Describe which dashboards and which specific metrics we should be monitoring related to this change using the format below.
-->
- Metric: SSL Cert Expiration
- Location: https://thanos-query.ops.gitlab.net/graph?g0.range_input=15m&g0.max_source_resolution=0s&g0.expr=probe_ssl_earliest_cert_expiry%7Benvironment%3D%22gprd%22%7D%20-%20time()%20%3C%2014%20*%2086400&g0.tab=0
- What changes to this metric should prompt a rollback: {+Describe Changes+}
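
For reference, the query in the link above decodes to `probe_ssl_earliest_cert_expiry{environment="gprd"} - time() < 14 * 86400`, so it returns the probes whose certificate expires in less than 14 days (1,209,600 seconds). The same arithmetic can be sketched in shell. The function name `expires_within_days` is a hypothetical helper, and both timestamps are assumed to be Unix epoch seconds.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# Usage: expires_within_days <expiry_epoch> <now_epoch> <days>
# Succeeds when fewer than <days> days remain before expiry, mirroring
# the "expiry - time() < days * 86400" comparison in the query.
expires_within_days() {
    [ "$(( $1 - $2 ))" -lt "$(( $3 * 86400 ))" ]
}
```

After a successful change, the `int.gprd.gitlab.net` probe should no longer satisfy this condition.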
## Summary of infrastructure changes
- [ ] Does this change introduce new compute instances?
- [ ] Does this change re-size any existing compute instances?
- [ ] Does this change introduce any additional usage of tooling like Elasticsearch, CDNs, Cloudflare, etc?
<!--
* If you answer yes to any of the items in this checklist, summarize below.
-->
{+Summary of the above+}
## Changes checklist
<!--
To find out who is on-call, in #production channel run: /chatops run oncall production.
-->
- [x] This issue has a criticality label (e.g. ~C1, ~C2, ~C3, ~C4) and a change-type label (e.g. ~"change::unscheduled", ~"change::scheduled") based on the [Change Management Criticalities](https://about.gitlab.com/handbook/engineering/infrastructure/change-management/#change-criticalities).
- [x] This issue has the change technician as the assignee.
- [x] Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
- [x] Necessary approvals have been completed based on the [Change Management Workflow](https://about.gitlab.com/handbook/engineering/infrastructure/change-management/#change-request-workflows).
- [x] Change has been tested in staging and results noted in a comment on this issue.
- [ ] A dry-run has been conducted and results noted in a comment on this issue.
- [x] SRE on-call has been informed prior to change being rolled out. (In #production channel, mention `@sre-oncall` and this issue and await their acknowledgement.)
- [ ] There are currently no [active incidents](https://gitlab.com/gitlab-com/gl-infra/production/-/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=Incident%3A%3AActive).