Update int.gprd.gitlab.net SSL certificate in production
<!-- Please review https://about.gitlab.com/handbook/engineering/infrastructure/change-management/ for the most recent information on our change plans and execution policies. -->

# Production Change

### Change Summary

The internal interface for haproxy nodes has an SSL certificate that will expire on May 14th.

### Change Details

1. **Services Impacted** - GPRD Haproxy
1. **Change Technician** - @cmcfarland
1. **Change Criticality** - ~C3
1. **Change Type** - ~"change::unscheduled"
1. **Change Reviewer** - @nhoppe1
1. **Due Date** - {+ Date and time (in UTC) for the execution of the change +}
1. **Time tracking** - {+ 55 +}
1. **Downtime Component** - {+ N/A +}

## Detailed steps for the change

### Pre-Change Steps - steps to be completed before execution of the change

*Estimated Time to Complete (mins)* - {+20+}

- [x] Create a backup of the existing key and certificate: `./bin/gkms-vault-show frontend-loadbalancer gprd | grep internal > internal.backup`
- [x] Create JSON-ified versions of the new chained certificate: `awk 'NF {sub(/\r/, ""); printf "%s\\n",$0;}' \*.gprd.gitlab.net.chained.crt > \*.gprd.gitlab.net.json.chained.crt`
- [x] Verify the certificate expiration time and date: `knife ssh roles:gprd-base-lb-fe "echo -n | openssl s_client -showcerts -servername int.gprd.gitlab.net -connect localhost:443 2>/dev/null | openssl x509 -inform pem -noout -text | grep 'Not After'"`

### Change Steps - steps to take to execute the change

*Estimated Time to Complete (mins)* - {+35+}

- [x] Set label ~change::in-progress on this issue
- [x] Replace the certificate in the `frontend-loadbalancer gprd` gkms vault. The key for the cert is `internal_crt`.
- [x] Verify the private key matches in the `frontend-loadbalancer gprd` vault. The key for the private key is `internal_key`.
- [x] Run chef locally on a single front end node: `ssh fe-01-lb-gprd.c.gitlab-production.internal "sudo chef-client"`

### Post-Change Steps - steps to take to verify the change

*Estimated Time to Complete (mins)* - {+5+}

- [x] Verify the certificate expiration time and date: `knife ssh roles:gprd-base-lb-fe "echo -n | openssl s_client -showcerts -servername int.gprd.gitlab.net -connect localhost:443 2>/dev/null | openssl x509 -inform pem -noout -text | grep 'Not After'"`

## Rollback

### Rollback steps - steps to be taken in the event of a need to rollback this change

*Estimated Time to Complete (mins)* - {+15+}

- [ ] Edit the `frontend-loadbalancer gprd` gkms vault and replace the values with the old certificate and key.
- [ ] Force a chef run on the front end nodes.

## Monitoring

### Key metrics to observe

<!-- * Describe which dashboards and which specific metrics we should be monitoring related to this change using the format below. -->

- Metric: SSL Cert Expiration
  - Location: https://thanos-query.ops.gitlab.net/graph?g0.range_input=15m&g0.max_source_resolution=0s&g0.expr=probe_ssl_earliest_cert_expiry%7Benvironment%3D%22gprd%22%7D%20-%20time()%20%3C%2014%20*%2086400&g0.tab=0
  - What changes to this metric should prompt a rollback: {+Describe Changes+}

## Summary of infrastructure changes

- [ ] Does this change introduce new compute instances?
- [ ] Does this change re-size any existing compute instances?
- [ ] Does this change introduce any additional usage of tooling like Elastic Search, CDNs, Cloudflare, etc?

<!-- * If you answer yes to any of the items in this checklist, summarize below. -->

{+Summary of the above+}

## Changes checklist

<!-- To find out who is on-call, in #production channel run: /chatops run oncall production. -->

- [x] This issue has a criticality label (e.g. ~C1, ~C2, ~C3, ~C4) and a change-type label (e.g. ~"change::unscheduled", ~"change::scheduled") based on the [Change Management Criticalities](https://about.gitlab.com/handbook/engineering/infrastructure/change-management/#change-criticalities).
- [x] This issue has the change technician as the assignee.
- [x] Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
- [x] Necessary approvals have been completed based on the [Change Management Workflow](https://about.gitlab.com/handbook/engineering/infrastructure/change-management/#change-request-workflows).
- [x] Change has been tested in staging and results noted in a comment on this issue.
- [ ] A dry-run has been conducted and results noted in a comment on this issue.
- [x] SRE on-call has been informed prior to change being rolled out. (In #production channel, mention `@sre-oncall` and this issue and await their acknowledgement.)
- [ ] There are currently no [active incidents](https://gitlab.com/gitlab-com/gl-infra/production/-/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=Incident%3A%3AActive).
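
The two verification steps in this plan (that the private key matches the certificate, and that the certificate has the expected expiration) could also be checked against local PEM files before the vault is edited. The following is a minimal sketch, not the procedure that was actually run: the function names `cert_matches_key` and `cert_valid_for_days` are hypothetical helpers, and it assumes the new certificate and key are available as local files and that `openssl` is installed.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical and not part of the gkms tooling.

# Succeeds (exit 0) when the certificate and the private key share a public key.
# For a chained file, `openssl x509` reads only the first certificate (the leaf).
# Returns 2 when either file cannot be parsed.
cert_matches_key() {
  local cert="$1" key="$2"
  local cert_pub key_pub
  # Extract the public key embedded in the certificate.
  cert_pub=$(openssl x509 -in "$cert" -noout -pubkey 2>/dev/null) || return 2
  # Derive the public key from the private key.
  key_pub=$(openssl pkey -in "$key" -pubout 2>/dev/null) || return 2
  # Both are PEM-encoded public keys; they match only if the pair belongs together.
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

# Succeeds (exit 0) when the certificate is still valid for at least <days> days.
cert_valid_for_days() {
  local cert="$1" days="$2"
  # -checkend takes seconds and exits non-zero if the cert expires within that window.
  openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null
}
```

For example, `cert_matches_key new.chained.crt new.key && cert_valid_for_days new.chained.crt 14` would succeed only if the pair matches and the certificate stays valid beyond the 14-day window used by the monitoring query above; the file names here are placeholders.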