Update int.gprd.gitlab.net SSL certificate in production

Production Change

Change Summary

The SSL certificate served on the internal interface of the HAProxy front-end nodes (int.gprd.gitlab.net) expires on May 14th and must be replaced.

Change Details

  1. Services Impacted - GPRD HAProxy
  2. Change Technician - @cmcfarland
  3. Change Criticality - C3
  4. Change Type - change::unscheduled
  5. Change Reviewer - @nhoppe1
  6. Due Date - Date and time (in UTC) for the execution of the change
  7. Time tracking - 55 minutes
  8. Downtime Component - N/A

Detailed steps for the change

Pre-Change Steps - steps to be completed before execution of the change

Estimated Time to Complete (mins) - 20

  • Create a backup of the existing key and certificate: ./bin/gkms-vault-show frontend-loadbalancer gprd | grep internal > internal.backup
  • Create a JSON-escaped, single-line copy of the new chained certificate: awk 'NF {sub(/\r/, ""); printf "%s\\n",$0;}' '*.gprd.gitlab.net.chained.crt' > '*.gprd.gitlab.net.json.chained.crt'
  • Record the current certificate expiration date on all front-end nodes: knife ssh roles:gprd-base-lb-fe "echo -n | openssl s_client -showcerts -servername int.gprd.gitlab.net -connect localhost:443 2>/dev/null | openssl x509 -inform pem -noout -text | grep 'Not After'"
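The JSON-escaping step can be checked locally before anything is pasted into the vault. This is a minimal sketch, assuming bash and a POSIX awk; json_escape_pem and count_certs are hypothetical helper names, not part of the existing tooling. The awk program is the same one used in the step above; count_certs is an added sanity check that the chained file really holds more than one certificate.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the pre-change steps (not part of the existing tooling).

# Print a PEM file as one JSON-safe line:
#   NF             skip empty lines
#   sub(/\r/, "")  strip Windows carriage returns
#   "%s\\n"        end each line with a literal backslash-n instead of a newline
json_escape_pem() {
  awk 'NF {sub(/\r/, ""); printf "%s\\n", $0;}' "$1"
}

# Count the certificates in a PEM file; a chained certificate is expected
# to hold the leaf certificate plus at least one intermediate.
count_certs() {
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}
```

For example, json_escape_pem '*.gprd.gitlab.net.chained.crt' should print a single line with literal \n separators, and count_certs on the same file should report 2 or more.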

Change Steps - steps to take to execute the change

Estimated Time to Complete (mins) - 35

  • Set the label change::in-progress on this issue
  • Replace the internal_crt value in the frontend-loadbalancer gprd gkms vault with the JSON-escaped chained certificate.
  • Verify that the private key stored under internal_key in the same vault matches the new certificate.
  • Run chef-client on a single front-end node: ssh fe-01-lb-gprd.c.gitlab-production.internal "sudo chef-client"
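The key-match check in the change steps can be done with openssl before the vault is edited. This is a minimal sketch, assuming the new certificate and private key are available locally as plain PEM files (not the JSON-escaped vault strings); cert_matches_key is a hypothetical helper name. Comparing public keys, rather than RSA moduli, avoids assuming a particular key type.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does this private key belong to this certificate?
# Compares the public key embedded in the certificate with the public key
# derived from the private key (works for RSA and EC keys alike).
# Both arguments are plain PEM files, not the JSON-escaped vault strings.
# Returns 0 on match, 1 on mismatch, 2 if either file cannot be parsed.
cert_matches_key() {
  local cert="$1" key="$2" cert_pub key_pub
  # For a chained file, openssl x509 reads only the first (leaf) certificate.
  cert_pub=$(openssl x509 -in "$cert" -noout -pubkey 2>/dev/null) || return 2
  key_pub=$(openssl pkey -in "$key" -pubout 2>/dev/null) || return 2
  [ "$cert_pub" = "$key_pub" ]
}
```

For example, cert_matches_key new_chained.crt new.key && echo match (both file names are hypothetical). This relies on the leaf certificate being first in the chained file.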

Post-Change Steps - steps to take to verify the change

Estimated Time to Complete (mins) - 5

  • Verify the new certificate expiration date on all front-end nodes: knife ssh roles:gprd-base-lb-fe "echo -n | openssl s_client -showcerts -servername int.gprd.gitlab.net -connect localhost:443 2>/dev/null | openssl x509 -inform pem -noout -text | grep 'Not After'"
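The post-change check compares the 'Not After' date by eye. To make the check pass or fail on its own, openssl x509 -checkend can be used instead. This is a sketch; cert_valid_for_days and the 30-day threshold in the usage example are assumptions, not values taken from this change.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the PEM certificate read on stdin is
# still valid DAYS days from now. openssl x509 -checkend N exits 0 when the
# certificate will not expire within N seconds, and 1 when it will.
cert_valid_for_days() {
  local days="$1"
  openssl x509 -noout -checkend "$(( days * 86400 ))" >/dev/null
}
```

On a single front-end node, after defining the function in that shell: echo -n | openssl s_client -servername int.gprd.gitlab.net -connect localhost:443 2>/dev/null | cert_valid_for_days 30. The helper is not defined on remote hosts, so across the fleet the openssl call would be inlined instead: knife ssh roles:gprd-base-lb-fe "echo -n | openssl s_client -servername int.gprd.gitlab.net -connect localhost:443 2>/dev/null | openssl x509 -noout -checkend 2592000", which prints "Certificate will not expire" on each node serving a certificate valid for at least another 30 days.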

Rollback

Rollback steps - steps to be taken in the event of a need to rollback this change

Estimated Time to Complete (mins) - 15

  • Restore the internal_crt and internal_key values in the frontend-loadbalancer gprd gkms vault from internal.backup.
  • Force a chef-client run on the front-end nodes.
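The rollback depends entirely on internal.backup, so it is worth confirming the backup holds both values before editing the vault. This is a minimal sketch, assuming only that the vault key names internal_crt and internal_key appear in the grep output captured during the pre-change steps; backup_has_keys is a hypothetical helper name. Note that the restored certificate is the one expiring on May 14th, so a rollback only buys time until then.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm a backup file mentions both vault keys
# needed for a rollback. Prints the first missing key to stderr.
backup_has_keys() {
  local file="$1" k
  for k in internal_crt internal_key; do
    if ! grep -q -- "$k" "$file"; then
      echo "missing $k in $file" >&2
      return 1
    fi
  done
  return 0
}
```

For example: backup_has_keys internal.backup && echo "backup looks complete".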

Monitoring

Key metrics to observe

Summary of infrastructure changes

  • Does this change introduce new compute instances?
  • Does this change re-size any existing compute instances?
  • Does this change introduce any additional usage of tooling like Elastic Search, CDNs, Cloudflare, etc?

Summary of the above

Changes checklist

  • This issue has a criticality label (e.g. C1, C2, C3, C4) and a change-type label (e.g. change::unscheduled, change::scheduled) based on the Change Management Criticalities.
  • This issue has the change technician as the assignee.
  • Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
  • Necessary approvals have been completed based on the Change Management Workflow.
  • Change has been tested in staging and results noted in a comment on this issue.
  • A dry-run has been conducted and results noted in a comment on this issue.
  • SRE on-call has been informed prior to change being rolled out. (In #production channel, mention @sre-oncall and this issue and await their acknowledgement.)
  • There are currently no active incidents.
Edited by Cameron McFarland