Rotate JWT signing keys in CDot non-production environments
Production Change
Change Summary
Yearly rotation of the following JWT signing keys used in CustomersDot non-production environments (dev, test, stg, stg-ref):
- cdot_gitlab_internal_jwt_signing_key
- cdot_cloud_connector_jwt_signing_key
We'll open a separate change management issue for rotating these keys in production after the non-production environments are completed successfully.
Change Details
- Services Impacted - Service::CustomersDot Service::AIGateway
- Change Technician - @tyleramos
- Change Reviewer - DRI for the review of this change
- Scheduled Date and Time (UTC in format YYYY-MM-DD HH:MM) - Start date and time planned to execute change steps YYYY-MM-DD HH:MM
- Time tracking - This change should only involve code changes in CDot. No manual work should be required.
- Downtime Component - none
Set Maintenance Mode in GitLab
If your change involves scheduled maintenance, add a step to set and unset maintenance mode per our runbooks. This will make sure SLA calculations adjust for the maintenance period.
Detailed steps for the change
Change steps - steps to take to execute the change
Estimated Time to Complete (mins) - Estimated Time to Complete in Minutes
- Set label change::in-progress: /label ~change::in-progress
- Generate new RSA private keys by running bin/generate_rsa_key_pair for each non-production env.
- Open MR to update the *_jwt_validation_key value in credentials.yml.enc for each environment with the keys generated in the previous step. This ensures that validators get access to the new key before we use it to sign tokens.
- Merge MR to deploy this change: the JWKS endpoint should now serve the new key.
  - Check that the discovery JWKS endpoint returns both keys.
  - Monitor service logs of the validating service to ensure there is no increase in 401s.
    - Refer to the respective service runbook for how to do this.
    - For CDot internal API calls to GitLab, check the Kibana logs for any anomaly.
- Invalidate caches in validators by waiting at least 24 hours. All services that validate tokens should then have refreshed their key sets and can validate tokens signed with either the old or the new key.
- Open MR to swap *_jwt_signing_key and *_jwt_validation_key. After the swap, all new tokens are signed with the newly generated key, and tokens signed with the old *_jwt_signing_key can still be validated because that key is now the *_jwt_validation_key.
- Merge MR to deploy this change.
  - Monitor service logs of the validating service to ensure there is no increase in 401s.
    - Refer to the respective service runbook for how to do this.
    - For CDot internal API calls to GitLab, check the Kibana logs for any anomaly.
- After 3 days, open MR to remove the value of *_jwt_validation_key in credentials.yml.enc.
- Merge MR to deploy this change.
  - Verify that the JWKS endpoint no longer includes this key.
- Set label change::complete: /label ~change::complete
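The ordering of the steps above (publish the new key, wait, swap, wait, remove) can be illustrated with a small model. This is only a sketch: `KeyConfig`, the three helper functions, and the key ids "old" and "new" are hypothetical names, and it assumes the JWKS endpoint serves both the signing key and the validation key, as the verification steps above imply. It is not CustomersDot code.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class KeyConfig:
    """Hypothetical stand-in for the two credential slots of one key family."""
    signing_key: str                # key id used to sign new tokens
    validation_key: Optional[str]   # extra key id published for validation only

    def published_keys(self) -> Set[str]:
        """Key ids assumed to be served by the JWKS endpoint."""
        keys = {self.signing_key}
        if self.validation_key is not None:
            keys.add(self.validation_key)
        return keys


def add_validation_key(cfg: KeyConfig, new_key: str) -> KeyConfig:
    """First MR: publish the new key without signing with it yet."""
    return KeyConfig(cfg.signing_key, new_key)


def swap_keys(cfg: KeyConfig) -> KeyConfig:
    """Second MR: sign with the new key, keep the old one for validation."""
    if cfg.validation_key is None:
        raise ValueError("nothing to swap: no validation key is set")
    return KeyConfig(cfg.validation_key, cfg.signing_key)


def remove_validation_key(cfg: KeyConfig) -> KeyConfig:
    """Third MR: stop publishing the old key."""
    return KeyConfig(cfg.signing_key, None)


start = KeyConfig(signing_key="old", validation_key=None)
phase1 = add_validation_key(start, "new")
phase2 = swap_keys(phase1)
phase3 = remove_validation_key(phase2)

for name, cfg in [("start", start), ("phase1", phase1),
                  ("phase2", phase2), ("phase3", phase3)]:
    print(name, "signs with", cfg.signing_key,
          "publishes", sorted(cfg.published_keys()))
```

In this model the signing key is always among the published keys, and the old key stays published for one full phase after it stops signing, which is why the waiting periods sit between the merges.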
Rollback
Rollback steps - steps to be taken in the event of a need to rollback this change
Estimated Time to Complete (mins) - Estimated Time to Complete in Minutes
- Rollback Step 1
- Rollback Step 2
- Set label change::aborted: /label ~change::aborted
Monitoring
Key metrics to observe
- Check that the discovery JWKS endpoint returns both keys.
- Monitor service logs of the validating service to ensure there is no increase in 401s.
  - Refer to the respective service runbook for how to do this.
  - For CDot internal API calls to GitLab, check the Kibana logs for any anomaly.
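The two checks above could be scripted roughly as follows. This is a hedged sketch, not an existing tool: the function names and the sample key ids are hypothetical, the JWKS URL must be supplied by the operator, and it assumes each published key carries a `kid` field in a standard JWK Set (`{"keys": [...]}`). The status-code list stands in for whatever the service logs or Kibana export actually provide.

```python
import json
from typing import Iterable, List, Set
from urllib.request import urlopen


def fetch_jwks(url: str, timeout: float = 10.0) -> dict:
    """Download and parse a JWK Set document from the given URL."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def missing_key_ids(jwks: dict, expected_kids: Iterable[str]) -> Set[str]:
    """Return the expected key ids that the JWK Set does not list."""
    served = {key.get("kid") for key in jwks.get("keys", [])}
    return set(expected_kids) - served


def rate_of_401s(status_codes: List[int]) -> float:
    """Fraction of responses that were 401; 0.0 for an empty sample."""
    if not status_codes:
        return 0.0
    return status_codes.count(401) / len(status_codes)


# Offline sample with made-up key ids, shaped like a JWK Set.
sample = {"keys": [{"kid": "old-kid", "kty": "RSA"},
                   {"kid": "new-kid", "kty": "RSA"}]}

print(missing_key_ids(sample, {"old-kid", "new-kid"}))
print(rate_of_401s([200, 401, 200, 200]))
```

Against a live environment, `missing_key_ids(fetch_jwks(url), expected)` should return an empty set while both keys are expected to be published, and comparing `rate_of_401s` over a window before and after each merge gives a rough signal for an increase in 401s.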
Change Reviewer checklist
- Check if the following applies:
  - The scheduled day and time of execution of the change is appropriate.
  - The change plan is technically accurate.
  - The change plan includes estimated timing values based on previous testing.
  - The change plan includes a viable rollback plan.
  - The specified metrics/monitoring dashboards provide sufficient visibility for the change.
- Check if the following applies:
  - The complexity of the plan is appropriate for the corresponding risk of the change (i.e. the plan contains clear details).
  - The change plan includes success measures for all steps/milestones during the execution.
  - The change adequately minimizes risk within the environment/service.
  - The performance implications of executing the change are well-understood and documented.
  - The specified metrics/monitoring dashboards provide sufficient visibility for the change.
    - If not, is it possible (or necessary) to make changes to observability platforms for added visibility?
  - The change has a primary and secondary SRE with knowledge of the details available during the change window.
  - The change window has been agreed with Release Managers in advance of the change. If the change is planned for APAC hours, this issue has an agreed pre-change approval.
  - The labels blocks deployments and/or blocks feature-flags are applied as necessary.
Change Technician checklist
- The change plan is technically accurate.
- This Change Issue is linked to the appropriate Issue and/or Epic.
- Change has been tested in staging and results noted in a comment on this issue.
- A dry-run has been conducted and results noted in a comment on this issue.
- The change execution window respects the Production Change Lock periods.
- For C1 and C2 change issues, the change event is added to the GitLab Production calendar.
- For C1 and C2 change issues, the Infrastructure Manager provided approval with the manager_approved label on the issue. Mention @gitlab-org/saas-platforms/inframanagers in this issue to request approval and provide visibility to all infrastructure managers.
- For C1, C2, or blocks deployments change issues, confirm with Release Managers that the change does not overlap or hinder any release process. (In the #production channel, mention @release-managers and this issue and await their acknowledgment.)