[gprd] Enable the usage of the new redis-sessions instance

⚠ For SRE DRI: please critically review the steps.
Although they should be pretty similar to #5663 (closed), there are two differences:

Do we need something similar to gitlab-com/gl-infra/k8s-workloads/gitlab-com!1283 (merged)? It seems not: we have refactored the configs since then, and that "touch" was needed only because we previously updated both staging and prod in the same MR during the staging change issue, so the follow-up MR only had to trigger the release. Now the k8s MR will do just that. Please correct me if I am wrong.
We don't expect any user-initiated requests to the new Redis instance; its use is controlled via the feature flag. We just need to make sure the existing instances are fine and the new instance is healthy.

Note: the actual switch to using the instances is behind the FF.

Staging change issue: #5962 (closed)

FF rollout issue: scalability#1429 (closed)

Production Change

Change Summary

This change will configure the Redis-Sessions instance.
scalability#1309 (closed)

Change Details

Services Impacted - Service::Redis
Change Technician - @igorwwwwwwwwwwwwwwwwwwww
Change Reviewer - @nmilojevic1 / @alipniagov
Time tracking - 35min
Downtime Component - None

Detailed steps for the change

Pre-Change Steps - steps to be completed before execution of the change

Estimated Time to Complete (mins) - 1 min

Ensure that gitlab-org/gitlab!74202 (merged) has been merged and deployed to production
Make sure that we tested the FF rollout on staging: scalability#1429 (closed), and that no issues were found
Obtain review/approval on:
1. https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/967
2. gitlab-com/gl-infra/k8s-workloads/gitlab-com!1370 (merged)
Set label change::in-progress on this issue

Change Steps - steps to take to execute the change

Estimated Time to Complete (mins) - 30 minutes

Merge chef MR: https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/967
From a local copy of chef-repo, run ./bin/gkms-vault-edit gitlab-omnibus-secrets gprd and add an entry for redis_sessions_instance alongside the existing redis configs; use the same password, just adjust the identifier at the end.
Merge gitlab-com/gl-infra/k8s-workloads/gitlab-com!1370 (merged)
- NB: depends on the chef MR being merged first to have the expected effect; do not re-arrange the order.
Monitor/wait for the k8s pipeline to complete the production deploy

Post-Change Steps - steps to take to verify the change

Estimated Time to Complete (mins) - 2 minutes

We plan to start using the instance by enabling the FF, which is not part of this change issue (we'll use the FF rollout issue). Here, to validate, we should concentrate on two things:

Make sure the main Redis instance is healthy. We don't expect any user-invoked queries to redis-sessions as long as the FF is off.
Make sure that the redis-sessions instance is alive and that we can connect to it.

Rollback

Rollback steps - steps to be taken in the event of a need to rollback this change

Estimated Time to Complete (mins) - 30 min

Disable the feature flag use_multi_store, if it was enabled previously (/chatops run feature set use_multi_store false --production)
Set the ENV var GITLAB_USE_REDIS_SESSIONS_STORE to false and restart (it is true by default)
If necessary (the presence of the configuration is the problem), revert the k8s and chef MRs and apply.
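
To make the first two rollback steps concrete, here is a small model of how the two switches might combine. It is an assumption drawn from the steps above, not the application's actual code: the helper names and the set of accepted truthy strings are hypothetical. Only the variable name GITLAB_USE_REDIS_SESSIONS_STORE, its default of true, and the flag use_multi_store come from this issue.

```python
import os

# Assumption: which strings count as "true"; the application may parse differently.
TRUTHY = {"1", "true", "t", "yes", "y", "on"}


def env_flag(name: str, default: bool, environ=None) -> bool:
    """Read a boolean ENV var, falling back to `default` when unset or empty."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in TRUTHY


def sessions_instance_in_use(ff_use_multi_store: bool, environ=None) -> bool:
    """Model: the new instance is used only if both switches allow it."""
    store_enabled = env_flag("GITLAB_USE_REDIS_SESSIONS_STORE", default=True, environ=environ)
    return store_enabled and ff_use_multi_store


# With the ENV var unset (true by default), the feature flag alone decides.
print(sessions_instance_in_use(ff_use_multi_store=False, environ={}))
# Setting the ENV var to false turns the new instance off regardless of the flag.
print(sessions_instance_in_use(True, {"GITLAB_USE_REDIS_SESSIONS_STORE": "false"}))
```

Under this model, either switch on its own is enough to stop traffic to the new instance, which is why reverting the MRs is listed only as a last resort.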

Monitoring

Key metrics to observe

Metric: Redis main instance health
- Location: https://dashboards.gitlab.net/d/redis-main/redis-overview?orgId=1
- What changes to this metric should prompt a rollback: the switch to using the new instance is behind the FF, so we don't expect any activity there unless we flip it. We should concentrate on the main instance (shared state) health there.
Metric: Redis-sessions overview
- Location: https://dashboards.gitlab.net/d/redis-sessions-main/redis-sessions-overview?orgId=1
- We don't expect any user traffic there during the rollout
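
The two expectations above can be written down as a simple check. This is only a sketch: the function name, its inputs, and the background threshold are hypothetical (an idle instance still sees some monitoring and replication commands, so zero is not a realistic cutoff), and the real values would come from the dashboards linked above.

```python
def rollout_findings(
    ff_enabled: bool,
    sessions_ops_per_sec: float,
    main_healthy: bool,
    background_ops_per_sec: float = 5.0,  # hypothetical tolerance for non-user traffic
) -> list[str]:
    """Return a list of observations that deserve attention during the rollout."""
    findings = []
    if not main_healthy:
        findings.append("main Redis instance is unhealthy: consider rollback")
    if not ff_enabled and sessions_ops_per_sec > background_ops_per_sec:
        findings.append("unexpected traffic on redis-sessions while the FF is off")
    return findings


# Quiet new instance, healthy main instance: nothing to report.
print(rollout_findings(ff_enabled=False, sessions_ops_per_sec=0.5, main_healthy=True))
```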

Summary of infrastructure changes

Does this change introduce new compute instances?
Does this change re-size any existing compute instances?
Does this change introduce any additional usage of tooling like Elastic Search, CDNs, Cloudflare, etc?

None

Changes checklist

This issue has a criticality label (e.g. C1, C2, C3, C4) and a change-type label (e.g. change::unscheduled, change::scheduled) based on the Change Management Criticalities.
This issue has the change technician as the assignee.
Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
This Change Issue is linked to the appropriate Issue and/or Epic
Necessary approvals have been completed based on the Change Management Workflow.
Change has been tested in staging and results noted in a comment on this issue.
A dry-run has been conducted and results noted in a comment on this issue.
SRE on-call has been informed prior to change being rolled out. (In #production channel, mention @sre-oncall and this issue and await their acknowledgement.)
Release managers have been informed (If needed! Cases include DB change) prior to change being rolled out. (In #production channel, mention @release-managers and this issue and await their acknowledgment.)
There are currently no active incidents.

Edited Dec 03, 2021 by Igor
