
Access exclusive lease via a separate Redis instance

Sylvester Chin requested to merge sc1-migrate-exclusive-lease into master

What does this MR do and why?

This MR adds a ClusterSharedState instance for ExclusiveLease. For self-managed (SM) users there is no noticeable change, since ClusterSharedState falls back to the SharedState config.

This MR allows GitLab SaaS to move the exclusive lease workload to a Redis Cluster ahead of the other SharedState workloads. See gitlab-com/gl-infra/scalability#2452 (closed) for more background. This is done with two feature flags: one starts the migration by acquiring two locks (one in SharedState and one in ClusterSharedState), and the other cuts over to using the new Redis Cluster.
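To make the two-flag rollout concrete, here is a minimal Ruby sketch of which store a lease is written to under each flag combination. It is not the MR's code: `lease_stores` and the `:shared_state`/`:cluster_shared_state` symbols are hypothetical, and only the two flag names come from this MR. It assumes, based on the validation steps below, that phase 2 writes to both stores and phase 3 writes to the cluster only.

```ruby
# Hypothetical sketch, not GitLab's implementation: which Redis store(s) a
# lease is written to for a given set of feature flag values.
def lease_stores(flags)
  if flags[:use_cluster_shared_state_for_exclusive_lease]
    # Phase 3: cut over to the new Redis Cluster.
    [:cluster_shared_state]
  elsif flags[:enable_exclusive_lease_double_lock_rw]
    # Phase 2: take the lock in both stores.
    [:shared_state, :cluster_shared_state]
  else
    # Phase 1: existing behaviour, SharedState only.
    [:shared_state]
  end
end

phase1 = lease_stores({})
phase2 = lease_stores({ enable_exclusive_lease_double_lock_rw: true })
phase3 = lease_stores({ enable_exclusive_lease_double_lock_rw: true,
                        use_cluster_shared_state_for_exclusive_lease: true })
p phase1, phase2, phase3
```

In this model the cut-over flag takes precedence, so enabling both flags (as in the "Phase 1 to Phase 3" steps) behaves like phase 3.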

See #421156 (closed) for an explanation of the rollout and why two feature flags are needed.

Addresses #419898 (closed)

How to set up and validate locally

The steps below refer to the rollout phases described in #421156 (closed).

Set up config/redis.yml:

---
development:
  cluster_shared_state:
    cluster:
      - "redis://gdk.test:6001"
      - "redis://gdk.test:6002"
      - "redis://gdk.test:6000"
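The fallback mentioned at the top (ClusterSharedState using SharedState's config when it has none of its own) can be illustrated with a small lookup over a redis.yml-style document. This is a simplified sketch: `redis_config_for` is a hypothetical helper, and GitLab's real config resolution has more layers than this.

```ruby
require 'yaml'

# Hypothetical helper (not GitLab's API): look up `instance` in the given
# environment of a redis.yml-style document, falling back to `fallback`
# when that instance has no entry.
def redis_config_for(yaml_text, env, instance, fallback)
  # safe_load returns nil/false for an empty document, so default to {}.
  env_config = (YAML.safe_load(yaml_text) || {})[env] || {}
  env_config[instance] || env_config[fallback]
end

redis_yml = <<~YML
  ---
  development:
    cluster_shared_state:
      cluster:
        - "redis://gdk.test:6001"
        - "redis://gdk.test:6002"
        - "redis://gdk.test:6000"
YML

# With the config above, the cluster entry is used directly.
p redis_config_for(redis_yml, 'development', 'cluster_shared_state', 'shared_state')
```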

Phase 1 to Phase 2

  1. Initialise an exclusive lease instance and acquire a lock
[3] pry(main)> el = Gitlab::ExclusiveLease.new('el:1', timeout: 10.minutes)
=> #<Gitlab::ExclusiveLease:0x00000001612c6138 @redis_shared_state_key="gitlab:exclusive_lease:el:1", @timeout=10 minutes, @uuid="abddbf8d-942c-4612-827d-9e631c7f3b54">
[4] pry(main)> el.try_obtain
=> "abddbf8d-942c-4612-827d-9e631c7f3b54"
  1. Move to phase 2 and run try_obtain. Expectation: fail to acquire
[7] pry(main)> Feature.enable(:enable_exclusive_lease_double_lock_rw)
=> true
[8] pry(main)> el.try_obtain
=> false
  1. Back to phase 1 and try to obtain the lock. Expectation: fail to acquire
[9] pry(main)> Feature.disable(:enable_exclusive_lease_double_lock_rw)
=> true
[10] pry(main)> el.try_obtain
=> false
  1. Acquire a new lock in phase 2, then roll back to phase 1 and try to obtain it again. Expectation: fail to acquire
[25] pry(main)> el = Gitlab::ExclusiveLease.new('el:2to1', timeout: 10.minutes)
=> #<Gitlab::ExclusiveLease:0x000000013f8a6980 @redis_shared_state_key="gitlab:exclusive_lease:el:2to1", @timeout=10 minutes, @uuid="5d8e5c5c-d419-4bcf-8b91-8fb7e1dbc5a4">
[26] pry(main)> el.try_obtain
=> "5d8e5c5c-d419-4bcf-8b91-8fb7e1dbc5a4"
[27] pry(main)> Feature.disable(:enable_exclusive_lease_double_lock_rw)
=> true
[28] pry(main)> el.try_obtain
=> false
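The phase 1 and phase 2 expectations above can be reproduced with a tiny in-memory model. In this sketch (not GitLab code) each Redis store is a plain Hash, `try_obtain` is a hypothetical stand-in for `ExclusiveLease#try_obtain`, and a single-threaded check-then-set plays the role of Redis `SET NX`.

```ruby
require 'securerandom'

# Hypothetical model: a lease is obtained only if no store used by the
# current phase already holds the key; it is then written to all of them.
def try_obtain(stores, key)
  return false if stores.any? { |store| store.key?(key) }

  uuid = SecureRandom.uuid
  stores.each { |store| store[key] = uuid }
  uuid
end

shared_state = {}
cluster_shared_state = {}
phase1 = [shared_state]
phase2 = [shared_state, cluster_shared_state]

# Phase 1 -> 2: the phase 1 lock is still found in SharedState.
first = try_obtain(phase1, 'el:1')
blocked = try_obtain(phase2, 'el:1')

# Phase 2 -> 1: the phase 2 lock was also written to SharedState.
second = try_obtain(phase2, 'el:2to1')
rolled_back = try_obtain(phase1, 'el:2to1')

puts [first.is_a?(String), blocked, second.is_a?(String), rolled_back].inspect
# prints [true, false, true, false]
```

Because phase 2 keeps writing to SharedState, moving between phases 1 and 2 in either direction never lets a second caller take a held lease.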

Phase 2 to Phase 3

  1. Obtain new lock in phase 2
[11] pry(main)> Feature.enable(:enable_exclusive_lease_double_lock_rw)
=> true
[12] pry(main)> el = Gitlab::ExclusiveLease.new('el:2', timeout: 10.minutes)
=> #<Gitlab::ExclusiveLease:0x0000000128c3c840 @redis_shared_state_key="gitlab:exclusive_lease:el:2", @timeout=10 minutes, @uuid="e42f5655-d1c9-4d98-a5ba-2fc3ec1e63c2">
[13] pry(main)> el.try_obtain
=> "e42f5655-d1c9-4d98-a5ba-2fc3ec1e63c2"
  1. Move to phase 3 and try to obtain. Expectation: fail to acquire
[15] pry(main)> Feature.enable(:use_cluster_shared_state_for_exclusive_lease)
=> true
[16] pry(main)> el.try_obtain
=> false
  1. Roll back to phase 2 on the same lock and try to obtain it. Expectation: fail to acquire
[17] pry(main)> Feature.disable(:use_cluster_shared_state_for_exclusive_lease)
=> true
[18] pry(main)> el.try_obtain
=> false
  1. Try to roll back to phase 2 with a new lock that is set while in phase 3. Expectation: fail to acquire
[20] pry(main)> el = Gitlab::ExclusiveLease.new('el:3to2', timeout: 10.minutes)
=> #<Gitlab::ExclusiveLease:0x000000013ee12488 @redis_shared_state_key="gitlab:exclusive_lease:el:3to2", @timeout=10 minutes, @uuid="be8945b4-da41-4913-8da4-84f7840bfef6">
[21] pry(main)> el.try_obtain
=> "be8945b4-da41-4913-8da4-84f7840bfef6"
[22] pry(main)> Feature.disable(:use_cluster_shared_state_for_exclusive_lease)
=> true
[23] pry(main)> el.try_obtain
=> false
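The same hypothetical Hash-based model (redefined here so the block stands alone) covers the phase 2 and phase 3 steps: a phase 2 lock is also written to the cluster, so phase 3 sees it, and phase 2 checks both stores, so it sees a lock taken in phase 3. This is an illustration, not the MR's implementation.

```ruby
require 'securerandom'

# Hypothetical model: each store is a Hash; check-then-set stands in for SET NX.
def try_obtain(stores, key)
  return false if stores.any? { |store| store.key?(key) }

  uuid = SecureRandom.uuid
  stores.each { |store| store[key] = uuid }
  uuid
end

shared_state = {}
cluster_shared_state = {}
phase2 = [shared_state, cluster_shared_state]
phase3 = [cluster_shared_state]

# Phase 2 -> 3: the phase 2 lock also lives in the cluster.
held_in_phase2 = try_obtain(phase2, 'el:2')
in_phase3 = try_obtain(phase3, 'el:2')

# Phase 3 -> 2: phase 2 checks the cluster, so the phase 3 lock is seen.
held_in_phase3 = try_obtain(phase3, 'el:3to2')
back_in_phase2 = try_obtain(phase2, 'el:3to2')

puts [in_phase3, back_in_phase2].inspect
# prints [false, false]
```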

Phase 1 to Phase 3 (why it is not safe)

  1. Obtain a lock in phase 1 and proceed to phase 3 to attempt to obtain the same lock. Expectation: fail to acquire
[29] pry(main)> el = Gitlab::ExclusiveLease.new('el:unsafe', timeout: 10.minutes)
=> #<Gitlab::ExclusiveLease:0x000000013fc040a0 @redis_shared_state_key="gitlab:exclusive_lease:el:unsafe", @timeout=10 minutes, @uuid="0f3bab3d-cda7-4e51-b527-ac8cf504e656">
[30] pry(main)> el.try_obtain
=> "0f3bab3d-cda7-4e51-b527-ac8cf504e656"
[31] pry(main)> Feature.enable(:enable_exclusive_lease_double_lock_rw)
=> true
[32] pry(main)> Feature.enable(:use_cluster_shared_state_for_exclusive_lease)
=> true
[33] pry(main)> el.try_obtain
=> false
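  1. Rolling back from phase 3 to phase 1 is not safe: a lock obtained in phase 3 is stored only in the new Redis store, and phase 1 has no safeguard that checks that store, so the same lease is obtained a second time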
[34] pry(main)> el = Gitlab::ExclusiveLease.new('el:unsafe:3to1', timeout: 10.minutes)
=> #<Gitlab::ExclusiveLease:0x00000001482ff3e8 @redis_shared_state_key="gitlab:exclusive_lease:el:unsafe:3to1", @timeout=10 minutes, @uuid="d5e8aebb-0f67-4f25-b405-f80397e84bc1">
[35] pry(main)> el.try_obtain
=> "d5e8aebb-0f67-4f25-b405-f80397e84bc1"
[36] pry(main)> Feature.disable(:enable_exclusive_lease_double_lock_rw)
=> true
[37] pry(main)> Feature.disable(:use_cluster_shared_state_for_exclusive_lease)
=> true
[38] pry(main)> el.try_obtain
=> "d5e8aebb-0f67-4f25-b405-f80397e84bc1"
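A sketch of why the phase 3 to phase 1 rollback is unsafe, using the same hypothetical in-memory model as a stand-in for the two Redis stores (not GitLab code): the phase 3 lock exists only in the cluster, phase 1 looks only at SharedState, so two callers end up holding the same lease.

```ruby
require 'securerandom'

# Hypothetical model: each store is a Hash; check-then-set stands in for SET NX.
def try_obtain(stores, key)
  return false if stores.any? { |store| store.key?(key) }

  uuid = SecureRandom.uuid
  stores.each { |store| store[key] = uuid }
  uuid
end

shared_state = {}
cluster_shared_state = {}
phase1 = [shared_state]
phase3 = [cluster_shared_state]

# Written to the cluster only.
holder = try_obtain(phase3, 'el:unsafe:3to1')
# Phase 1 never looks at the cluster, so this succeeds as well.
intruder = try_obtain(phase1, 'el:unsafe:3to1')

puts [holder.is_a?(String), intruder.is_a?(String)].inspect
# prints [true, true]
```

This is why phase 2 must sit between phases 1 and 3 in both directions: it is the only phase that checks both stores.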

Deadlock prevention in Phase 2

Clear config/redis.yml so that ClusterSharedState falls back to SharedState's config. Both now point to the same Redis instance:

[1] pry(main)> Gitlab::Redis::ClusterSharedState.with(&:id)
=> "unix:///Users/sylvesterchin/work/gitlab-development-kit/redis/redis.socket/0"
[2] pry(main)> Gitlab::Redis::SharedState.with(&:id)
=> "unix:///Users/sylvesterchin/work/gitlab-development-kit/redis/redis.socket/0"

Go to phase 2 and obtain a lock. Expectation: the lock is obtained, with no deadlock even though both stores point to the same Redis instance

[5] pry(main)> el = Gitlab::ExclusiveLease.new('el:nodeadlock', timeout: 10.minutes)
=> #<Gitlab::ExclusiveLease:0x0000000154ed4ce8 @redis_shared_state_key="gitlab:exclusive_lease:el:nodeadlock", @timeout=10 minutes, @uuid="1b0e4a97-1719-4634-a6bc-00ac2f402d34">
[6] pry(main)> el.try_obtain
=> "1b0e4a97-1719-4634-a6bc-00ac2f402d34"
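One way to avoid this deadlock is sketched below; it is an assumption, not necessarily how the MR implements it. When both wrappers resolve to the same Redis (as the identical `id` values above show), a naive double lock issues `SET NX` twice for the same key on the same server, so the second call always fails and the lease can never be obtained. The sketch deduplicates stores by connection id before locking. `FakeRedis`, `naive_double_lock` and `deduplicated_double_lock` are hypothetical names.

```ruby
require 'securerandom'

# Hypothetical stand-in for a Redis connection: `id` identifies the server
# (like `redis.id` in the session above) and set_nx mimics SET NX.
class FakeRedis
  attr_reader :id

  def initialize(id, data = {})
    @id = id
    @data = data
  end

  def set_nx(key, value)
    return false if @data.key?(key)

    @data[key] = value
    true
  end
end

# Naive double lock: SET NX on every store. On a shared server the second
# SET NX hits the key the first one just wrote.
def naive_double_lock(stores, key, uuid)
  stores.all? { |store| store.set_nx(key, uuid) }
end

# Guarded double lock: acquire once per distinct server id.
def deduplicated_double_lock(stores, key, uuid)
  stores.uniq(&:id).all? { |store| store.set_nx(key, uuid) }
end

# Both wrappers share one backing server, as when ClusterSharedState falls
# back to SharedState's config.
backing = {}
socket = 'unix:///tmp/redis.socket/0'
stores = [FakeRedis.new(socket, backing), FakeRedis.new(socket, backing)]
uuid = SecureRandom.uuid

naive = naive_double_lock(stores, 'el:naive', uuid)
# The key left behind by the first attempt makes every retry fail too.
naive_retry = naive_double_lock(stores, 'el:naive', uuid)
guarded = deduplicated_double_lock(stores, 'el:nodeadlock', uuid)

puts [naive, naive_retry, guarded].inspect
# prints [false, false, true]
```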
