Perform the migration to redis-ratelimiting
Summary
We need to migrate two distinct call sites:
- Rack::Attack (https://gitlab.com/gitlab-org/gitlab/-/blob/dada5e77f541be32bd6754452ff2c360712b3f68/lib/gitlab/rack_attack/instrumented_cache_store.rb#L17)
  - This is controlled with the USE_RATE_LIMITING_STORE_FOR_RACK_ATTACK env var. When it is set to 1, Rack::Attack uses the new rate-limiting storage.
- Gitlab::ApplicationRateLimiter (https://gitlab.com/gitlab-org/gitlab/-/blob/dada5e77f541be32bd6754452ff2c360712b3f68/lib/gitlab/application_rate_limiter.rb#L76, and possibly other call sites)
  - This is controlled with the use_rate_limiting_store_for_application_rate_limiter feature flag.
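The two toggles select the backing store in different ways. One is a process env var and the other is a feature flag. The sketch below is a hypothetical illustration, not GitLab code. The module and method names are invented, and plain hashes stand in for the old and new Redis instances. The flag is passed in as a boolean because a real check (for example through GitLab's Feature.enabled?) needs the Rails app. The sketch also assumes that only the exact value "1" enables the env var.

```ruby
# Hypothetical sketch (not GitLab code): how each call site might pick its store.
# Plain hashes stand in for the old and the new (rate-limiting) Redis instances.
module RateLimitStoreSelector
  ENV_VAR = "USE_RATE_LIMITING_STORE_FOR_RACK_ATTACK"

  # Rack::Attack: assumes only the exact string "1" switches to the new store.
  # +env+ can be ENV itself or any hash-like object (handy for testing).
  def self.rack_attack_store(env, old_store:, new_store:)
    env[ENV_VAR] == "1" ? new_store : old_store
  end

  # ApplicationRateLimiter: +flag_enabled+ stands in for the result of checking
  # the use_rate_limiting_store_for_application_rate_limiter feature flag.
  def self.application_rate_limiter_store(flag_enabled, old_store:, new_store:)
    flag_enabled ? new_store : old_store
  end
end

old_store = { name: "old" }
new_store = { name: "new" }

puts RateLimitStoreSelector.rack_attack_store({}, old_store: old_store, new_store: new_store)[:name] # prints old
puts RateLimitStoreSelector.rack_attack_store({ "USE_RATE_LIMITING_STORE_FOR_RACK_ATTACK" => "1" },
                                              old_store: old_store, new_store: new_store)[:name] # prints new
```

An env var is fixed for the lifetime of a running process. A feature flag can be flipped while the application is running.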
Rack::Attack processes more traffic than Gitlab::ApplicationRateLimiter. Both use short TTLs (on the order of minutes).
We are not migrating existing data. As discussed in #1247 (closed):
- Rack::Attack's TTL is a minute.
- The maximum TTL for Gitlab::ApplicationRateLimiter is 3 minutes.
- The failure case is that we accept slightly more requests than we should during the transition. We can mitigate this by rolling out at a quiet time.
- Adding migration code would risk introducing bugs in the migration code itself.
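The failure case can be made concrete with a generic fixed-window counter. This hypothetical sketch is not the actual Rack::Attack or ApplicationRateLimiter implementation. Hashes stand in for the two Redis instances, and expiry is omitted. A real store would typically use INCR plus a TTL (EXPIRE), so old counters simply disappear after one window. The sketch shows that when the switch happens mid-window, one key can be allowed up to twice its limit in that window, because each store counts from zero.

```ruby
# Hypothetical fixed-window limiter (not the real GitLab implementation).
# Counters are keyed by "<key>:<window index>"; TTL/expiry is not modelled.
class FixedWindowLimiter
  def initialize(limit:, period:)
    @limit = limit   # max requests per window
    @period = period # window length in seconds
  end

  # Increments the counter in +store+ and returns true if the request is allowed.
  def allow?(store, key, now)
    window_key = "#{key}:#{now / @period}"
    store[window_key] = store.fetch(window_key, 0) + 1
    store[window_key] <= @limit
  end
end

limiter = FixedWindowLimiter.new(limit: 5, period: 60)
old_store = {}
new_store = {}

# Ten requests at t = 30 s (same window); the store switch happens after five.
results = (1..10).map do |i|
  store = i <= 5 ? old_store : new_store
  limiter.allow?(store, "user:1", 30)
end

puts results.count(true) # prints 10 (only 5 would be allowed without the switch)
```

Only windows that straddle the switch are affected. Because the longest TTL is 3 minutes, the old store's counters stop mattering shortly after cutover, so copying them over would add risk for little benefit.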
Tasks
- Staging (ApplicationRateLimiter)
  - Configure the new instance so Rails can see it. At this point it should still be entirely unused.
  - Enable the use_rate_limiting_store_for_application_rate_limiter feature flag. A percentage-of-time rollout is not sensible here, because it only lengthens the window in which requests can go to either Redis instance.
  - Wait a while and check that everything looks good.
- Production (ApplicationRateLimiter)
  - Configure the new instance so Rails can see it. At this point it should still be entirely unused.
  - Enable the use_rate_limiting_store_for_application_rate_limiter feature flag. A percentage-of-time rollout is not sensible here, because it only lengthens the window in which requests can go to either Redis instance.
  - Wait a while and check that everything looks good.
- Staging (Rack::Attack)
  - Set USE_RATE_LIMITING_STORE_FOR_RACK_ATTACK=1 everywhere.
  - Wait a while and check that everything looks good.
- Production (Rack::Attack)
  - Set USE_RATE_LIMITING_STORE_FOR_RACK_ATTACK=1 everywhere.
  - Wait a while and check that everything looks good.
- Clean up
  - Remove the feature flag and env var from the application code.
  - Wait for that change to deploy.
  - Delete the feature flag from all environments.
  - Remove the env var from all environments.
  - Update https://docs.gitlab.com/ee/development/redis/new_redis_instance.html with anything we learned.
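The staging and production tasks rule out a percentage-of-time rollout for the feature flag. This hypothetical simulation (not GitLab code; the method name and the 50% gate are invented for illustration) shows why. It models one request per second for a single key and counts the fixed windows in which that key's requests land in both stores. A flag flipped once mixes stores in at most one window. A per-request 50% gate mixes them in essentially every window for as long as the partial rollout lasts.

```ruby
# Hypothetical simulation (not GitLab code): one request per second for a single
# rate-limit key; +route+ returns :old or :new for each request time +t+.
# Returns how many fixed windows received requests in both stores.
def mixed_windows(duration:, period:, route:)
  stores_by_window = Hash.new { |hash, window| hash[window] = [] }
  duration.times do |t|
    stores_by_window[t / period] |= [route.call(t)]
  end
  stores_by_window.count { |_window, stores| stores.size > 1 }
end

period = 60        # seconds per fixed window
duration = 30 * 60 # a 30-minute rollout period

# Flag flipped once, part-way through window 10 (t = 630 s).
flip_once = ->(t) { t < 630 ? :old : :new }

# Hypothetical 50% percentage-of-time gate, evaluated on every request.
rng = Random.new(42)
percentage_of_time = ->(_t) { rng.rand < 0.5 ? :old : :new }

puts mixed_windows(duration: duration, period: period, route: flip_once) # prints 1
puts mixed_windows(duration: duration, period: period, route: percentage_of_time)
```

With 60 requests per window, the chance that a window under the 50% gate sends all of them to the same store is negligible. So nearly all 30 windows end up split, and each of them can over-admit in the way described for the cutover.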
Edited by Sean McGivern