Feature Flag reactive_caching_limit_environment Enable
What
Remove the :reactive_cache_limit feature flag.
The former FF was replaced by reactive_caching_limit_environment. To clarify:
When we enabled :reactive_cache_limit live, it turned out that not only the Deploy Boards but also other areas of the code were affected by the cache limits. So we decided to take a step back and, instead of enabling the FF for the whole code base, introduce it separately for each ReactiveCaching usage. ~"group::configure" will do it first for Deploy Boards, then we'll document a suggestion on how other teams can take the same approach and enable it for their own usage of ReactiveCaching. We think this is a safer approach, since it's modular and each team has better domain expertise to implement it in the section of code where they use ReactiveCaching.
Therefore, I've opened an MR which will remove :reactive_cache_limit, but will also make the feature disabled by default. So after we remove it, caching limits will still not be activated.
Here's the aforementioned MR: !34202 (merged)
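The per-usage approach described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not GitLab's actual implementation: `FlagStore`, `LimitedCaching`, `EnvironmentLike` and `OtherCacheUser` are hypothetical names, and only the flag name and the 10MB figure come from this issue.

```ruby
# Hypothetical stand-in for a feature-flag store; this is NOT GitLab's Feature API.
module FlagStore
  @enabled = {}

  def self.enable(name)
    @enabled[name] = true
  end

  # A flag is off unless it was explicitly enabled ("disabled by default").
  def self.enabled?(name)
    @enabled.fetch(name, false)
  end
end

# Hypothetical mixin: every including class names its own flag and limit,
# so limits can be rolled out one usage at a time.
module LimitedCaching
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    attr_accessor :cache_limit_flag, :cache_hard_limit
  end

  # Limits apply only when the class has a flag AND that flag is enabled.
  def cache_limit_enforced?
    flag = self.class.cache_limit_flag
    !flag.nil? && FlagStore.enabled?(flag)
  end
end

# Illustrative class standing in for a model that opts in to limits.
class EnvironmentLike
  include LimitedCaching
  self.cache_limit_flag = :reactive_caching_limit_environment
  self.cache_hard_limit = 10 * 1024 * 1024 # 10MB, the limit mentioned in this issue
end

# Illustrative class that has not opted in: no flag, so limits never apply.
class OtherCacheUser
  include LimitedCaching
end
```

With this shape, enabling `reactive_caching_limit_environment` only affects the class that declared it; other users of the mixin keep their current behaviour until their owning team opts in.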
Updated Plan 2020-03-06
Enable and observe on GitLab.com for a week. If everything goes well, we set the flag as enabled by default and merge it so it's released in %12.9. If there are no complaints from on-premise customers on %12.9, we can consider deleting the FF.
Updated 22-10-2020
stg | dev | gprd |
---|---|---|
enabled globally | enabled globally | enabled globally |
Owners
- Team: ~"group::configure"
- Most appropriate Slack channel to reach out to: #s_configure
- Best individual to reach out to: João Cunha (@Alexand)
Expectations
What are we expecting to happen?
Exceptions: Environment cached data will have 10MB limits.
We expect that these limits won't be exceeded, since we focused on setting limits big enough to not affect the current usage of GitLab.
What might happen if this goes wrong?
Services affected by ReactiveCaching that go over their limits won't have their data cached, and the BE will silently send a ReactiveCaching::ExceededReactiveCacheLimit to Sentry. From a user's perspective, whatever they're trying to load will simply not load.
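The failure mode described above (oversized data is not cached and an error is reported quietly) can be illustrated with a small sketch. Everything here is an assumption for illustration: `LimitedCache` and `ExceededCacheLimit` are hypothetical, the size is approximated by the JSON encoding rather than by Gitlab::Utils::DeepSize, and a plain array stands in for Sentry.

```ruby
require 'json'

# Hypothetical error, standing in for ReactiveCaching::ExceededReactiveCacheLimit.
class ExceededCacheLimit < StandardError; end

# Hypothetical in-memory cache that refuses payloads over a hard limit.
class LimitedCache
  attr_reader :reported_errors

  def initialize(hard_limit_bytes)
    @hard_limit_bytes = hard_limit_bytes
    @store = {}
    @reported_errors = [] # stands in for an error tracker such as Sentry
  end

  # Stores the data only when its approximate size fits the limit.
  # Oversized data is dropped and an error is recorded instead of raised,
  # so the caller sees a cache miss rather than a failure.
  def write(key, data)
    size = JSON.generate(data).bytesize # assumption: JSON size approximates the payload size
    if size > @hard_limit_bytes
      @reported_errors << ExceededCacheLimit.new("#{key}: #{size} bytes exceeds #{@hard_limit_bytes}")
      return false
    end

    @store[key] = data
    true
  end

  def read(key)
    @store[key]
  end
end
```

Because the write is rejected without raising, a reader keeps getting `nil` for that key, which matches the "never finishes loading" symptom a user would see.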
List of commands to try to understand why the limit is being reached
# Fetch the environment directly by ID
prd_env = Environment.find(00000) # replace 00000 with the environment ID
# OR find the project first and pick the environment by name
project = Project.find(00000) # replace 00000 with the project ID
environments = project.environments.with_state(:available)
prd_env = environments.detect { |e| e.name == "production" } # replace "production" with the environment you want
# Load the reactive cache synchronously for the given environment
prd_env_dep_plat = prd_env.deployment_platform
reactive_cache = prd_env_dep_plat.calculate_reactive_cache_for(prd_env)
# Check whether the cached data really exceeds the hard limit
data_deep_size = Gitlab::Utils::DeepSize.new(reactive_cache, max_size: Environment.reactive_cache_hard_limit)
data_deep_size.valid? # false when the data is over the limit
data_deep_size.size
# Inspect pods, deployments and ingresses to understand what needs to be handled better (paginated, trimmed, etc.)
reactive_cache[:pods]
reactive_cache[:deployments]
reactive_cache[:ingresses]
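To see quickly which part of the cache contributes most, a helper along these lines might be useful. It is only a sketch: `size_by_key` is a hypothetical helper, the sample data is made up, and the size is approximated by the JSON encoding, which assumes the cached values are plain hashes, arrays and strings.

```ruby
require 'json'

# Returns [key, approximate_bytes] pairs for a hash, largest first.
# The byte count is the length of each value's JSON encoding (an approximation).
def size_by_key(cache_data)
  cache_data
    .map { |key, value| [key, JSON.generate(value).bytesize] }
    .sort_by { |_key, bytes| -bytes }
end

# Made-up data shaped like the cache inspected above.
sample = {
  pods: Array.new(3) { |i| { name: "pod-#{i}" } },
  deployments: [{ name: 'web' }],
  ingresses: []
}

size_by_key(sample).each { |key, bytes| puts "#{key}: ~#{bytes} bytes" }
```

In a console session the same helper could be called with the `reactive_cache` hash loaded above, provided its values serialize to JSON.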
What can we monitor to detect problems with this?
- The best way is to monitor ReactiveCaching::ExceededReactiveCacheLimit on Sentry, although I believe that for on-premise we cannot get this quick feedback.
- The number of Sidekiq workers in the reactive_caching queue might increase, since the FE will keep trying to load data that is never loaded in cases where the limit is reached.
Beta groups/projects
If applicable, any groups/projects that are happy to have this feature turned on early. Some organizations may wish to test big changes they are interested in with a small subset of users ahead of time for example.
- gitlab-org/gitlab project
Roll Out Steps
- Enable on staging
- Test on staging
- Ensure that documentation has been updated
- Enable on GitLab.com for individual groups/projects listed above and verify behaviour
- Coordinate a time to enable the flag with #production and #g_delivery on Slack.
- Announce on the issue an estimated time this will be enabled on GitLab.com
- Enable on GitLab.com by running chatops command in #production
- Cross post chatops Slack command to #support_gitlab-com and in your team channel. Mention that we're monitoring this in Sentry
- Post in #support_self-managed when this is close to the monthly release
- Announce on the issue that the flag has been enabled
- Remove feature flag and add changelog entry
- After the flag removal is deployed, clean up the feature flag by running chatops command in #production channel