Measurement of namespace storage statistics
This is a follow-up to https://gitlab.com/gitlab-org/gitlab-ce/issues/62214
After https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28996 gets deployed to GitLab.com, we need to measure the performance in different environments: staging and production.
Prework
Some prework needs to be done and deployed before we start measuring:
- Fix Namespace::AggregationSchedule should keep the lease until timeout - https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/30305
- Include edge case to refresh root statistics on other scenarios - https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/30329
On staging
- Once it is on staging, enable the feature flag globally and observe:
/chatops run feature set update_statistics_namespace true --staging
- If there aren't any degradation problems, tune down the time to 1.5h (half of the default):
  - We'll need access to the Rails console on staging, or someone who can run the commands for us:
# To set the timeout to 1.5 hours
Gitlab::Redis::SharedState.with do |redis|
  redis.set(Namespace::AggregationSchedule::REDIS_SHARED_KEY, 1.5.hours)
end
# To check the delay timeout
Namespace::AggregationSchedule.delay_timeout
# => 5400
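The override can be modeled without a Rails console. This is a minimal plain-Ruby sketch, assuming that delay_timeout returns the stored value when the key is set and falls back to the 3h default otherwise. FakeStore, SHARED_KEY, and DEFAULT_TIMEOUT_SECONDS are hypothetical names used only for illustration; they are not taken from the GitLab codebase.

```ruby
# Plain-Ruby model of the timeout override; no Rails or Redis required.
DEFAULT_TIMEOUT_SECONDS = 3 * 60 * 60 # the 3h default mentioned in this issue
SHARED_KEY = 'delay_timeout_override' # hypothetical key name

# In-memory stand-in for the shared Redis state (hypothetical helper).
class FakeStore
  def initialize
    @data = {}
  end

  # Redis stores values as strings, so mimic that here.
  def set(key, value)
    @data[key] = value.to_s
  end

  def get(key)
    @data[key]
  end

  def del(key)
    @data.delete(key)
  end
end

# Assumed behaviour: use the stored override if present, else the default.
def delay_timeout(store)
  override = store.get(SHARED_KEY)
  override ? override.to_i : DEFAULT_TIMEOUT_SECONDS
end

store = FakeStore.new
default_value = delay_timeout(store)        # no override set: 3h default

store.set(SHARED_KEY, (1.5 * 60 * 60).to_i) # 1.5h expressed in seconds
tuned_value = delay_timeout(store)

store.del(SHARED_KEY)                       # deleting the key restores the default
restored_value = delay_timeout(store)
```

Keeping the override in shared state rather than in code is what lets the window be tuned and reverted on a running environment without a deploy.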
On production
If there aren't any major problems on staging:
- We enable the feature flag for gitlab-org for 1h and observe:
/chatops run feature set --group=gitlab-org update_statistics_namespace true
- If there aren't any degradation problems, we enable the feature flag for gitlab-org for 1 day and observe.
- If there aren't any degradation problems, we enable the feature flag for gitlab-org for 3 days and observe.
- If there aren't any degradation problems, we enable the feature flag globally for 1 day and observe.
- If there aren't any degradation problems, tune down the time to 1.5h (half of the default) and observe:
# To set the timeout to 1.5 hours
Gitlab::Redis::SharedState.with do |redis|
  redis.set(Namespace::AggregationSchedule::REDIS_SHARED_KEY, 1.5.hours)
end
# To check the delay timeout
Namespace::AggregationSchedule.delay_timeout
# => 5400
- If 1.5h works, we will probably hardcode this value in the codebase - https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/31341
  - If not, hardcode the default value (3h).
  - This has to be done in Namespace::AggregationSchedule.delay_timeout.
- If there aren't any major problems on production, remove the feature flag.
  - Hopefully we can include this change under %12.2.
  - According to the documentation, we have to remove the feature flag at least two working days before.
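To estimate how long the production rollout takes before the timeout is tuned, the observation windows can be added up. This is a rough sketch, assuming the windows run back to back with no gaps; ROLLOUT_STEPS is a hypothetical name, and the observation period for the 1.5h tuning step is left out because this issue does not give one.

```ruby
# Each entry: [scope, observation window in hours], taken from the steps in this issue.
ROLLOUT_STEPS = [
  ['gitlab-org', 1],      # 1h
  ['gitlab-org', 24],     # 1 day
  ['gitlab-org', 3 * 24], # 3 days
  ['global', 24]          # 1 day
].freeze

# Sum the windows, then split the total into whole days and leftover hours.
total_hours = ROLLOUT_STEPS.sum { |_scope, hours| hours }
days, hours = total_hours.divmod(24)
summary = "#{days}d #{hours}h"
```

Under these assumptions the flag needs a little over five days of observation on production before the lease window is changed.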
Other commands
To disable the feature flag at any moment:
# On production for gitlab-org
/chatops run feature set --group=gitlab-org update_statistics_namespace false
# Globally on production
/chatops run feature set update_statistics_namespace false
# Globally on staging
/chatops run feature set update_statistics_namespace false --staging
To restore 3h as the default time:
Gitlab::Redis::SharedState.with do |redis|
  redis.del(Namespace::AggregationSchedule::REDIS_SHARED_KEY)
end
Conclusions
- Performance on staging and production has been shown to be stable
- The introduction of these two models allows us to gather namespace statistics easily
- 1.5h as a lease window is enough for now. We can reduce the window in an upcoming iteration.