Investigation - Usage Ping returns -1 for some Redis HLL counters
Opening an issue to add information about the Usage Ping bug.
Problem description
Incorrect weekly keys for Redis HLL counters
Technical insight
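def weekly_redis_keys(events:, start_date:, end_date:, context: '')
  # cweek restarts at 1 each ISO year, so this is negative when the range crosses a year boundary
  weeks = end_date.to_date.cweek - start_date.to_date.cweek
  weeks = 1 if weeks == 0

  # One key per event per week; an empty range yields no keys at all
  (0..(weeks - 1)).map do |week_increment|
    events.map { |event| redis_key(event, start_date + week_increment * 7.days, context) }
  end.flatten
end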
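- This code leads to the weekly_redis_keys method returning an empty array ([]) when start_date and end_date fall in different ISO years: end_date.cweek is then smaller than start_date.cweek, so weeks is negative and the range 0..(weeks - 1) is empty.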
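To make the year-boundary arithmetic concrete, here is a minimal sketch that reproduces only the week count from weekly_redis_keys with plain Date (no ActiveSupport, so 7.days is not used). The year_safe_weeks and week_start helpers are hypothetical, not GitLab code: they measure the distance between the Mondays that start each ISO week, which matches the cweek difference within one year but does not wrap in January.

```ruby
require 'date'

# Week count as computed in weekly_redis_keys, using plain Date objects.
def cweek_based_weeks(start_date, end_date)
  weeks = end_date.cweek - start_date.cweek
  weeks = 1 if weeks == 0
  weeks
end

# Hypothetical helper: the Monday that starts the ISO week containing date.
def week_start(date)
  date - (date.cwday - 1)
end

# Hypothetical year-safe variant: whole weeks between the two week starts.
def year_safe_weeks(start_date, end_date)
  weeks = (week_start(end_date) - week_start(start_date)).to_i / 7
  weeks = 1 if weeks == 0
  weeks
end

start_date = Date.new(2020, 12, 28) # Monday, ISO week 53 of 2020
end_date   = Date.new(2021, 1, 25)  # Monday, ISO week 4 of 2021

weeks = cweek_based_weeks(start_date, end_date)
p weeks                              # -49
p((0..(weeks - 1)).to_a)             # [] -> no Redis keys are built
p year_safe_weeks(start_date, end_date) # 4
```

Within a single year both functions agree (for example, 2021-03-03 to 2021-03-31 gives 4 either way); they only diverge when the range spans an ISO year boundary.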
When we fetch the data, we wrap the Redis call in the hardening method, so with no keys the call becomes:
# https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data_counters/hll_redis_counter.rb#L237
redis_usage_data { Gitlab::Redis::HLL.count(keys: []) }
# The hardening method catches two types of errors: https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/utils/usage_data.rb#L155
def redis_usage_counter
  yield
rescue ::Redis::CommandError, Gitlab::UsageDataCounters::BaseCounter::UnknownEvent
  FALLBACK
end
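The following sketch isolates the rescue behaviour of redis_usage_counter so it runs without the redis gem or GitLab. The three error classes are hypothetical stand-ins for ::Redis::CommandError, BaseCounter::UnknownEvent and RedisClusterValidator::CrossSlotError, and FALLBACK is set to -1 as reported in this issue. Only the listed errors become FALLBACK; any other error escapes the method.

```ruby
# Hypothetical stand-ins for the real error classes.
RedisCommandError = Class.new(StandardError) # plays ::Redis::CommandError
UnknownEvent      = Class.new(StandardError) # plays BaseCounter::UnknownEvent
CrossSlotError    = Class.new(StandardError) # plays RedisClusterValidator::CrossSlotError

FALLBACK = -1

# Same shape as the hardening method: only the listed errors are swallowed.
def redis_usage_counter
  yield
rescue RedisCommandError, UnknownEvent
  FALLBACK
end

# Production-like path: Redis rejects PFCOUNT without keys, result is -1.
p redis_usage_counter { raise RedisCommandError, "ERR wrong number of arguments for 'pfcount' command" }

# Development/test-like path: the validator error is not rescued and propagates.
begin
  redis_usage_counter { raise CrossSlotError, 'Redis command PFCOUNT arguments hash to different slots' }
rescue CrossSlotError => e
  puts "propagated: #{e.class}"
end
```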
Checking this logic in the test, development, and production environments, we noticed that it behaves differently in each one.
The same call raises a different error depending on the environment. For example:
Gitlab::Redis::HLL.count(keys: [])
# In development we would see
Gitlab::Instrumentation::RedisClusterValidator::CrossSlotError: Redis command PFCOUNT arguments hash to different slots. See https://docs.gitlab.com/ee/development/redis.html#multi-key-commands
# While in production we would see
Redis::CommandError (ERR wrong number of arguments for 'pfcount' command)
Summary
My first assumption was that Usage Ping would fail in every environment.
In fact, Usage Ping fails in test and development because we do not rescue Gitlab::Instrumentation::RedisClusterValidator::CrossSlotError, while it does not fail in other environments because we rescue Redis::CommandError and return -1.
Questions
- Why do we have different behaviour in development and test?
- Could we improve anything in this area to help us have the same behaviour everywhere?
Monitor weekly Usage Ping generation
I have a GitLab installation on version 13.6.0 where I plan to run tests for Usage Ping.
Data affected
The affected weeks would be weeks 1, 2, 3, and 4 from the beginning of the year.
Redis HLL monthly counters will return -1 during those weeks.
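As a rough cross-check of which weeks are affected, this sketch enumerates end dates in early 2021 and keeps those whose window yields no weekly keys. It assumes a monthly metric covers the 28 days before end_date (start_date = end_date - 28); that window is an assumption for illustration, and weekly_key_count is a hypothetical helper that repeats the week count from weekly_redis_keys.

```ruby
require 'date'

# Hypothetical helper: number of weeks weekly_redis_keys would iterate over.
def weekly_key_count(start_date, end_date)
  weeks = end_date.cweek - start_date.cweek
  weeks = 1 if weeks == 0
  (0..(weeks - 1)).count
end

# Assumption: a monthly window is the 28 days before end_date.
affected = (Date.new(2021, 1, 1)..Date.new(2021, 2, 28)).select do |end_date|
  weekly_key_count(end_date - 28, end_date).zero?
end

puts affected.first          # 2021-01-04
puts affected.last           # 2021-01-31
p affected.map(&:cweek).uniq # [1, 2, 3, 4]
```

Under this assumption every end date in ISO weeks 1 to 4 of 2021 produces no keys, which is consistent with the weeks listed above.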
(Looking to get a full list of metric names)