Harden Redis HLL metric
HLL from Redis is now available: !35580 (merged)
HLL class https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/redis/hll.rb
Redis HLL PFCOUNT docs https://redis.io/commands/pfcount
Gitlab::Redis::HLL.count(keys: keys)
Redis HLL PFADD docs https://redis.io/commands/pfadd
Gitlab::Redis::HLL.add(key: target_key, value: author_id, expiry: KEY_EXPIRY_LENGTH)
Redis HLL PFMERGE not implemented
pfmerge
Merge multiple HyperLogLog values into a unique value that will approximate the cardinality of the union of the observed sets of the source HyperLogLog structures.
PFMERGE vs PFCOUNT with multiple keys
- PFMERGE(destination_key, *source_keys): merges multiple keys into a single HyperLogLog that approximates the cardinality of the union of the observed sets of the source HyperLogLog structures, and stores it in the given destination_key.
- PFCOUNT(*keys): when called with multiple keys, returns the approximated cardinality of the union of the HyperLogLogs passed, by internally merging the HyperLogLogs stored at the provided keys into a temporary HyperLogLog.
In the end it is a matter of storing the result temporarily or in a given key. Considering this, I think we do not need a PFMERGE method at the moment.
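To make the comparison concrete, here is a minimal sketch. It assumes a client that exposes pfadd, pfcount and pfmerge in the way the redis-rb gem does. InMemoryHLLClient is a hypothetical stand-in backed by exact Sets (not a real HyperLogLog), used only so the example runs without a Redis server, and the key names are made up for illustration.

```ruby
require 'set'

# Hypothetical stand-in for a Redis client. It mimics pfadd / pfcount / pfmerge
# with exact Sets, so counts here are exact rather than approximated.
class InMemoryHLLClient
  def initialize
    @store = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def pfadd(key, *values)
    @store[key].merge(values.flatten)
    true
  end

  # With several keys, the union is built in a temporary structure and discarded.
  def pfcount(*keys)
    union_of(keys).size
  end

  # The union is stored under destination_key, so it can be counted again later.
  # As in Redis, an existing destination is treated as one of the sources.
  def pfmerge(destination_key, *source_keys)
    @store[destination_key] = union_of([destination_key, *source_keys])
    'OK'
  end

  def exists?(key)
    @store.key?(key)
  end

  private

  def union_of(keys)
    keys.flatten.reduce(Set.new) { |union, key| union | @store.fetch(key, Set.new) }
  end
end

client = InMemoryHLLClient.new
week_keys = ['2020-32-{project_action}', '2020-33-{project_action}'] # illustrative keys
client.pfadd(week_keys[0], [1, 2, 3])
client.pfadd(week_keys[1], [3, 4])

# Option 1: count the union directly, nothing is stored.
count_only = client.pfcount(*week_keys) # => 4

# Option 2: store the union first, then count the stored key.
client.pfmerge('merged-{project_action}', *week_keys)
merged_count = client.pfcount('merged-{project_action}') # => 4
```

Both options return the same count. The only difference is that the second one leaves an extra key behind, which is why a merge method looks optional for now.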
Opening this issue to discuss the next steps.
Limitations:
Data is stored in Redis for a limited amount of time
Current implementations
Next steps
- Add an hll_redis_counter hardening method to catch any possible Redis exceptions and make implementing metrics in usage ping easier for developers (at the moment we can use the current redis_usage_data counters)
- Add a check for the key format: raise an error if a key does not have the format 2020-215-{project_action}. The {project_action} part ensures the key stays in the same hash slot. !38782 (merged)
- Add docs for how to add new Redis HLL counters #235457 (closed)
- Add API call to allow UI events tracking and storing them to Redis #235459 (closed)
- Add usage data tracking using Redis HLL for UI events for non-authenticated users #246823 (closed)
- Add an HLL merge method if we consider storing the value of the merged keys. This is optional, as the same result is obtained when using count with multiple keys.
- Consider adding tests for existing Redis HLL methods #235476 (closed)
- Add the base HLL Redis class to encapsulate a minimal interface for developers to use when adding new trackers and counters #235697 (closed)
- Implement a feature flag per event https://docs.gitlab.com/ee/development/feature_flags/development.html#feature-actors #238154 (closed). When using a feature flag per user, we have to keep in mind that we can track only users that are signed in. https://docs.gitlab.com/ee/operations/feature_flags.html#percent-of-users #238717 (closed)
- Implement helper method for controllers #235954 (closed)
- Add docs for the helper method in controller #241122 (closed)
- Analyse impact of added Redis keys
  - on gitlab.com: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/11112
  - Would it make sense to have this measured in self-managed and get the information back via usage ping, if this is possible?
- Expire metrics after 42 days #239449 (closed)
- Fix expire time for non-default values #240901 (closed)
- Hash values we count on https://gitlab.slack.com/archives/CL3A7GFPF/p1598458011048200 (not needed, Redis HLL does this already)
- Automatically add all known_events to the generation of the usage_data key redis_hll_counters #241686 (closed)
- Add redis_hll_counters to usage_data table in version app https://gitlab.com/gitlab-services/version-gitlab-com/-/merge_requests/551
- Remove individual events per category from usage data generation #244357 (closed)
- Remove individual events per category column from usage data table in version app https://gitlab.com/gitlab-services/version-gitlab-com/-/issues/384
- Add a JavaScript example of how we could track UI events using the UsageData#increment_unique_users API #244355 (closed)
- Allow tracking if usage ping is enabled #245247 (closed)
- Add a monthly time frame to redis_hll_counters. With this, every event has 2 metrics in usage data, one weekly and one monthly #247098 (closed)
- Add CSRF token check to UsageData API to cover security concerns #247454 (closed)
- Add naming pattern for Redis HLL metrics to Telemetry guide #247472 (closed)
- Change track_usage_event method to accept any value #247862 (closed)
- Remove individual check for usage ping enabled #246494 (closed)

cc @jeromezng @a_akgun @ali-gitlab
Please add any concerns and ideas, thank you!
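
As a starting point for the discussion, the hardening method and the key format check from the next steps could be sketched roughly as below. Everything here is an assumption for illustration and not the GitLab implementation: the HLLSketch module, the KEY_FORMAT pattern (guessed from the single example 2020-215-{project_action}), the FALLBACK value of -1, and FakeRedisError, which stands in for whatever error class the Redis client raises.

```ruby
# Hypothetical sketch; all names are illustrative.
module HLLSketch
  # Assumed key shape: four digits, a dash, one to three digits, a dash,
  # and a lowercase name wrapped in braces, e.g. 2020-215-{project_action}.
  KEY_FORMAT = /\A\d{4}-\d{1,3}-\{[a-z_]+\}\z/.freeze

  # Assumed sentinel returned when Redis cannot be reached.
  FALLBACK = -1

  KeyFormatError = Class.new(StandardError)

  # Stand-in for the error class raised by a real Redis client.
  FakeRedisError = Class.new(StandardError)

  # Raises KeyFormatError if any key does not match KEY_FORMAT,
  # otherwise returns the keys unchanged.
  def self.validate_keys!(keys)
    invalid = Array(keys).reject { |key| KEY_FORMAT.match?(key.to_s) }
    raise KeyFormatError, "Invalid key format: #{invalid.join(', ')}" unless invalid.empty?

    keys
  end

  # Runs the block and returns the fallback instead of propagating Redis errors.
  # Key format errors are not rescued, so developer mistakes still surface.
  def self.hll_redis_counter(fallback: FALLBACK)
    yield
  rescue FakeRedisError
    fallback
  end
end

# A successful count passes through unchanged.
ok = HLLSketch.hll_redis_counter { 42 }

# A Redis failure is turned into the fallback value.
failed = HLLSketch.hll_redis_counter do
  raise HLLSketch::FakeRedisError, 'connection lost'
end
```

Keeping the format check outside the rescue is a deliberate choice in this sketch: a malformed key is a bug in the calling code and should fail loudly, while a Redis outage should not break usage ping generation.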