Implement HLL algorithm for distinct count of database metrics
This issue was created to capture all discussion and idea validation for the following areas:
- https://gitlab.com/gitlab-org/telemetry/-/issues/421 requested a feature that would allow calculating unions over different metrics. Because some metrics are tracked only via the Redis HLL counter, calculating unions with database-calculated metrics requires first creating an HLL representation of the database metrics and then calculating their union with the Redis-only metrics. Additionally, calculating unions at the database level puts extra load on the database, because intermediate results calculated for a single xMAU can't be reused when calculating union values over multiple xMAUs. With Redis HLL, by contrast, an HLL built once for a given xMAU can be reused many times without redoing the same operations.
- At #230438 (comment 421212723), the problem of lacking control over batch size when using BatchCount to distinct count attributes that are not unique was uncovered. The scale of the discrepancies was fully explained in that thread; the most extreme case found so far is moved here for context: