Skip to content

Update formula for PG-HyperLogLog small cardinality estimates

Catalin Irimie requested to merge cat-update-pg-hll-formula into master

What does this MR do and why?

Based on the original HLL paper and referenced linear probabilistic counting paper, this should be m*log(m/V) instead of using the alpha(m) constant for small cardinality, as linear counting is used instead.

More detailed analysis in #348139 (comment 768958838)

This should also allow us to un-quarantine the test at #348139 (closed) (and hopefully make it less flaky at the current precision level?).

Screenshots or screen recordings

How to set up and validate locally

Helper scripts at https://gitlab.com/-/snippets/2220977 if needed.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Catalin Irimie

Merge request reports