Use fixed dataset in estimate_batch_distinct_count test suite to avoid flaky test results (!51207) · Merge requests · GitLab.org / GitLab

Mikołaj Wawrzyniak requested to merge mwaw/296169-usage-data-hll-count-for-estimate_batch_distinct_count-outside-expected-error-range into master Jan 08, 2021

What does this MR do?

It aims to fix the flaky specs for the estimate_batch_distinct_count method in https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/lib/gitlab/utils/usage_data_spec.rb#L61, as reported in #296169 (closed).

The proposed solution avoids occasional failures caused by outlier cases by running the tests against a fixed set of values. This ensures that the algorithm implementation is checked consistently while other parts of the codebase change.

However, if the algorithm implementation or its parameters change (e.g. the number of buckets used is modified), these unit tests can no longer reliably assess the accuracy of the new implementation. In that case, use the supplementary rake tasks from !51118 (closed) to re-evaluate accuracy and adjust the unit test values accordingly.
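Why a fixed dataset removes the flakiness can be shown with a minimal Python sketch. It is a language-neutral illustration under stated assumptions, not GitLab's Ruby implementation: `hll_estimate`, `precision`, `FIXED_DATASET` and the 4-sigma tolerance are hypothetical names and choices. The estimator is a simplified HyperLogLog with m = 2**precision buckets, whose relative standard error is roughly 1.04/sqrt(m). Because both the input and the hash (SHA-256, rather than Python's per-process salted `hash()` for strings) are fixed, the estimate is identical on every run, so the assertion either always passes or always fails.

```python
import hashlib
import math


def _hash64(value):
    """Deterministic 64-bit hash of a value (same result in every process)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hll_estimate(values, precision=9):
    """Hypothetical, simplified HyperLogLog estimate using m = 2**precision buckets."""
    m = 1 << precision
    registers = [0] * m
    remaining_bits = 64 - precision
    for value in values:
        h = _hash64(value)
        bucket = h & (m - 1)                          # low bits choose the bucket
        w = h >> precision                            # remaining bits give the rank
        rank = remaining_bits - w.bit_length() + 1    # leading zeros + 1
        registers[bucket] = max(registers[bucket], rank)

    alpha = 0.7213 / (1 + 1.079 / m)                  # bias constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:                  # small-range (linear counting) correction
        return m * math.log(m / zeros)
    return raw


# Hypothetical fixed dataset: the same 10,000 distinct ids on every run.
FIXED_DATASET = list(range(1, 10_001))
EXPECTED_DISTINCT = len(set(FIXED_DATASET))


def test_estimate_on_fixed_dataset(precision=9, sigmas=4):
    estimate = hll_estimate(FIXED_DATASET, precision)
    # Fixed input + deterministic hash => identical estimate on every run.
    assert estimate == hll_estimate(FIXED_DATASET, precision)
    # Assumed tolerance: `sigmas` standard errors of the HLL estimate.
    tolerance = sigmas * 1.04 / math.sqrt(1 << precision)
    relative_error = abs(estimate - EXPECTED_DISTINCT) / EXPECTED_DISTINCT
    assert relative_error <= tolerance, relative_error
    return estimate
```

With `precision=9` (512 buckets) the tolerance works out to about 18%. Changing `precision` changes both the hash-to-bucket mapping and the expected error, so any pinned bounds would have to be recomputed, which mirrors the caveat above about adjusting unit test values after algorithm or parameter changes.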

Does this MR meet the acceptance criteria?

Conformity

Related to #296169 (closed)

Edited Jan 08, 2021 by Mikołaj Wawrzyniak
