Approximate table counts based on TABLESAMPLE (!22650) · Merge requests · GitLab.org / GitLab FOSS

Andreas Brandl requested to merge ab-approximate-counts into master Oct 28, 2018

What does this MR do?

Refactor the existing approximate count strategies towards a Strategy pattern
Implement an approximate count strategy for PostgreSQL using TABLESAMPLE (behind a feature flag)

The goal here is to fix the admin dashboard (which is currently the only user of approximate counts AFAIS).

For PostgreSQL, the order of strategies is:

Reltuples
Tablesample
Exact count
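
The fallback order above can be sketched as a chain that hands each strategy only the tables that are still uncounted. This is a minimal sketch, not the code in this MR: `FakeStrategy` and `approximate_counts` are hypothetical names, and the hard-coded counts stand in for real database queries.

```ruby
# Hypothetical stand-in for a count strategy. A real strategy would query
# the database; this one answers from a fixed Hash of table name => count.
class FakeStrategy
  attr_reader :name

  def initialize(name, counts)
    @name = name
    @counts = counts
  end

  # Return counts only for the requested tables this strategy can answer.
  def count(tables)
    @counts.select { |table, _| tables.include?(table) }
  end
end

# Try each strategy in order, passing along only the tables that no
# earlier strategy could count. Stop early once every table has a count.
def approximate_counts(tables, strategies)
  strategies.each_with_object({}) do |strategy, results|
    remaining = tables - results.keys
    break results if remaining.empty?

    results.merge!(strategy.count(remaining))
  end
end
```

With strategies ordered as reltuples, tablesample, exact count, a table is only passed to a later (more expensive) strategy when every earlier one failed to produce a count for it.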

The Tablesample strategy works like this:

Use the reltuples statistic to obtain an estimate E (irrespective of how stale the statistic is)
If E < EXACT_COUNT_THRESHOLD, perform an exact count
Else, count a sample of roughly TABLESAMPLE_ROW_TARGET rows and extrapolate to estimate the full relation size.

The idea is that in both cases, we control how many rows are being used to estimate the relation size. In the tablesample case, we use E to calculate the portion of the table to count (TABLESAMPLE is based on a percentage of the table).
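
To make the arithmetic concrete, here is a minimal sketch of how E could be turned into a sampling percentage and how the sampled count could be scaled back up. The constant values, the helper names `count_query` and `extrapolate`, and the choice of the `SYSTEM` sampling method are assumptions for illustration; the description does not state them.

```ruby
# Illustrative values only: the description names these constants but
# does not give their values.
EXACT_COUNT_THRESHOLD = 10_000
TABLESAMPLE_ROW_TARGET = 10_000

# Build the counting query for a table from the reltuples estimate E.
# Returns the SQL plus the portion of the table the query will count.
# Note: a real implementation should quote the table identifier.
def count_query(table, estimate)
  if estimate < EXACT_COUNT_THRESHOLD
    # Small table (according to the statistic): count every row.
    { sql: "SELECT COUNT(*) FROM #{table}", portion: 1.0 }
  else
    # Pick the portion so that roughly TABLESAMPLE_ROW_TARGET rows are sampled.
    portion = TABLESAMPLE_ROW_TARGET.to_f / estimate
    percent = (portion * 100).round(4)
    { sql: "SELECT COUNT(*) FROM #{table} TABLESAMPLE SYSTEM (#{percent})", portion: portion }
  end
end

# Scale the sampled count back up to an estimate of the full relation size.
def extrapolate(sampled_count, portion)
  (sampled_count / portion).round
end
```

For example, with E = 1,000,000 the sketch samples 1% of the table; if that sample contains 10,250 rows, the extrapolated size is 1,025,000.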

The unsolved issue is that if the reltuples statistic is far from reality, we may end up counting a large portion of the table. For example, if reltuples claims the table has only one record, we perform an exact count; if the table actually has millions of records, we're out of luck. In that sense, this change is an optimization of the current situation, where we are out of luck most of the time (due to statistics being too old).
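
The effect of a stale statistic can be illustrated with a small calculation of how many rows end up being counted for a given estimate and actual table size. This is a sketch under assumed values: `expected_rows_counted` is a hypothetical helper, and the default threshold and row target are placeholders rather than the values used in this MR.

```ruby
# Expected number of rows counted when the statistic says `estimate`
# but the table really holds `actual` rows. Defaults are illustrative.
def expected_rows_counted(estimate:, actual:, threshold: 10_000, row_target: 10_000)
  # Below the threshold an exact count is performed, so every row is counted.
  return actual if estimate < threshold

  # Otherwise the sampled portion is derived from the (possibly wrong) estimate.
  portion = [row_target.to_f / estimate, 1.0].min
  (actual * portion).round
end

# Accurate statistic: about row_target rows are counted.
expected_rows_counted(estimate: 1_000_000, actual: 1_000_000)

# Statistic is 10x too low: about 10x the row target is counted.
expected_rows_counted(estimate: 100_000, actual: 1_000_000)

# Statistic claims one row: the whole table is counted.
expected_rows_counted(estimate: 1, actual: 5_000_000)
```

The work grows in proportion to how far the estimate undershoots the real size, and the worst case is an estimate below the threshold, which triggers a full count.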

Note that the tablesample strategy can be toggled with the tablesample_counts feature flag.

What are the relevant issue numbers?

https://gitlab.com/gitlab-org/gitlab-ce/issues/54116

Does this MR meet the acceptance criteria?

Changelog entry added, if necessary
Documentation created/updated
Tests added for this feature/bug
Conforms to the code review guidelines
Conforms to the merge request performance guidelines
Conforms to the style guides
Conforms to the database guides
Link to e2e tests MR added if this MR has Requires e2e tests label. See the Test Planning Process.

Edited Nov 27, 2018 by Andreas Brandl
