Implement utility functions for loose index scan
What does this MR do and why?
This MR implements a solution to address the distinct counting (and iteration) problems on unbounded scopes (no time range filter). The method (if used correctly) will a read maximum DISTINCT_ITEMS_IN_COLUMN
number of rows.
Example 1: distinct authors in the events
table. How many users generated events? (usage ping)
count = 0
Event.include(EachBatch)
Event.distinct_each_batch(column: 'author_id', of: 10000) { |r| count += r.count(:author_id) }
The snippet above took 23 minutes on db-lab.
Example 2: Pulling the first 100 authors
Issue.include(LooseIndexScan)
Issue.loose_index_scan(column: :author_id).limit(100)
Example 3: Distinct project_id
in Ci::Pipeline
- Plan with
DISTINCT
: https://explain.depesz.com/s/YiaJ - Plan with loose index scan: https://explain.depesz.com/s/fgku
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Edited by Adam Hegyi