Skip to content

Implement utility functions for loose index scan

Adam Hegyi requested to merge distinct_each_batch into master

What does this MR do and why?

This MR implements a solution to address the distinct counting (and iteration) problems on unbounded scopes (no time range filter). The method (if used correctly) will a read maximum DISTINCT_ITEMS_IN_COLUMN number of rows.

Example 1: distinct authors in the events table. How many users generated events? (usage ping)

count = 0
Event.include(EachBatch)
Event.distinct_each_batch(column: 'author_id', of: 10000) { |r| count += r.count(:author_id) }

The snippet above took 23 minutes on db-lab.

Example 2: Pulling the first 100 authors

Issue.include(LooseIndexScan)
Issue.loose_index_scan(column: :author_id).limit(100)

Example 3: Distinct project_id in Ci::Pipeline

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Adam Hegyi

Merge request reports