Skip to content

Optimize event counters in usage data

Alper Akgun requested to merge 208889-optimize-event-counters into master

What does this MR do?

  1. Optimizes the event counter queries for batch counting in usage data
  2. Lowers the MIN_REQUIRED_BATCH_SIZE to 1_250, this is to be able to count large tables like events because even without indexes the distinct count loops should fall below 1 second

Issue #208889 (closed)

Query

SELECT COUNT(DISTINCT "events"."author_id") FROM "events" WHERE "events"."created_at" BETWEEN '2020-02-07 06:22:09.650617' AND '2020-03-06 06:22:09.650886' AND "events"."author_id" BETWEEN 3940000 AND 3941250

Optimization

CREATE INDEX index_events_on_author_id_and_created_at ON public.events USING btree (author_id, created_at);
# The query has been executed. Duration: 48.765 min (edited) 

Before: Bad! https://explain.depesz.com/s/Xz1D

After: Good! https://explain.depesz.com/s/1yFo

Timing

events is one of our most challenging tables with > 500million records.

Before the index, the query times out and is uncountable.

After the index, very pessimistic batch counting takes 44 minutes (with database-lab numbers) - expected duration is 20 minutes or less

  • 5.5 million users,
  • with 1250 batch sizes
  • 5.5M/1250K = 4400 loops
  • Each loop takes: 600 ms ( cold cache )

Migration output

$ VERBOSE=true bundle exec rake db:migrate:up VERSION=20200306160521
== 20200306160521 AddIndexOnAuthorIdAndCreatedAtToEvents: migrating ===========
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:events, [:author_id, :created_at], {:algorithm=>:concurrently})
   -> 0.0038s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- add_index(:events, [:author_id, :created_at], {:algorithm=>:concurrently})
   -> 0.0460s
-- execute("RESET ALL")
   -> 0.0004s
== 20200306160521 AddIndexOnAuthorIdAndCreatedAtToEvents: migrated (0.0509s) ==

$ VERBOSE=true bundle exec rake db:migrate:down VERSION=20200306160521
== 20200306160521 AddIndexOnAuthorIdAndCreatedAtToEvents: reverting ===========
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:events, [:author_id, :created_at], {:algorithm=>:concurrently})
   -> 0.0036s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- remove_index(:events, {:algorithm=>:concurrently, :column=>[:author_id, :created_at]})
   -> 0.0214s
-- execute("RESET ALL")
   -> 0.0012s
== 20200306160521 AddIndexOnAuthorIdAndCreatedAtToEvents: reverted (0.0269s) ==
Edited by Alper Akgun

Merge request reports