You need to sign in or sign up before continuing.
Optimize event counters in usage data
What does this MR do?
- Optimizes the
event
counter queries for batch counting in usage data - Lowers the
MIN_REQUIRED_BATCH_SIZE
to1_250
, this is to be able to count large tables likeevents
because even without indexes the distinct count loops should fall below1 second
Issue #208889 (closed)
Query
SELECT COUNT(DISTINCT "events"."author_id") FROM "events" WHERE "events"."created_at" BETWEEN '2020-02-07 06:22:09.650617' AND '2020-03-06 06:22:09.650886' AND "events"."author_id" BETWEEN 3940000 AND 3941250
Optimization
CREATE INDEX index_events_on_author_id_and_created_at ON public.events USING btree (author_id, created_at);
# The query has been executed. Duration: 48.765 min (edited)
https://explain.depesz.com/s/Xz1D
Before: Bad!https://explain.depesz.com/s/1yFo
After: Good!Timing
events
is one of our most challenging tables with > 500million records.
Before the index, the query times out and is uncountable.
After the index, very pessimistic batch counting takes 44 minutes (with database-lab numbers) - expected duration is 20 minutes or less
- 5.5 million users,
- with 1250 batch sizes
- 5.5M/1250K = 4400 loops
- Each loop takes: 600 ms ( cold cache )
Migration output
$ VERBOSE=true bundle exec rake db:migrate:up VERSION=20200306160521
== 20200306160521 AddIndexOnAuthorIdAndCreatedAtToEvents: migrating ===========
-- transaction_open?()
-> 0.0000s
-- index_exists?(:events, [:author_id, :created_at], {:algorithm=>:concurrently})
-> 0.0038s
-- execute("SET statement_timeout TO 0")
-> 0.0004s
-- add_index(:events, [:author_id, :created_at], {:algorithm=>:concurrently})
-> 0.0460s
-- execute("RESET ALL")
-> 0.0004s
== 20200306160521 AddIndexOnAuthorIdAndCreatedAtToEvents: migrated (0.0509s) ==
$ VERBOSE=true bundle exec rake db:migrate:down VERSION=20200306160521
== 20200306160521 AddIndexOnAuthorIdAndCreatedAtToEvents: reverting ===========
-- transaction_open?()
-> 0.0000s
-- index_exists?(:events, [:author_id, :created_at], {:algorithm=>:concurrently})
-> 0.0036s
-- execute("SET statement_timeout TO 0")
-> 0.0004s
-- remove_index(:events, {:algorithm=>:concurrently, :column=>[:author_id, :created_at]})
-> 0.0214s
-- execute("RESET ALL")
-> 0.0012s
== 20200306160521 AddIndexOnAuthorIdAndCreatedAtToEvents: reverted (0.0269s) ==
Edited by Alper Akgun