Optimize event counters in usage data
Fix the queries for both monthly, and all-times version of the queries with batch counters enabled.
Console work to fetch the queries, # reset to master so no need for # Feature.enable(:usage_ping_batch_counter)
# 1st Query
time_period = {}
Gitlab::UsageData.distinct_count(::Event.where(time_period), :author_id)
SELECT MIN("events"."author_id") FROM "events"
explain SELECT MAX("events"."author_id") FROM "events"
Good! https://explain.depesz.com/s/5ILq
# 2000 is the minimal batch
explain SELECT COUNT(DISTINCT "events"."author_id") FROM "events" WHERE "events"."author_id" BETWEEN 910000 AND 912000
Bad! https://explain.depesz.com/s/nF9c
# 2nd query
time_period = { created_at: 28.days.ago..Time.current }
Gitlab::UsageData.distinct_count(::Event.where(time_period), :author_id)
SELECT MIN("events"."author_id") FROM "events" WHERE "events"."created_at" BETWEEN '2020-02-07 06:22:09.650617' AND '2020-03-06 06:22:09.650886'
explain SELECT MAX("events"."author_id") FROM "events" WHERE "events"."created_at" BETWEEN '2020-02-07 06:22:09.650617' AND '2020-03-06 06:22:09.650886'
Good! https://explain.depesz.com/s/WVfI
explain SELECT COUNT(DISTINCT "events"."author_id") FROM "events" WHERE "events"."created_at" BETWEEN '2020-02-07 06:22:09.650617' AND '2020-03-06 06:22:09.650886' AND "events"."author_id" BETWEEN 610000 AND 612000
Very Bad! https://explain.depesz.com/s/O1ah
Guide to optimize the counter
- Follow database query optimization guides https://docs.gitlab.com/ee/development/database_review.html#preparation-when-adding-or-modifying-queries
- Use your rails console to find out the 2 types of queries with
Feature.enable(:usage_ping_batch_counter)
-
- Optimize for the range query like
id between
1 AND 10000and
user_id between1 AND 100_000
- Optimize for the range query like
-
- Check also for the min and max query
- Optional: Optimize if possible also related counters which are for the same table or create indexes that cover them too
Edited by Alper Akgun