Skip to content

Optimize event counters in usage data

Fix the queries for both monthly, and all-times version of the queries with batch counters enabled.

Console work to fetch the queries, # reset to master so no need for # Feature.enable(:usage_ping_batch_counter)


# 1st Query
time_period = {}
Gitlab::UsageData.distinct_count(::Event.where(time_period), :author_id)

SELECT MIN("events"."author_id") FROM "events"
explain SELECT MAX("events"."author_id") FROM "events"
Good! https://explain.depesz.com/s/5ILq


# 2000 is the minimal batch 
explain SELECT COUNT(DISTINCT "events"."author_id") FROM "events" WHERE "events"."author_id" BETWEEN 910000 AND 912000
Bad! https://explain.depesz.com/s/nF9c


# 2nd query
time_period = { created_at: 28.days.ago..Time.current }
Gitlab::UsageData.distinct_count(::Event.where(time_period), :author_id)



SELECT MIN("events"."author_id") FROM "events" WHERE "events"."created_at" BETWEEN '2020-02-07 06:22:09.650617' AND '2020-03-06 06:22:09.650886'
explain SELECT MAX("events"."author_id") FROM "events" WHERE "events"."created_at" BETWEEN '2020-02-07 06:22:09.650617' AND '2020-03-06 06:22:09.650886'

Good! https://explain.depesz.com/s/WVfI

explain SELECT COUNT(DISTINCT "events"."author_id") FROM "events" WHERE "events"."created_at" BETWEEN '2020-02-07 06:22:09.650617' AND '2020-03-06 06:22:09.650886' AND "events"."author_id"  BETWEEN 610000 AND 612000
Very Bad!  https://explain.depesz.com/s/O1ah

Guide to optimize the counter

Edited by Alper Akgun