Skip to content

Optimize usage ping query reading from `product_analytics_events_experimental` table

Alishan Ladhani requested to merge ali/optimize-snowplow-usage-ping-query into master

What does this MR do?

Optimize usage ping query reading from product_analytics_events_experimental table. By iterating over collector_tstamp, we are better able to leverage the existing multicolumn index on (project_id, collector_tstamp).

Since we are iterating over a timestamp column, we do not control the number of records in a batch. The batch size is a time span in which to query. BatchCounter starts with a default batch size of 100_000 seconds, which equals ~28 hours. The smallest batch size is 1_250 seconds, which is ~21 minutes.

Before

Min (Time: 19.748 ms):

SELECT MIN("product_analytics_events_experimental"."id") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" BETWEEN '2020-10-31 22:07:31.668227' AND '2020-11-28 22:07:31.668301'

https://explain.depesz.com/s/ZgOW

Max (Time: 16.997 ms):

SELECT MAX("product_analytics_events_experimental"."id") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" BETWEEN '2020-10-31 22:07:31.668227' AND '2020-11-28 22:07:31.668301'

https://explain.depesz.com/s/fLvb

Batch count (Time: 7.467 ms):

SELECT COUNT("product_analytics_events_experimental"."id") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND (collector_tstamp BETWEEN '2020-07-25' AND '2020-08-22' ) AND "product_analytics_events_experimental"."id" BETWEEN 10 AND 100009

https://explain.depesz.com/s/PMGD

After

Min and max are determined from the range being queried.

Batch queries: (First query took 11.928 ms)

SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-10-31 21:23:30.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-02 01:10:10.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-02 01:10:10.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-03 04:56:50.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-03 04:56:50.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-04 08:43:30.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-04 08:43:30.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-05 12:30:10.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-05 12:30:10.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-06 16:16:50.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-06 16:16:50.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-07 20:03:30.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-07 20:03:30.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-08 23:50:10.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-08 23:50:10.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-10 03:36:50.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-10 03:36:50.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-11 07:23:30.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-11 07:23:30.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-12 11:10:10.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-12 11:10:10.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-13 14:56:50.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-13 14:56:50.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-14 18:43:30.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-14 18:43:30.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-15 22:30:10.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-15 22:30:10.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-17 02:16:50.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-17 02:16:50.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-18 06:03:30.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-18 06:03:30.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-19 09:50:10.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-19 09:50:10.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-20 13:36:50.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-20 13:36:50.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-21 17:23:30.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-21 17:23:30.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-22 21:10:10.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-22 21:10:10.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-24 00:56:50.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-24 00:56:50.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-25 04:43:30.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-25 04:43:30.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-26 08:30:10.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-26 08:30:10.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-27 12:16:50.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-27 12:16:50.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-28 16:03:30.109022'
SELECT COUNT("product_analytics_events_experimental"."collector_tstamp") FROM "product_analytics_events_experimental" WHERE "product_analytics_events_experimental"."project_id" = 22 AND "product_analytics_events_experimental"."se_category" = 'epics' AND "product_analytics_events_experimental"."se_action" = 'promote' AND "product_analytics_events_experimental"."collector_tstamp" >= '2020-11-28 16:03:30.109022' AND "product_analytics_events_experimental"."collector_tstamp" < '2020-11-28 21:23:31.109185'

https://explain.depesz.com/s/sy42

The improvement here is that the new query only filters on se_category and se_action, whereas the old one also filters on id.

Related to #241271 (closed), #262101 (closed)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Alishan Ladhani

Merge request reports