Skip to content

Aggregate package events in usage ping

Giorgenes Gelatti requested to merge 205578-aggregate-package-events into master

What does this MR do?

Aggregates package events data (Packages::Event model) into usage data, returning a dictionary by event type, package type and authentication method (user vs deploy token vs guest).

The output will look something like:

{
  "by_event_type": {
    "user": {
       "push_package": {
          "npm": 5
          "composer": 10
       }
    }
  }
}

SQL Queries and Plans

SELECT MIN("packages_events"."originator") 
FROM "packages_events" 
WHERE "packages_events"."created_at" BETWEEN '2020-09-13' AND '2020-10-11' 
AND "packages_events"."originator_type" != 2;
Aggregate  (cost=1115538.16..1115538.17 rows=1 width=8) (actual time=10480.099..10480.101 rows=1 loops=1)
   Buffers: shared hit=96 read=174434
   I/O Timings: read=3735.356
   ->  Seq Scan on public.packages_events  (cost=0.00..1093066.60 rows=8988623 width=8) (actual time=0.206..9032.969 rows=9240404 loops=1)
         Filter: ((packages_events.created_at >= '2020-09-13 00:00:00+00'::timestamp with time zone) AND (packages_events.created_at <= '2020-10-11 00:00:00+00'::timestamp with time zone) AND (packages_events.originator_type <> 2))
         Rows Removed by Filter: 13929343
         Buffers: shared hit=96 read=174434
         I/O Timings: read=3735.356

Time: 10.481 s
  - planning: 0.379 ms
  - execution: 10.480 s
    - I/O read: 3.735 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 96 (~768.00 KiB) from the buffer pool
  - reads: 174434 (~1.30 GiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
SELECT MAX("packages_events"."originator") 
FROM "packages_events" 
WHERE "packages_events"."created_at" BETWEEN '2020-09-13' AND '2020-10-11' 
AND "packages_events"."originator_type" != 2;
 Result  (cost=0.63..0.64 rows=1 width=8) (actual time=2.930..2.931 rows=1 loops=1)
   Buffers: shared hit=1 read=88
   I/O Timings: read=2.051
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.56..0.63 rows=1 width=8) (actual time=2.927..2.927 rows=1 loops=1)
           Buffers: shared hit=1 read=88
           I/O Timings: read=2.051
           ->  Index Only Scan using index_packages_events_on_created_and_types_and_originator on public.packages_events  (cost=0.56..550151.93 rows=8668268 width=8) (actual time=2.926..2.926 rows=1 loops=1)
                 Index Cond: ((packages_events.originator IS NOT NULL) AND (packages_events.created_at >= '2020-09-13 00:00:00+00'::timestamp with time zone) AND (packages_events.created_at <= '2020-10-11 00:00:00+00'::timestamp with time zone))
                 Heap Fetches: 0
                 Filter: (packages_events.originator_type <> 2)
                 Rows Removed by Filter: 0
                 Buffers: shared hit=1 read=88
                 I/O Timings: read=2.051

Time: 3.171 ms
  - planning: 0.212 ms
  - execution: 2.959 ms
    - I/O read: 2.051 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 1 (~8.00 KiB) from the buffer pool
  - reads: 88 (~704.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
SELECT COUNT(DISTINCT "packages_events"."originator") AS count_originator, 
"packages_events"."originator_type" AS packages_events_originator_type, 
"packages_events"."event_type" AS packages_events_event_type, 
"packages_events"."event_scope" AS packages_events_event_scope 
FROM "packages_events" 
WHERE "packages_events"."created_at" BETWEEN '2020-09-13' AND '2020-10-11' 
AND "packages_events"."originator_type" != 2 
AND "packages_events"."originator" BETWEEN 0 AND 9999 
GROUP BY "packages_events"."originator_type", "packages_events"."event_type", "packages_events"."event_scope";
 Aggregate  (cost=959.61..968.85 rows=217 width=14) (actual time=21.600..21.826 rows=10 loops=1)
   Group Key: packages_events.originator_type, packages_events.event_type, packages_events.event_scope
   Buffers: shared hit=35 read=462
   I/O Timings: read=19.120
   ->  Sort  (cost=959.61..961.02 rows=566 width=14) (actual time=21.529..21.568 rows=512 loops=1)
         Sort Key: packages_events.originator_type, packages_events.event_type, packages_events.event_scope
         Sort Method: quicksort  Memory: 49kB
         Buffers: shared hit=29 read=462
         I/O Timings: read=19.120
         ->  Index Scan using index_packages_events_on_created_and_types_and_originator on public.packages_events  (cost=0.56..933.73 rows=566 width=14) (actual time=0.295..21.152 rows=512 loops=1)
               Index Cond: ((packages_events.originator >= 0) AND (packages_events.originator <= 9999) AND (packages_events.created_at >= '2020-09-13 00:00:00+00'::timestamp with time zone) AND (packages_events.created_at <= '2020-10-11 00:00:00+00'::timestamp with time zone))
               Filter: (packages_events.originator_type <> 2)
               Rows Removed by Filter: 0
               Buffers: shared hit=29 read=462

Time: 22.274 ms
  - planning: 0.399 ms
  - execution: 21.875 ms
    - I/O read: 19.120 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 35 (~280.00 KiB) from the buffer pool
  - reads: 462 (~3.60 MiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

SELECT MIN("packages_events"."id") 
FROM "packages_events" WHERE "packages_events"."created_at" BETWEEN '2020-09-13' AND '2020-10-11' 
AND "packages_events"."originator_type" = 2;
 Result  (cost=2.70..2.71 rows=1 width=8) (actual time=2.811..2.812 rows=1 loops=1)
   Buffers: shared hit=1 read=3
   I/O Timings: read=2.760
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.44..2.70 rows=1 width=8) (actual time=2.806..2.807 rows=1 loops=1)
           Buffers: shared hit=1 read=3
           I/O Timings: read=2.760
           ->  Index Scan using packages_events_pkey on public.packages_events  (cost=0.44..1347297.70 rows=596032 width=8) (actual time=2.805..2.805 rows=1 loops=1)
                 Index Cond: (packages_events.id IS NOT NULL)
                 Filter: ((packages_events.created_at >= '2020-09-13 00:00:00+00'::timestamp with time zone) AND (packages_events.created_at <= '2020-10-11 00:00:00+00'::timestamp with time zone) AND (packages_events.originator_type = 2))
                 Rows Removed by Filter: 8
                 Buffers: shared hit=1 read=3
                 I/O Timings: read=2.760

Time: 3.088 ms
  - planning: 0.243 ms
  - execution: 2.845 ms
    - I/O read: 2.760 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 1 (~8.00 KiB) from the buffer pool
  - reads: 3 (~24.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

SELECT MAX("packages_events"."id") 
FROM "packages_events" 
WHERE "packages_events"."created_at" BETWEEN '2020-09-13' AND '2020-10-11' 
AND "packages_events"."originator_type" = 2;
 Result  (cost=2.70..2.71 rows=1 width=8) (actual time=54773.850..54773.852 rows=1 loops=1)
   Buffers: shared hit=12598754 read=144500
   I/O Timings: read=45233.743
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.44..2.70 rows=1 width=8) (actual time=54773.842..54773.843 rows=1 loops=1)
           Buffers: shared hit=12598754 read=144500
           I/O Timings: read=45233.743
           ->  Index Scan using packages_events_pkey on public.packages_events  (cost=0.44..1347297.70 rows=596032 width=8) (actual time=54773.840..54773.840 rows=1 loops=1)
                 Index Cond: (packages_events.id IS NOT NULL)
                 Filter: ((packages_events.created_at >= '2020-09-13 00:00:00+00'::timestamp with time zone) AND (packages_events.created_at <= '2020-10-11 00:00:00+00'::timestamp with time zone) AND (packages_events.originator_type = 2))
                 Rows Removed by Filter: 13404640
                 Buffers: shared hit=12598754 read=144500
                 I/O Timings: read=45233.743

Time: 54.774 s
  - planning: 0.165 ms
  - execution: 54.774 s
    - I/O read: 45.234 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 12598754 (~96.10 GiB) from the buffer pool
  - reads: 144500 (~1.10 GiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
SELECT COUNT("packages_events"."id") AS count_id, "packages_events"."originator_type" AS packages_events_originator_type, "packages_events"."event_type" AS packages_events_event_type, "packages_events"."event_scope" AS packages_events_event_scope FROM "packages_events" 
WHERE "packages_events"."created_at" BETWEEN '2020-09-13' AND '2020-10-11' 
AND "packages_events"."originator_type" = 2 
AND "packages_events"."id" >= 0 AND "packages_events"."id" < 100000 
GROUP BY "packages_events"."originator_type", "packages_events"."event_type", "packages_events"."event_scope";
 HashAggregate  (cost=12981.49..12983.89 rows=240 width=14) (actual time=560.704..560.712 rows=9 loops=1)
   Group Key: packages_events.originator_type, packages_events.event_type, packages_events.event_scope
   Buffers: shared hit=74066 read=5709
   I/O Timings: read=483.073
   ->  Index Scan using packages_events_pkey on public.packages_events  (cost=0.44..12955.36 rows=2613 width=14) (actual time=0.034..557.366 rows=4627 loops=1)
         Index Cond: ((packages_events.id >= 0) AND (packages_events.id < 100000))
         Filter: ((packages_events.created_at >= '2020-09-13 00:00:00+00'::timestamp with time zone) AND (packages_events.created_at <= '2020-10-11 00:00:00+00'::timestamp with time zone) AND (packages_events.originator_type = 2))
         Rows Removed by Filter: 95372
         Buffers: shared hit=74066 read=5709


Time: 561.136 ms
  - planning: 0.352 ms
  - execution: 560.784 ms
    - I/O read: 483.073 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 74066 (~578.60 MiB) from the buffer pool
  - reads: 5709 (~44.60 MiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Migrations

Down

== 20201013043233 AddIndexToPackageEvents: reverting ==========================
-- transaction_open?()
   -> 0.0000s
-- indexes(:packages_events)
   -> 0.0044s
-- execute("SET statement_timeout TO 0")
   -> 0.0003s
-- remove_index(:packages_events, {:algorithm=>:concurrently, :name=>"index_packages_events_on_created_and_types_and_originator"})
   -> 0.0113s
-- execute("RESET ALL")
   -> 0.0003s

Up

== 20201013043233 AddIndexToPackageEvents: migrating ==========================
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:packages_events, [:originator, :originator_type, :event_type, :created_at], {:name=>"index_packages_events_on_created_and_types_and_originator", :algorithm=>:concurrently})
   -> 0.0040s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- add_index(:packages_events, [:originator, :originator_type, :event_type, :created_at], {:name=>"index_packages_events_on_created_and_types_and_originator", :algorithm=>:concurrently})
   -> 0.0324s
-- execute("RESET ALL")
   -> 0.0004s
== 20201013043233 AddIndexToPackageEvents: migrated (0.0381s) =================

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Closes #205578 (closed)

Edited by Giorgenes Gelatti

Merge request reports