Adds rolling package count to usage data (!40452) · Merge requests · GitLab.org / GitLab

Giorgenes Gelatti requested to merge 205578-add-package-count-to-usage-data-28days into master Aug 25, 2020

What does this MR do?

Adds rolling package count support to usage data for usage ping.

Migrations

== 20200901203055 AddIdCreatedAtIndexToPackages: migrating ====================
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:packages_packages, [:id, :created_at], {:name=>"index_packages_packages_on_id_and_created_at", :algorithm=>:concurrently})
   -> 0.0078s
-- add_index(:packages_packages, [:id, :created_at], {:name=>"index_packages_packages_on_id_and_created_at", :algorithm=>:concurrently})
   -> 0.0092s
== 20200901203055 AddIdCreatedAtIndexToPackages: migrated (0.0178s) ===========

== 20200901203055 AddIdCreatedAtIndexToPackages: reverting ====================
-- transaction_open?()
   -> 0.0000s
-- indexes(:packages_packages)
   -> 0.0170s
-- remove_index(:packages_packages, {:algorithm=>:concurrently, :name=>"index_packages_packages_on_id_and_created_at"})
   -> 0.0036s
== 20200901203055 AddIdCreatedAtIndexToPackages: reverted (0.0220s) ===========

New queries

Gitlab::UsageData.count(::Packages::Package.where(Gitlab::UsageData.last_28_days_time_period))

   (2.3ms)  SELECT MIN("packages_packages"."id") FROM "packages_packages" WHERE "packages_packages"."created_at" BETWEEN '2020-07-28 23:33:41.729627' AND '2020-08-25 23:33:41.729848'

 Result  (cost=0.53..0.54 rows=1 width=8) (actual time=28.297..28.297 rows=1 loops=1)
   Buffers: shared hit=2 read=1072
   I/O Timings: read=20.679
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.42..0.53 rows=1 width=8) (actual time=28.291..28.292 rows=1 loops=1)
           Buffers: shared hit=2 read=1072
           I/O Timings: read=20.679
           ->  Index Only Scan using idx_packages_packages_id_and_created_at on public.packages_packages  (cost=0.42..7829.48 rows=74464 width=8) (actual time=28.290..28.290 rows=1 loops=1)
                 Index Cond: ((packages_packages.id IS NOT NULL) AND (packages_packages.created_at >= '2020-07-28 23:33:41.729627+00'::timestamp with time zone) AND (packages_packages.created_at <= '2020-08-25 23:33:41.729848+00'::timestamp with time zone))
                 Heap Fetches: 1
                 Buffers: shared hit=2 read=1072
                 I/O Timings: read=20.679

-> Query processes too much data to return a relatively small number of rows. – Reduce data cardinality as early as possible during the execution, using one or several of the following techniques: new indexes, partitioning, query rewriting, denormalization. See the visualization of the plan to understand which plan nodes are the main bottlenecks.

-> Specialized index needed – The index(es) currently used does not serve quite well for the needs of this query (notice Rows Removed by Filter: ..., meaning that the index fetched many non-target rows). Consider adding more specialized index(es)

   (0.6ms)  SELECT MAX("packages_packages"."id") FROM "packages_packages" WHERE "packages_packages"."created_at" BETWEEN '2020-07-28 23:33:41.729627' AND '2020-08-25 23:33:41.729848'

 Result  (cost=0.53..0.54 rows=1 width=8) (actual time=0.338..0.338 rows=1 loops=1)
   Buffers: shared hit=2 read=5
   I/O Timings: read=0.254
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.42..0.53 rows=1 width=8) (actual time=0.329..0.329 rows=1 loops=1)
           Buffers: shared hit=2 read=5
           I/O Timings: read=0.254
           ->  Index Only Scan using idx_packages_packages_id_and_created_at on public.packages_packages  (cost=0.42..7829.48 rows=74464 width=8) (actual time=0.328..0.328 rows=1 loops=1)
                 Index Cond: ((packages_packages.id IS NOT NULL) AND (packages_packages.created_at >= '2020-07-28 23:33:41.729627+00'::timestamp with time zone) AND (packages_packages.created_at <= '2020-08-25 23:33:41.729848+00'::timestamp with time zone))
                 Heap Fetches: 0
                 Buffers: shared hit=2 read=5
                 I/O Timings: read=0.254

-> Looks good

   (0.6ms)  SELECT COUNT("packages_packages"."id") FROM "packages_packages" WHERE "packages_packages"."created_at" BETWEEN '2020-07-28 23:33:41.729627' AND '2020-08-25 23:33:41.729848' AND "packages_packages"."id" BETWEEN 35 AND 100034

 Aggregate  (cost=2261.88..2261.89 rows=1 width=8) (actual time=5.714..5.714 rows=1 loops=1)
   Buffers: shared hit=306
   ->  Index Only Scan using idx_packages_packages_id_and_created_at on public.packages_packages  (cost=0.42..2220.53 rows=16537 width=8) (actual time=5.709..5.709 rows=0 loops=1)
         Index Cond: ((packages_packages.id >= 35) AND (packages_packages.id <= 100034) AND (packages_packages.created_at >= '2020-07-28 23:33:41.729627+00'::timestamp with time zone) AND (packages_packages.created_at <= '2020-08-25 23:33:41.729848+00'::timestamp with time zone))
         Heap Fetches: 0
         Buffers: shared hit=306


-> LOOKS GOOD

Part of #205578 (closed)

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
Tested in all supported browsers
Informed Infrastructure department of a default or new setting change, if applicable per definition of done

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

Label as security and @ mention @gitlab-com/gl-security/appsec
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
Security reports checked/validated by a reviewer from the AppSec team

Edited Sep 01, 2020 by Giorgenes Gelatti

Adds rolling package count to usage data