Skip to content

Report all unique users for Secure scanners

Sam Kerr requested to merge update-secure-smau-metric into master

What does this MR do?

This MR introduces a new metric which intended to be used for Secure SMAU. Specifically, it is intended to report the number of unique users who used one or more of the Secure scanners.

Examples:

  • 1 user - the user uses SAST & DAST, then report 1
  • 2 users - 1 uses SAST & DAST, 1 uses no scanners, then report 1.
  • 10 users - 5 use SAST-only, 5 use DAST-only, report 10.
  • 10 users - 5 use SAST & DAST, 2 use SAST-only, 3 use DAST-only, then report 10.

Issue #219304 (closed)


Data below as requested in the handbook

All time periods

Time to create the index

The query has been executed. Duration: 178 minutes according to slack

Queries

All time periods
[6] pry(main)> Gitlab::UsageData.distinct_count(::Ci::Build.where(name: SECURE_PRODUCT_TYPES.keys).where({}), :user_id)
   (4.6ms)  SELECT MIN("ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."name" IN ($2, $3, $4, $5, $6, $7, $8)  [["type", "Ci::Build"], ["name", "container_scanning"], ["name", "dast"], ["name", "dependency_scanning"], ["name", "license_management"], ["name", "license_scanning"], ["name", "sast"], ["name", "secret_detection"]]
   (0.9ms)  SELECT MAX("ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."name" IN ($2, $3, $4, $5, $6, $7, $8)  [["type", "Ci::Build"], ["name", "container_scanning"], ["name", "dast"], ["name", "dependency_scanning"], ["name", "license_management"], ["name", "license_scanning"], ["name", "sast"], ["name", "secret_detection"]]
   (0.7ms)  SELECT COUNT(DISTINCT "ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."name" IN ($2, $3, $4, $5, $6, $7, $8) AND "ci_builds"."user_id" BETWEEN $9 AND $10  [["type", "Ci::Build"], ["name", "container_scanning"], ["name", "dast"], ["name", "dependency_scanning"], ["name", "license_management"], ["name", "license_scanning"], ["name", "sast"], ["name", "secret_detection"], ["user_id", 0], ["user_id", 9999]]
=> 0

Execution plan and no index performance

Min - https://explain.depesz.com/s/bZv
 Result  (cost=90.16..90.17 rows=1 width=4) (actual time=20648.647..20648.647 rows=1 loops=1)
   Buffers: shared hit=384 read=10625 dirtied=2
   I/O Timings: read=20494.232
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.57..90.16 rows=1 width=4) (actual time=20648.639..20648.640 rows=1 loops=1)
           Buffers: shared hit=384 read=10625 dirtied=2
           I/O Timings: read=20494.232
           ->  Index Scan using index_ci_builds_on_user_id_and_created_at_and_type_eq_ci_build on public.ci_builds  (cost=0.57..418044426.54 rows=4666704 width=4) (actual time=20648.637..20648.637 rows=1 loops=1)
                 Index Cond: (ci_builds.user_id IS NOT NULL)
                 Filter: ((ci_builds.name)::text = ANY ('{container_scanning,dast,dependency_scanning,license_management,license_scanning,sast,secret_detection}'::text[]))
                 Rows Removed by Filter: 12555
                 Buffers: shared hit=384 read=10625 dirtied=2
                 I/O Timings: read=20494.232
Time: 20.653 s
  - planning: 3.868 ms
  - execution: 20.649 s
    - I/O read: 20.494 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 384 (~3.00 MiB) from the buffer pool
  - reads: 10625 (~83.00 MiB) from the OS file cache, including disk I/O
  - dirtied: 2 (~16.00 KiB)
  - writes: 0
Max - https://explain.depesz.com/s/4cE3
Result  (cost=90.16..90.17 rows=1 width=4) (actual time=582.146..582.147 rows=1 loops=1)
   Buffers: shared hit=82 read=457 dirtied=79
   I/O Timings: read=573.275
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.57..90.16 rows=1 width=4) (actual time=582.140..582.140 rows=1 loops=1)
           Buffers: shared hit=82 read=457 dirtied=79
           I/O Timings: read=573.275
           ->  Index Scan using index_ci_builds_on_user_id_and_created_at_and_type_eq_ci_build on public.ci_builds  (cost=0.57..418044426.54 rows=4666704 width=4) (actual time=582.137..582.137 rows=1 loops=1)
                 Index Cond: (ci_builds.user_id IS NOT NULL)
                 Filter: ((ci_builds.name)::text = ANY ('{container_scanning,dast,dependency_scanning,license_management,license_scanning,sast,secret_detection}'::text[]))
                 Rows Removed by Filter: 227
                 Buffers: shared hit=82 read=457 dirtied=79
                 I/O Timings: read=573.275
Time: 585.962 ms
  - planning: 3.770 ms
  - execution: 582.192 ms
    - I/O read: 573.275 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 82 (~656.00 KiB) from the buffer pool
  - reads: 457 (~3.60 MiB) from the OS file cache, including disk I/O
  - dirtied: 79 (~632.00 KiB)
  - writes: 0
Count - https://explain.depesz.com/s/keSP
 Aggregate  (cost=196936.35..196936.36 rows=1 width=8) (actual time=427679.207..427679.208 rows=1 loops=1)
   Buffers: shared hit=178 read=259215 dirtied=212
   I/O Timings: read=415664.327
   ->  Bitmap Heap Scan on public.ci_builds  (cost=141163.21..196848.20 rows=35259 width=4) (actual time=41635.425..427578.925 rows=28627 loops=1)
         Buffers: shared hit=175 read=259215 dirtied=212
         I/O Timings: read=415664.327
         ->  BitmapAnd  (cost=141163.21..141163.21 rows=35259 width=0) (actual time=41510.288..41510.288 rows=0 loops=1)
               Buffers: shared hit=4 read=51948
               I/O Timings: read=35218.722
               ->  Bitmap Index Scan using index_security_ci_builds_on_name_and_id  (cost=0.00..64885.05 rows=4685823 width=0) (actual time=13244.816..13244.816 rows=4741054 loops=1)
                     Buffers: shared read=28471
                     I/O Timings: read=10598.129
               ->  Bitmap Index Scan using index_ci_builds_on_user_id  (cost=0.00..76260.28 rows=4340071 width=0) (actual time=27386.466..27386.466 rows=4322793 loops=1)
                     Index Cond: ((ci_builds.user_id >= 0) AND (ci_builds.user_id <= 9999))
                     Buffers: shared hit=4 read=23477
                     I/O Timings: read=24620.593
Time: 7.128 min
  - planning: 2.106 ms
  - execution: 7.128 min
    - I/O read: 6.928 min
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 178 (~1.40 MiB) from the buffer pool
  - reads: 259215 (~2.00 GiB) from the OS file cache, including disk I/O
  - dirtied: 212 (~1.70 MiB)
  - writes: 0

Execution plan WITH index performance

Min - https://explain.depesz.com/s/fPXC
 Result  (cost=0.47..0.48 rows=1 width=4) (actual time=0.261..0.262 rows=1 loops=1)
   Buffers: shared hit=1 read=3
   I/O Timings: read=0.131
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..0.47 rows=1 width=4) (actual time=0.258..0.258 rows=1 loops=1)
           Buffers: shared hit=1 read=3
           I/O Timings: read=0.131
           ->  Index Only Scan using index_secure_ci_builds_on_user_id_created_at on public.ci_builds  (cost=0.43..186350.54 rows=4534188 width=4) (actual time=0.256..0.256 rows=1 loops=1)
                 Index Cond: (ci_builds.user_id IS NOT NULL)
                 Heap Fetches: 0
                 Buffers: shared hit=1 read=3
                 I/O Timings: read=0.131
Time: 5.850 ms
  - planning: 5.557 ms
  - execution: 0.293 ms
    - I/O read: 0.131 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 1 (~8.00 KiB) from the buffer pool
  - reads: 3 (~24.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
Max - https://explain.depesz.com/s/xcxj
 Result  (cost=0.47..0.48 rows=1 width=4) (actual time=0.591..0.592 rows=1 loops=1)
   Buffers: shared hit=2 read=3
   I/O Timings: read=0.442
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..0.47 rows=1 width=4) (actual time=0.587..0.587 rows=1 loops=1)
           Buffers: shared hit=2 read=3
           I/O Timings: read=0.442
           ->  Index Only Scan using index_secure_ci_builds_on_user_id_created_at on public.ci_builds  (cost=0.43..186350.54 rows=4534188 width=4) (actual time=0.585..0.585 rows=1 loops=1)
                 Index Cond: (ci_builds.user_id IS NOT NULL)
                 Heap Fetches: 1
                 Buffers: shared hit=2 read=3
                 I/O Timings: read=0.442
Time: 7.175 ms
  - planning: 6.551 ms
  - execution: 0.624 ms
    - I/O read: 0.442 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 2 (~16.00 KiB) from the buffer pool
  - reads: 3 (~24.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
Count - https://explain.depesz.com/s/BeeB
 Aggregate  (cost=1567.79..1567.80 rows=1 width=8) (actual time=58.963..58.964 rows=1 loops=1)
   Buffers: shared hit=12180 read=109
   I/O Timings: read=29.195
   ->  Index Only Scan using index_secure_ci_builds_on_user_id_created_at on public.ci_builds  (cost=0.43..1483.79 rows=33599 width=4) (actual time=0.060..52.404 rows=28676 loops=1)
         Index Cond: ((ci_builds.user_id >= 0) AND (ci_builds.user_id <= 9999))
         Heap Fetches: 119
         Buffers: shared hit=12177 read=109
         I/O Timings: read=29.195
Time: 62.243 ms
  - planning: 3.214 ms
  - execution: 59.029 ms
    - I/O read: 29.195 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 12180 (~95.20 MiB) from the buffer pool
  - reads: 109 (~872.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

28-day period

Queries

28 days period
[7] pry(main)> Gitlab::UsageData.distinct_count(::Ci::Build.where(name: SECURE_PRODUCT_TYPES.keys).where(created_at: 28.days.ago..Time.current), :user_id)
   (1.4ms)  SELECT MIN("ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."name" IN ($2, $3, $4, $5, $6, $7, $8) AND "ci_builds"."created_at" BETWEEN $9 AND $10  [["type", "Ci::Build"], ["name", "container_scanning"], ["name", "dast"], ["name", "dependency_scanning"], ["name", "license_management"], ["name", "license_scanning"], ["name", "sast"], ["name", "secret_detection"], ["created_at", "2020-05-27 17:51:53.319709"], ["created_at", "2020-06-24 17:51:53.319881"]]
   (1.1ms)  SELECT MAX("ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."name" IN ($2, $3, $4, $5, $6, $7, $8) AND "ci_builds"."created_at" BETWEEN $9 AND $10  [["type", "Ci::Build"], ["name", "container_scanning"], ["name", "dast"], ["name", "dependency_scanning"], ["name", "license_management"], ["name", "license_scanning"], ["name", "sast"], ["name", "secret_detection"], ["created_at", "2020-05-27 17:51:53.319709"], ["created_at", "2020-06-24 17:51:53.319881"]]
   (0.7ms)  SELECT COUNT(DISTINCT "ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."name" IN ($2, $3, $4, $5, $6, $7, $8) AND "ci_builds"."created_at" BETWEEN $9 AND $10 AND "ci_builds"."user_id" BETWEEN $11 AND $12  [["type", "Ci::Build"], ["name", "container_scanning"], ["name", "dast"], ["name", "dependency_scanning"], ["name", "license_management"], ["name", "license_scanning"], ["name", "sast"], ["name", "secret_detection"], ["created_at", "2020-05-27 17:51:53.319709"], ["created_at", "2020-06-24 17:51:53.319881"], ["user_id", 0], ["user_id", 9999]]
=> 0

Execution plan and no index performance

Min - https://explain.depesz.com/s/iXaj
Result  (cost=181.73..181.74 rows=1 width=4) (actual time=1243.508..1243.509 rows=1 loops=1)
   Buffers: shared hit=2 read=745
   I/O Timings: read=1232.149
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.57..181.73 rows=1 width=4) (actual time=1243.501..1243.502 rows=1 loops=1)
           Buffers: shared hit=2 read=745
           I/O Timings: read=1232.149
           ->  Index Scan using index_ci_builds_on_user_id_and_created_at_and_type_eq_ci_build on public.ci_builds  (cost=0.57..54094833.42 rows=298613 width=4) (actual time=1243.499..1243.499 rows=1 loops=1)
                 Index Cond: ((ci_builds.user_id IS NOT NULL) AND (ci_builds.created_at >= '2020-05-27 17:51:53.319709'::timestamp without time zone) AND (ci_builds.created_at <= '2020-06-24 17:51:53.319881'::timestamp without time zone))
                 Filter: ((ci_builds.name)::text = ANY ('{container_scanning,dast,dependency_scanning,license_management,license_scanning,sast,secret_detection}'::text[]))
                 Rows Removed by Filter: 19
                 Buffers: shared hit=2 read=745
                 I/O Timings: read=1232.149
Time: 1.248 s
  - planning: 4.910 ms
  - execution: 1.244 s
    - I/O read: 1.232 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 2 (~16.00 KiB) from the buffer pool
  - reads: 745 (~5.80 MiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
Max - https://explain.depesz.com/s/JZKv
Result  (cost=181.73..181.74 rows=1 width=4) (actual time=474.878..474.879 rows=1 loops=1)
   Buffers: shared hit=29 read=197
   I/O Timings: read=470.326
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.57..181.73 rows=1 width=4) (actual time=474.872..474.872 rows=1 loops=1)
           Buffers: shared hit=29 read=197
           I/O Timings: read=470.326
           ->  Index Scan using index_ci_builds_on_user_id_and_created_at_and_type_eq_ci_build on public.ci_builds  (cost=0.57..54094833.42 rows=298613 width=4) (actual time=474.869..474.869 rows=1 loops=1)
                 Index Cond: ((ci_builds.user_id IS NOT NULL) AND (ci_builds.created_at >= '2020-05-27 17:51:53.319709'::timestamp without time zone) AND (ci_builds.created_at <= '2020-06-24 17:51:53.319881'::timestamp without time zone))
                 Filter: ((ci_builds.name)::text = ANY ('{container_scanning,dast,dependency_scanning,license_management,license_scanning,sast,secret_detection}'::text[]))
                 Rows Removed by Filter: 227
                 Buffers: shared hit=29 read=197
                 I/O Timings: read=470.326
Time: 479.572 ms
  - planning: 4.629 ms
  - execution: 474.943 ms
    - I/O read: 470.326 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 29 (~232.00 KiB) from the buffer pool
  - reads: 197 (~1.50 MiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
Count - https://explain.depesz.com/s/u6q6
 Aggregate  (cost=197013.64..197013.65 rows=1 width=8) (actual time=190511.261..190511.262 rows=1 loops=1)
   Buffers: shared hit=146663 read=112730
   I/O Timings: read=178282.086
   ->  Bitmap Heap Scan on public.ci_builds  (cost=141146.71..197008.00 rows=2256 width=4) (actual time=47392.114..190510.779 rows=480 loops=1)
         Filter: ((ci_builds.created_at >= '2020-05-27 17:51:53.319709'::timestamp without time zone) AND (ci_builds.created_at <= '2020-06-24 17:51:53.319881'::timestamp without time zone))
         Rows Removed by Filter: 28147
         Buffers: shared hit=146660 read=112730
         I/O Timings: read=178282.086
         ->  BitmapAnd  (cost=141146.71..141146.71 rows=35259 width=0) (actual time=47276.276..47276.276 rows=0 loops=1)
               Buffers: shared hit=4 read=51948
               I/O Timings: read=39633.362
               ->  Bitmap Index Scan using index_security_ci_builds_on_name_and_id  (cost=0.00..64885.05 rows=4685823 width=0) (actual time=14889.519..14889.519 rows=4741054 loops=1)
                     Buffers: shared read=28471
                     I/O Timings: read=11665.481
               ->  Bitmap Index Scan using index_ci_builds_on_user_id  (cost=0.00..76260.28 rows=4340071 width=0) (actual time=31212.739..31212.740 rows=4322793 loops=1)
                     Index Cond: ((ci_builds.user_id >= 0) AND (ci_builds.user_id <= 9999))
                     Buffers: shared hit=4 read=23477
                     I/O Timings: read=27967.881
Time: 3.176 min
  - planning: 2.838 ms
  - execution: 3.176 min
    - I/O read: 2.971 min
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 146663 (~1.10 GiB) from the buffer pool
  - reads: 112730 (~880.70 MiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Execution plan WITH index performance

Min - https://explain.depesz.com/s/2Obb
 Result  (cost=0.74..0.75 rows=1 width=4) (actual time=0.125..0.125 rows=1 loops=1)
   Buffers: shared hit=16
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..0.74 rows=1 width=4) (actual time=0.122..0.122 rows=1 loops=1)
           Buffers: shared hit=16
           ->  Index Only Scan using index_secure_ci_builds_on_user_id_created_at on public.ci_builds  (cost=0.43..91233.22 rows=295567 width=4) (actual time=0.121..0.121 rows=1 loops=1)
                 Index Cond: ((ci_builds.user_id IS NOT NULL) AND (ci_builds.created_at >= '2020-05-27 17:51:53.319709'::timestamp without time zone) AND (ci_builds.created_at <= '2020-06-24 17:51:53.319881'::timestamp without time zone))
                 Heap Fetches: 0
                 Buffers: shared hit=16
Time: 5.682 ms
  - planning: 5.522 ms
  - execution: 0.160 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 16 (~128.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
Max - https://explain.depesz.com/s/STyJ
 Result  (cost=0.74..0.75 rows=1 width=4) (actual time=0.039..0.039 rows=1 loops=1)
   Buffers: shared hit=4
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..0.74 rows=1 width=4) (actual time=0.036..0.036 rows=1 loops=1)
           Buffers: shared hit=4
           ->  Index Only Scan using index_secure_ci_builds_on_user_id_created_at on public.ci_builds  (cost=0.43..91233.22 rows=295567 width=4) (actual time=0.034..0.034 rows=1 loops=1)
                 Index Cond: ((ci_builds.user_id IS NOT NULL) AND (ci_builds.created_at >= '2020-05-27 17:51:53.319709'::timestamp without time zone) AND (ci_builds.created_at <= '2020-06-24 17:51:53.319881'::timestamp without time zone))
                 Heap Fetches: 0
                 Buffers: shared hit=4
Time: 7.932 ms
  - planning: 7.860 ms
  - execution: 0.072 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 4 (~32.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
Count - https://explain.depesz.com/s/qckK
 Aggregate  (cost=767.22..767.23 rows=1 width=8) (actual time=2.739..2.739 rows=1 loops=1)
   Buffers: shared hit=553
   ->  Index Only Scan using index_secure_ci_builds_on_user_id_created_at on public.ci_builds  (cost=0.43..761.75 rows=2190 width=4) (actual time=0.299..2.641 rows=496 loops=1)
         Index Cond: ((ci_builds.user_id >= 0) AND (ci_builds.user_id <= 9999) AND (ci_builds.created_at >= '2020-05-27 17:51:53.319709'::timestamp without time zone) AND (ci_builds.created_at <= '2020-06-24 17:51:53.319881'::timestamp without time zone))
         Heap Fetches: 63
         Buffers: shared hit=553
Time: 5.899 ms
  - planning: 3.110 ms
  - execution: 2.789 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 553 (~4.30 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Query for index

CREATE INDEX index_secure_ci_builds_on_user_id_created_at ON public.ci_builds USING btree (user_id, created_at) WHERE (((type)::text = 'Ci::Build'::text) AND ((name)::text = ANY ((ARRAY['container_scanning'::character varying, 'dast'::character varying, 'dependency_scanning'::character varying, 'license_management'::character varying, 'license_scanning'::character varying, 'sast'::character varying, 'secret_detection'::character varying])::text[])));

Index Query details

Migration Outputs

Up execution
$ VERBOSE=true bundle exec rake db:migrate:up VERSION=20200618152212
== 20200618152212 UpdateSecureSmauIndex: migrating ============================
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:ci_builds, [:user_id, :created_at], {:where=>"type = 'Ci::Build' AND name IN ('container_scanning', 'dast', 'dependency_scanning', 'license_management', 'license_scanning', 'sast', 'secret_detection')", :name=>"index_secure_ci_builds_on_user_id_created_at", :algorithm=>:concurrently})
   -> 0.0106s
== 20200618152212 UpdateSecureSmauIndex: migrated (0.0107s) ===================
Down execution
$ VERBOSE=true bundle exec rake db:migrate:down VERSION=20200618152212
== 20200618152212 UpdateSecureSmauIndex: reverting ============================
-- transaction_open?()
   -> 0.0000s
-- indexes(:ci_builds)
   -> 0.0113s
== 20200618152212 UpdateSecureSmauIndex: reverted (0.0121s) ===================


Old from before 2020-06-24
#database-lab results before index
explain SELECT COUNT(DISTINCT "ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."name" IN ('container_scanning', 'dast', 'dependency_scanning', 'license_management', 'license_scanning', 'sast', 'secret_detection') AND "ci_builds"."user_id" BETWEEN 1 AND 10000
Session: joe-brgj47g350j65v2mjf70
Plan with execution:
 Aggregate  (cost=185201.56..185201.57 rows=1 width=8) (actual time=412851.918..412851.918 rows=1 loops=1)
   Buffers: shared hit=12 read=253187 dirtied=322
   I/O Timings: read=400290.912
   ->  Bitmap Heap Scan on public.ci_builds  (cost=131413.65..185116.55 rows=34006 width=4) (actual time=41231.500..412770.579 rows=28434 loops=1)
         Buffers: shared hit=4 read=253187 dirtied
[...SKIP...]
Full execution plan (The text in the preview above has been cut)
Other artifacts are provided in the thread
Recommendations:
:exclamation: Query processes too much data to return a relatively small number of rows. – Reduce data cardinality as early as possible during the execution, using one or several of the following techniques: new indexes, partitioning, query rewriting, denormalization. See the visualization of the plan to understand which plan nodes are the main bottlenecks. Show details
Summary:
Time: 6.881 min
  - planning: 2.178 ms
  - execution: 6.881 min
    - I/O read: 6.672 min
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 12 (~96.00 KiB) from the buffer pool
  - reads: 253187 (~1.90 GiB) from the OS file cache, including disk I/O
  - dirtied: 322 (~2.50 MiB)
  - writes: 0
#database-lab results after index
explain SELECT COUNT(DISTINCT "ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."name" IN ('container_scanning', 'dast', 'dependency_scanning', 'license_management', 'license_scanning', 'sast', 'secret_detection') AND "ci_builds"."user_id" BETWEEN 1 AND 10000
Session: joe-brkd24o350j25dhvg1p0
Plan with execution:
 Aggregate  (cost=50397.30..50397.31 rows=1 width=8) (actual time=64920.032..64920.033 rows=1 loops=1)
   Buffers: shared hit=208 read=27920 dirtied=22
   I/O Timings: read=64382.152
   ->  Index Scan using ci_builds_user_id_created_at_idx on public.ci_builds  (cost=0.43..50312.34 rows=33981 width=4) (actual time=4.933..64854.250 rows=28483 loops=1)
         Index Cond: ((ci_builds.u
[...SKIP...]
Full execution plan (The text in the preview above has been cut)
Other artifacts are provided in the thread
Recommendations:
:exclamation: Query processes too much data to return a relatively small number of rows. – Reduce data cardinality as early as possible during the execution, using one or several of the following techniques: new indexes, partitioning, query rewriting, denormalization. See the visualization of the plan to understand which plan nodes are the main bottlenecks. Show details
Summary:
Time: 1.082 min
  - planning: 3.323 ms
  - execution: 1.082 min
    - I/O read: 1.073 min
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 208 (~1.60 MiB) from the buffer pool
  - reads: 27920 (~218.10 MiB) from the OS file cache, including disk I/O
  - dirtied: 22 (~176.00 KiB)
  - writes: 0

all time_period

[7] pry(main)> Gitlab::UsageData.distinct_count(::Ci::Build.where(name: SECURE_PRODUCT_TYPES.keys).where({}), :user_id)
   (1.7ms)  SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::float as lag
   (7.2ms)  SELECT MIN("ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."name" IN ('container_scanning', 'dast', 'dependency_scanning', 'license_management', 'license_scanning', 'sast', 'secret_detection')
   (2.5ms)  SELECT MAX("ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."name" IN ('container_scanning', 'dast', 'dependency_scanning', 'license_management', 'license_scanning', 'sast', 'secret_detection')
   (2.3ms)  SELECT COUNT(DISTINCT "ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."name" IN ('container_scanning', 'dast', 'dependency_scanning', 'license_management', 'license_scanning', 'sast', 'secret_detection') AND "ci_builds"."user_id" BETWEEN 1 AND 10000
=> 1

28 day time_period

[8] pry(main)> Gitlab::UsageData.distinct_count(::Ci::Build.where(name: SECURE_PRODUCT_TYPES.keys).where({ created_at: 28.days.ago..Time.current }), :user_id)
   (3.3ms)  SELECT MIN("ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."name" IN ('container_scanning', 'dast', 'dependency_scanning', 'license_management', 'license_scanning', 'sast', 'secret_detection') AND "ci_builds"."created_at" BETWEEN '2020-05-14 12:24:27.626104' AND '2020-06-11 12:24:27.627209'
   (1.1ms)  SELECT MAX("ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."name" IN ('container_scanning', 'dast', 'dependency_scanning', 'license_management', 'license_scanning', 'sast', 'secret_detection') AND "ci_builds"."created_at" BETWEEN '2020-05-14 12:24:27.626104' AND '2020-06-11 12:24:27.627209'
   (0.6ms)  SELECT COUNT(DISTINCT "ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."name" IN ('container_scanning', 'dast', 'dependency_scanning', 'license_management', 'license_scanning', 'sast', 'secret_detection') AND "ci_builds"."created_at" BETWEEN '2020-05-14 12:24:27.626104' AND '2020-06-11 12:24:27.627209' AND "ci_builds"."user_id" BETWEEN 1 AND 10000
=> 1

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Sam Kerr

Merge request reports