Skip to content

Add index to ci_builds to optimize UsageData usage_activity_by_stage

  • usage_activity_by_stage.secure.user_container_scanning_jobs": -1,
  • "usage_activity_by_stage.secure.user_dast_jobs": -1,
  • "usage_activity_by_stage.secure.user_dependency_scanning_jobs": -1,
  • "usage_activity_by_stage.secure.user_license_management_jobs": -1,

All on the same table

This queries are timing out for all time period

Optimization approach

Out of two possible approaches listed at #230438 (comment 408879966) This MR implements second one, which adds new dedicated index, as it requires less effort to implement and should sooner bring noticeable result.

Proposed dedicated index

CREATE INDEX index_secure_ci_builds_on_user_id_created_at_parser_features_alt ON public.ci_builds USING btree (user_id, name, type) WHERE (((type)::text = Ci::Build::text) AND ((name)::text = ANY (ARRAY[(container_scanning::character varying)::text, (dast::character varying)::text, (dependency_scanning::character varying)::text, (license_management::character varying)::text, (license_scanning::character varying)::text, (sast::character varying)::text, (coverage_fuzzing::character varying)::text, (secret_detection::character varying)::text])));

The query has been executed. Duration: 228.190 min (edited)

The major improvement over currently existing index that planner uses for time outing queries (index_secure_ci_builds_on_user_id_created_at_parser_features) is that new one consist all required data within itself, and that allows for Index Only scans that saves times required to fetch actual rows from tables. Queries along with execution plans and timings before and after proposed index was added on cold storage are supplied below

usage_activity_by_stage.secure.user_dependency_scanning_jobs improved from 268.894 ms to 10.614 ms

query

SELECT COUNT(DISTINCT ci_builds.user_id) FROM ci_builds WHERE ci_builds.type = Ci::Build AND ci_builds.name = dependency_scanning AND ci_builds.user_id BETWEEN 6916048 AND 6926048

plan: https://explain.depesz.com/s/dC3w

Time: 173.352 ms
  - planning: 1.697 ms
  - execution: 171.655 ms
    - I/O read: 166.211 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 10 (~80.00 KiB) from the buffer pool
  - reads: 75 (~600.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 6 (~48.00 KiB)
  - writes: 0

with new index

plan: https://explain.depesz.com/s/Oz7K

Time: 3.151 ms
  - planning: 3.001 ms
  - execution: 0.150 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 4 (~32.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

plan: https://explain.depesz.com/s/JHYZ

Time: 10.614 ms
  - planning: 2.572 ms
  - execution: 8.042 ms
    - I/O read: 6.820 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 135 (~1.10 MiB) from the buffer pool
  - reads: 2 (~16.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 1 (~8.00 KiB)
  - writes: 0

with reduced new index

plan: https://explain.depesz.com/s/H5AP

Time: 10.431 ms
  - planning: 2.972 ms
  - execution: 7.459 ms
    - I/O read: 6.974 ms
    - I/O write: 0.000 ms
usage_activity_by_stage.secure.user_container_scanning_jobs improved from 8.128 s to 180.280 ms

query

SELECT COUNT(DISTINCT "ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."name" = 'container_scanning' AND "ci_builds"."user_id" BETWEEN  3916048 AND 3926048

plan: https://explain.depesz.com/s/ylDT

Time: 8.128 s
  - planning: 1.579 ms
  - execution: 8.127 s
    - I/O read: 8.021 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 66 (~528.00 KiB) from the buffer pool
  - reads: 11900 (~93.00 MiB) from the OS file cache, including disk I/O
  - dirtied: 33 (~264.00 KiB)
  - writes: 0

with new index

plan: https://explain.depesz.com/s/EPtY

Time: 121.320 ms
  - planning: 2.204 ms
  - execution: 119.116 ms
    - I/O read: 106.413 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 633 (~4.90 MiB) from the buffer pool
  - reads: 74 (~592.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

note

Worth to notice that for different batch, execution time was very different

SELECT COUNT(DISTINCT "ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."name" = 'container_scanning' AND "ci_builds"."user_id" BETWEEN  6916048 AND 6926048

plan: https://explain.depesz.com/s/6Wsy

Time: 1.976 ms
  - planning: 1.701 ms
  - execution: 0.275 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 43 (~344.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

with new index

plan: https://explain.depesz.com/s/9Lq4

Time: 3.330 ms
  - planning: 2.105 ms
  - execution: 1.225 ms
    - I/O read: 0.278 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 22 (~176.00 KiB) from the buffer pool
  - reads: 3 (~24.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

uncached batch

plan: https://explain.depesz.com/s/KvOO

Time: 19.338 ms
  - planning: 2.425 ms
  - execution: 16.913 ms
    - I/O read: 15.438 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 82 (~656.00 KiB) from the buffer pool
  - reads: 11 (~88.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 1 (~8.00 KiB)
  - writes: 0

with new reduced index

plan: https://explain.depesz.com/s/dTHN

Time: 180.280 ms
  - planning: 2.290 ms
  - execution: 177.990 ms
    - I/O read: 163.395 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 625 (~4.90 MiB) from the buffer pool
  - reads: 60 (~480.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 3 (~24.00 KiB)
  - writes: 0
usage_activity_by_stage.secure.user_dast_jobs improved from 23.503 ms to 19.867 ms

query

SELECT COUNT(DISTINCT "ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."name" = 'dast' AND "ci_builds"."user_id" BETWEEN 3916048 AND 3926048

plan: https://explain.depesz.com/s/tEI9

Time: 23.503 ms
  - planning: 1.751 ms
  - execution: 21.752 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 11946 (~93.30 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

with new index

plan: https://explain.depesz.com/s/FtlE

Time: 19.655 ms
  - planning: 2.938 ms
  - execution: 16.717 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 1818 (~14.20 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

plan: https://explain.depesz.com/s/ZjJj

Time: 19.867 ms
  - planning: 2.940 ms
  - execution: 16.927 ms
    - I/O read: 15.960 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 36 (~288.00 KiB) from the buffer pool
  - reads: 11 (~88.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

with new reduced index

In this case new reduced index seems to decrease performance but not by far

plan: https://explain.depesz.com/s/G5Tx

Time: 64.286 ms
  - planning: 2.837 ms
  - execution: 61.449 ms
    - I/O read: 45.582 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 1791 (~14.00 MiB) from the buffer pool
  - reads: 11 (~88.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 6 (~48.00 KiB)
  - writes: 0
usage_activity_by_stage.secure.user_license_management_jobs improved from 30.629 s to 3.901 ms

query

SELECT COUNT(DISTINCT "ci_builds"."user_id") FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."name" = 'license_management' AND "ci_builds"."user_id" BETWEEN 3916048 AND 3926048

plan: https://explain.depesz.com/s/6NQr

Time: 30.629 s
  - planning: 2.072 ms
  - execution: 30.627 s
    - I/O read: 30.447 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 66 (~528.00 KiB) from the buffer pool
  - reads: 11900 (~93.00 MiB) from the OS file cache, including disk I/O
  - dirtied: 33 (~264.00 KiB)
  - writes: 0

with new index

plan: https://explain.depesz.com/s/Bcwd

Time: 5.537 ms
  - planning: 2.938 ms
  - execution: 2.599 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 268 (~2.10 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

plan: https://explain.depesz.com/s/nbJI

Time: 2.560 ms
  - planning: 2.432 ms
  - execution: 0.128 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 4 (~32.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

with new reduced index

plan: https://explain.depesz.com/s/sKvE

Time: 3.901 ms
  - planning: 2.296 ms
  - execution: 1.605 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 249 (~1.90 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Migration

== 20200908064229 AddPartialIndexToCiBuilsTableOnUserIdNameType: migrating ====
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:ci_builds, [:user_id, :name, :type], {:where=>"(((type)::text = 'Ci::Build'::text) AND ((name)::text = ANY (ARRAY[('container_scanning'::character varying)::text, ('dast'::character varying)::text, ('dependency_scanning'::character varying)::text, ('license_management'::character varying)::text, ('license_scanning'::character varying)::text, ('sast'::character varying)::text, ('coverage_fuzzing'::character varying)::text, ('secret_detection'::character varying)::text])))", :name=>"index_partial_ci_builds_on_user_id_name_type_parser_features", :algorithm=>:concurrently})
   -> 0.0085s
-- add_index(:ci_builds, [:user_id, :name, :type], {:where=>"(((type)::text = 'Ci::Build'::text) AND ((name)::text = ANY (ARRAY[('container_scanning'::character varying)::text, ('dast'::character varying)::text, ('dependency_scanning'::character varying)::text, ('license_management'::character varying)::text, ('license_scanning'::character varying)::text, ('sast'::character varying)::text, ('coverage_fuzzing'::character varying)::text, ('secret_detection'::character varying)::text])))", :name=>"index_partial_ci_builds_on_user_id_name_type_parser_features", :algorithm=>:concurrently})
   -> 0.0050s
== 20200908064229 AddPartialIndexToCiBuilsTableOnUserIdNameType: migrated (0.0138s)

== 20200908064229 AddPartialIndexToCiBuilsTableOnUserIdNameType: reverting ====
-- transaction_open?()
   -> 0.0000s
-- indexes(:ci_builds)
   -> 0.0100s
-- remove_index(:ci_builds, {:algorithm=>:concurrently, :name=>"index_partial_ci_builds_on_user_id_name_type_parser_features"})
   -> 0.0020s
== 20200908064229 AddPartialIndexToCiBuilsTableOnUserIdNameType: reverted (0.0124s)

Follow up

#246763 (closed)

After this MR gets deployed we may want to monitor index_secure_ci_builds_on_user_id_created_at_parser_features usage with this query Based on ddb1e2f9 and 2ff5f6f8 commits it's main purpose was support UsageData performance, and this new one, may takes it job. If that would be the case, we should remove index_secure_ci_builds_on_user_id_created_at_parser_features

Reports #230438 (closed)

Edited by Mikołaj Wawrzyniak

Merge request reports