Skip to content

Optimize deployment counters related to the ::Deployment

What does this MR do?

Optimize queries produced by

  1. Query no period per user
# 1 Locations https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/ee/gitlab/usage_data.rb
time_period = { }

deployments: distinct_count(::Deployment.where(time_period), :user_id),
failed_deployments: distinct_count(::Deployment.failed.where(time_period), :user_id),
successful_deployments: distinct_count(::Deployment.success.where(time_period), :user_id)
Query with no time constraint and batching per user_id ```ruby Gitlab::UsageData.distinct_count(::Deployment.where(time_period), :user_id) ``` ```sql SELECT MIN("deployments"."user_id") FROM "deployments" SELECT MAX("deployments"."user_id") FROM "deployments" SELECT COUNT(DISTINCT "deployments"."user_id") FROM "deployments" WHERE "deployments"."user_id" BETWEEN $1 AND $2 [["user_id", 0], ["user_id", 9999]] ```
Gitlab::UsageData.distinct_count(::Deployment.failed.where(time_period), :user_id)
SELECT MIN("deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = $1  [["status", 3]]
SELECT MAX("deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = $1  [["status", 3]]
SELECT COUNT(DISTINCT "deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = $1 AND "deployments"."user_id" BETWEEN $2 AND $3  [["status", 3], ["user_id", 0], ["user_id", 9999]]
Gitlab::UsageData.distinct_count(::Deployment.success.where(time_period), :user_id)
 SELECT MIN("deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = $1  [["status", 2]]
 SELECT MAX("deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = $1  [["status", 2]]
 SELECT COUNT(DISTINCT "deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = $1 AND "deployments"."user_id" BETWEEN $2 AND $3  [["status", 2], ["user_id", 0], ["user_id", 9999]]
  1. Query with period filter per user
# 2 Locations https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/ee/gitlab/usage_data.rb
time_period = { created_at: 28.days.ago..Time.current }

deployments: distinct_count(::Deployment.where(time_period), :user_id),
failed_deployments: distinct_count(::Deployment.failed.where(time_period), :user_id),
successful_deployments: distinct_count(::Deployment.success.where(time_period), :user_id)
query with time constraint and batching per user_id
 Gitlab::UsageData.distinct_count(::Deployment.where(time_period), :user_id)
SELECT MIN("deployments"."user_id") FROM "deployments" WHERE "deployments"."created_at" BETWEEN $1 AND $2  [["created_at", "2020-02-12 12:31:33.200695"], ["created_at", "2020-03-11 12:31:33.200777"]]
SELECT MAX("deployments"."user_id") FROM "deployments" WHERE "deployments"."created_at" BETWEEN $1 AND $2  [["created_at", "2020-02-12 12:31:33.200695"], ["created_at", "2020-03-11 12:31:33.200777"]]
SELECT COUNT(DISTINCT "deployments"."user_id") FROM "deployments" WHERE "deployments"."created_at" BETWEEN $1 AND $2 AND "deployments"."user_id" BETWEEN $3 AND $4  [["created_at", "2020-02-12 12:31:33.200695"], ["created_at", "2020-03-11 12:31:33.200777"], ["user_id", 0], ["user_id", 9999]]
Gitlab::UsageData.distinct_count(::Deployment.failed.where(time_period), :user_id)
SELECT MIN("deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = $1 AND "deployments"."created_at" BETWEEN $2 AND $3  [["status", 3], ["created_at", "2020-02-12 12:31:33.200695"], ["created_at", "2020-03-11 12:31:33.200777"]]
MAX("deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = $1 AND "deployments"."created_at" BETWEEN $2 AND $3  [["status", 3], ["created_at", "2020-02-12 12:31:33.200695"], ["created_at", "2020-03-11 12:31:33.200777"]]
SELECT COUNT(DISTINCT "deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = $1 AND "deployments"."created_at" BETWEEN $2 AND $3 AND "deployments"."user_id" BETWEEN $4 AND $5  [["status", 3], ["created_at", "2020-02-12 12:31:33.200695"], ["created_at", "2020-03-11 12:31:33.200777"], ["user_id", 0], ["user_id", 9999]]
time_period = { created_at: 28.days.ago..Time.current }
Gitlab::UsageData.distinct_count(::Deployment.success.where(time_period), :user_id)
SELECT MIN("deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = $1 AND "deployments"."created_at" BETWEEN $2 AND $3  [["status", 2], ["created_at", "2020-02-12 11:46:44.765586"], ["created_at", "2020-03-11 11:46:44.765767"]]
SELECT MAX("deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = $1 AND "deployments"."created_at" BETWEEN $2 AND $3  [["status", 2], ["created_at", "2020-02-12 11:46:44.765586"], ["created_at", "2020-03-11 11:46:44.765767"]]
SELECT COUNT(DISTINCT "deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = $1 AND "deployments"."created_at" BETWEEN $2 AND $3 AND "deployments"."user_id" BETWEEN $4 AND $5  [["status", 2], ["created_at", "2020-02-12 11:46:44.765586"], ["created_at", "2020-03-11 11:46:44.765767"], ["user_id", 0], ["user_id", 9999]]
  1. Query count deployments
# 3  Locations https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data.rb

deployments: count(Deployment),
successful_deployments: count(Deployment.success),
failed_deployments: count(Deployment.failed),
query count of deployments
Gitlab::UsageData.count(::Deployment)
SELECT MIN("deployments"."id") FROM "deployments"
SELECT MAX("deployments"."id") FROM "deployments"
SELECT COUNT("deployments"."id") FROM "deployments" WHERE "deployments"."id" BETWEEN $1 AND $2  [["id", 1], ["id", 100000]]
query count of failed deployments
Gitlab::UsageData.count(::Deployment.failed)
SELECT MIN("deployments"."id") FROM "deployments" WHERE "deployments"."status" = $1  [["status", 3]]
SELECT MAX("deployments"."id") FROM "deployments" WHERE "deployments"."status" = $1  [["status", 3]]
SELECT COUNT("deployments"."id") FROM "deployments" WHERE "deployments"."status" = $1 AND "deployments"."id" BETWEEN $2 AND $3  [["status", 3], ["id", 0], ["id", 99999]]
query count of successful deployments
Gitlab::UsageData.count(::Deployment.success)
SELECT MIN("deployments"."id") FROM "deployments" WHERE "deployments"."status" = $1  [["status", 2]]
SELECT MAX("deployments"."id") FROM "deployments" WHERE "deployments"."status" = $1  [["status", 2]]
SELECT COUNT("deployments"."id") FROM "deployments" WHERE "deployments"."status" = $1 AND "deployments"."id" BETWEEN $2 AND $3  [["status", 2], ["id", 0], ["id", 99999]]

Notes: on optimization

  • Using database-lab
  • Add 2 indexes
exec CREATE INDEX CONCURRENTLY index_deployments_on_user_id_status_and_created_at ON deployments USING btree (user_id,status,created_at)

The query has been executed. Duration: 4.079 m

exec CREATE INDEX CONCURRENTLY index_deployments_on_id_and_status ON deployments USING btree (id,status)

The query has been executed. Duration: 2.118 min

1. MIN/MAX calculations per user_id

  • no period
explain ```sql explain SELECT MIN("deployments"."user_id") FROM "deployments" ```

Before: Time: 1.251 min https://explain.depesz.com/s/DN0V

After: Time: 3.471 ms https://explain.depesz.com/s/XrdC

  • period
explain ```sql explain SELECT MIN("deployments"."user_id") FROM "deployments" WHERE "deployments"."created_at" BETWEEN '2020-02-10 08:14:55.773067' AND '2020-03-09 08:14:55.773185' ```

Before: https://explain.depesz.com/s/lOoMK

After: Time: 10.867 ms https://explain.depesz.com/s/vWIh

2. COUNT DISTINCT per user_id

  • no period
explain ```sql explain SELECT COUNT(DISTINCT "deployments"."user_id") FROM "deployments" WHERE "deployments"."user_id" BETWEEN 0 AND 9999 ```

Before: Time: 9.184 min https://explain.depesz.com/s/5ikP

Filter: ((deployments.user_id >= 0) AND (deployments.user_id <= 9999))

After: Time: 224.553 ms https://explain.depesz.com/s/KqUM

  • period
explain ```sql explain SELECT COUNT(DISTINCT "deployments"."user_id") FROM "deployments" WHERE "deployments"."created_at" BETWEEN '2020-02-10 08:14:55.773067' AND '2020-03-09 08:14:55.773185' AND "deployments"."user_id" BETWEEN 0 AND 9999 ```

Before: Time: 9.138 s https://explain.depesz.com/s/DSrL

Filter: ((deployments.user_id >= 0) AND (deployments.user_id <= 9999))

After: Time: 10.867 ms https://explain.depesz.com/s/vWIh

3. MIN/MAX status failed per user_id

  • no period
explain ```sql explain SELECT MIN("deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = 3 ```

Before: Time: 16.416 min https://explain.depesz.com/s/YHhP

After: Time: 0.530 ms https://explain.depesz.com/s/5y1g

  • period
explain ```sql explain SELECT MIN("deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = 3 AND "deployments"."created_at" BETWEEN '2020-02-10 08:14:55.773067' AND '2020-03-09 08:14:55.773185' ```

Before: Time: 3.954 min https://explain.depesz.com/s/Tsc

After: Time: 0.979 ms https://explain.depesz.com/s/vT02

4. COUNT DISTINCT failed per user_id

  • no period
explain ```sql explain SELECT COUNT(DISTINCT "deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = 3 AND "deployments"."user_id" BETWEEN 0 AND 9999 ```

Before: Time: 22.331 min https://explain.depesz.com/s/OKI8

Filter: ((deployments.user_id >= 0) AND (deployments.user_id <= 9999))

After: Time: 12.364 ms https://explain.depesz.com/s/4PRq

  • period
explain ```sql explain SELECT COUNT(DISTINCT "deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = 3 AND "deployments"."created_at" BETWEEN '2020-02-10 11:01:14.332176' AND '2020-03-09 11:01:14.332272' AND "deployments"."user_id" BETWEEN 0 AND 9999 ```

Before: Time: 2.792 min https://explain.depesz.com/s/SG5A

Filter: ((deployments.user_id >= 0) AND (deployments.user_id <= 9999) AND (deployments.status = 3))

After: Time: 9.637 ms https://explain.depesz.com/s/5nrt

5. MIN/MAX per user_id status success

  • no period
explain ```sql explain SELECT MIN("deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = 2 ```

Before: Time: 24.389 min https://explain.depesz.com/s/Crss

After: Time: 0.365 ms https://explain.depesz.com/s/JE39

  • period
explain ```sql explain SELECT MIN("deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = 2 AND "deployments"."created_at" BETWEEN '2020-02-10 11:54:39.194310' AND '2020-03-09 11:54:39.194373' ```
#### Before: Time: 45.222 s https://explain.depesz.com/s/ebWK #### After: Time: 0.642 ms https://explain.depesz.com/s/TfF2

6. DISTINCT COUNT per user_id status success

  • no period
explain ```sql EXPLAIN SELECT COUNT(DISTINCT "deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = 2 AND "deployments"."user_id" BETWEEN 0 AND 9999 ```

Before: Time: 16.352 min https://explain.depesz.com/s/LxOK

After: Time: 35.899 ms https://explain.depesz.com/s/Kg8R

  • period
explain ```sql explain SELECT COUNT(DISTINCT "deployments"."user_id") FROM "deployments" WHERE "deployments"."status" = 2 AND "deployments"."created_at" BETWEEN '2020-02-10 11:54:39.194310' AND '2020-02-10 11:54:39.194310' AND "deployments"."user_id" BETWEEN 2 AND 10000 ```
#### Before: Time: 7.144 ms https://explain.depesz.com/s/uyHT Filter: ((deployments.user_id >= 0) AND (deployments.user_id <= 9999) AND (deployments.status = 2)) #### After: Time: 6.325 ms https://explain.depesz.com/s/T1EY Filter: ((deployments.user_id >= 2) AND (deployments.user_id <= 10000) AND (deployments.status = 2))

7. COUNT Deployments

explain ```sql explain SELECT COUNT("deployments"."id") FROM "deployments" WHERE "deployments"."id" BETWEEN 1 AND 100000 ```

Before: Time: 215.300 ms https://explain.depesz.com/s/GoEE

After: Time: 52.250 ms https://explain.depesz.com/s/4qzn

8. MIN/MAX Deployments

explain ```sql explain SELECT MIN("deployments"."id") FROM "deployments" ```

Before: Time: 0.442 ms https://explain.depesz.com/s/xLnh

After: Time: 0.317 ms https://explain.depesz.com/s/p7Zi

9. MIN/MAX Deployments success

explain ```sql explain SELECT MIN("deployments"."id") FROM "deployments" WHERE "deployments"."status" = 2 ```

Before: Time: 0.346 ms https://explain.depesz.com/s/nFek

Filter: (deployments.status = 2)

After: Time: 0.355 ms https://explain.depesz.com/s/luv2

10. COUNT Deployments failed

explain ```sql explain SELECT COUNT("deployments"."id") FROM "deployments" WHERE "deployments"."status" = 2 AND "deployments"."id" BETWEEN 0 AND 99999 ```

Before: Time: 9.215 s https://explain.depesz.com/s/mM0r

Filter: (deployments.status = 2)

After: Time: 34.054 ms https://explain.depesz.com/s/v2Rp

11. MIN/MAX Deployments failed

explain ```sql explain SELECT MIN("deployments"."id") FROM "deployments" WHERE "deployments"."status" = 3 ```

Before: Time: 30.274 s https://explain.depesz.com/s/r0CC

Filter: (deployments.status = 3)

After: Time: 427.971 ms https://explain.depesz.com/s/gR8f

12. COUNT Deployments success

explain ```sql explain SELECT COUNT("deployments"."id") FROM "deployments" WHERE "deployments"."status" = 3 AND "deployments"."id" BETWEEN 0 AND 99999 ```

Before: Time: 300.198 ms https://explain.depesz.com/s/5e3i

Filter: (deployments.status = 3)

After: Time: 5.544 ms https://explain.depesz.com/s/K6Wc

Migration output

== 20200311084025 AddIndexOnUserIdStatusCreatedAtToDeployments: migrating =====
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:deployments, [:user_id, :status, :created_at], {:algorithm=>:concurrently})
   -> 0.0080s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- add_index(:deployments, [:user_id, :status, :created_at], {:algorithm=>:concurrently})
   -> 0.0162s
-- execute("RESET ALL")
   -> 0.0004s
== 20200311084025 AddIndexOnUserIdStatusCreatedAtToDeployments: migrated (0.0254s)

== 20200311094020 AddIndexOnIdAndStatusToDeployments: migrating ===============
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:deployments, [:id, :status], {:algorithm=>:concurrently})
   -> 0.0052s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- add_index(:deployments, [:id, :status], {:algorithm=>:concurrently})
   -> 0.0026s
-- execute("RESET ALL")
   -> 0.0004s
== 20200311094020 AddIndexOnIdAndStatusToDeployments: migrated (0.0091s) ======

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Closes #208886 (closed)

cc @dstull @a_akgun @jeromezng

Edited by Alina Mihaila

Merge request reports

Loading