Optimize 28-days deployment counters
What does this MR do?
Optimize 28-days deployment counters with an index
Issue #221184 (closed)
Queries
deployments:
Gitlab::UsageData.count(Deployment.where(Gitlab::UsageData.last_28_days_time_period))
successful_deployments:
Gitlab::UsageData.count(Deployment.success.where(Gitlab::UsageData.last_28_days_time_period))
failed_deployments:
Gitlab::UsageData.count(Deployment.failed.where(Gitlab::UsageData.last_28_days_time_period))
Loading development environment (Rails 6.0.3.1)
[1] pry(main)> Gitlab::UsageData.count(Deployment.where(Gitlab::UsageData.last_28_days_time_period))
SELECT MIN("deployments"."id") FROM "deployments" WHERE "deployments"."created_at" BETWEEN '2020-06-06 15:58:29.145420' AND '2020-07-04 15:58:29.145503'
SELECT MAX("deployments"."id") FROM "deployments" WHERE "deployments"."created_at" BETWEEN '2020-06-06 15:58:29.145420' AND '2020-07-04 15:58:29.145503'
SELECT COUNT("deployments"."id") FROM "deployments" WHERE "deployments"."created_at" BETWEEN '2020-06-06 15:58:29.145420' AND '2020-07-04 15:58:29.145503' AND "deployments"."id" BETWEEN 0 AND 99999
- Before: https://explain.depesz.com/s/LcBw 164ms
- After: https://explain.depesz.com/s/tzD1 7.4ms (eliminated the filters)
[2] pry(main)> Gitlab::UsageData.count(Deployment.success.where(Gitlab::UsageData.last_28_days_time_period))
SELECT MIN("deployments"."id") FROM "deployments" WHERE "deployments"."status" = 2 AND "deployments"."created_at" BETWEEN '2020-06-06 15:58:49.249183' AND '2020-07-04 15:58:49.249234'
SELECT MAX("deployments"."id") FROM "deployments" WHERE "deployments"."status" = 2 AND "deployments"."created_at" BETWEEN '2020-06-06 15:58:49.249183' AND '2020-07-04 15:58:49.249234'
SELECT COUNT("deployments"."id") FROM "deployments" WHERE "deployments"."status" = 2 AND "deployments"."created_at" BETWEEN '2020-06-06 15:58:49.249183' AND '2020-07-04 15:58:49.249234' AND "deployments"."id" BETWEEN 0 AND 99999
Before: https://explain.depesz.com/s/zWxh 41ms After: https://explain.depesz.com/s/iTyC 9ms (eliminated the filters)
[3] pry(main)> Gitlab::UsageData.count(Deployment.failed.where(Gitlab::UsageData.last_28_days_time_period))
SELECT MIN("deployments"."id") FROM "deployments" WHERE "deployments"."status" = 3 AND "deployments"."created_at" BETWEEN '2020-06-06 15:58:55.562337' AND '2020-07-04 15:58:55.562511'
SELECT MAX("deployments"."id") FROM "deployments" WHERE "deployments"."status" = 3 AND "deployments"."created_at" BETWEEN '2020-06-06 15:58:55.562337' AND '2020-07-04 15:58:55.562511'
SELECT COUNT("deployments"."id") FROM "deployments" WHERE "deployments"."status" = 3 AND "deployments"."created_at" BETWEEN '2020-06-06 15:58:55.562337' AND '2020-07-04 15:58:55.562511' AND "deployments"."id" BETWEEN 0 AND 99999
Before: https://explain.depesz.com/s/A0HW 32ms After: https://explain.depesz.com/s/LnZU 7ms (eliminated the filters)
Optimization
DROP INDEX index_deployments_on_id_and_status
CREATE INDEX index_deployments_on_id_and_status_and_created_at ON public.deployments USING btree (id, status, created_at);
Index creation
exec CREATE INDEX index_deployments_on_id_and_status_and_created_at ON public.deployments USING btree (id, status, created_at);
The query has been executed. Duration: 21.164 min
\di+ index_deployments_on_id_and_status_and_created_at
Command output:
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+---------------------------------------------------+-------+--------+-------------+---------+-------------
public | index_deployments_on_id_and_status_and_created_at | index | gitlab | deployments | 2209 MB |
Timing
- With 73million rows in deployments, and 100K batch size, and < 100ms for each batch we shall have 73 seconds for the counter
!26757 (merged)
Some other counters forSELECT MIN("deployments"."id") FROM "deployments"
SELECT MAX("deployments"."id") FROM "deployments"
SELECT COUNT("deployments"."id") FROM "deployments" WHERE "deployments"."id" BETWEEN 1 AND 100000
After: https://explain.depesz.com/s/XJdO
SELECT MIN("deployments"."id") FROM "deployments" WHERE "deployments"."status" = 3
SELECT MAX("deployments"."id") FROM "deployments" WHERE "deployments"."status" = 3
SELECT COUNT("deployments"."id") FROM "deployments" WHERE "deployments"."status" = 3 AND "deployments"."id" BETWEEN 1 AND 99999
After: https://explain.depesz.com/s/cUTt
SELECT MIN("deployments"."id") FROM "deployments" WHERE "deployments"."status" = 2
SELECT MAX("deployments"."id") FROM "deployments" WHERE "deployments"."status" = 2
SELECT COUNT("deployments"."id") FROM "deployments" WHERE "deployments"."status" = 2 AND "deployments"."id" BETWEEN 1 AND 99999
After: https://explain.depesz.com/s/tvsZ
Migration output
$ rails db:migrate:up VERSION=20200704161600
== 20200704161600 AddIndexOnIdAndStatusAndCreatedAtToDeployments: migrating ===
-- transaction_open?()
-> 0.0000s
-- index_exists?(:deployments, [:id, :status, :created_at], {:algorithm=>:concurrently})
-> 0.0055s
-- add_index(:deployments, [:id, :status, :created_at], {:algorithm=>:concurrently})
-> 0.0299s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:deployments, [:id, :status], {:algorithm=>:concurrently})
-> 0.0053s
-- remove_index(:deployments, {:algorithm=>:concurrently, :column=>[:id, :status]})
-> 0.0165s
== 20200704161600 AddIndexOnIdAndStatusAndCreatedAtToDeployments: migrated (0.0578s)
$ rails db:migrate:down VERSION=20200704161600
== 20200704161600 AddIndexOnIdAndStatusAndCreatedAtToDeployments: reverting ===
-- transaction_open?()
-> 0.0000s
-- index_exists?(:deployments, [:id, :status], {:algorithm=>:concurrently})
-> 0.0056s
-- add_index(:deployments, [:id, :status], {:algorithm=>:concurrently})
-> 0.0210s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:deployments, [:id, :status, :created_at], {:algorithm=>:concurrently})
-> 0.0048s
-- remove_index(:deployments, {:algorithm=>:concurrently, :column=>[:id, :status, :created_at]})
-> 0.0065s
== 20200704161600 AddIndexOnIdAndStatusAndCreatedAtToDeployments: reverted (0.0386s)
Edited by Alper Akgun