Speed up usage ping count for projects with incident SLAs enabled (!100718) · Merge requests · GitLab.org / GitLab

Peter Leitzen requested to merge pl-incident-sla-usage-ping into master Oct 11, 2022

What does this MR do and why?

This MR adds a partial index to project_incident_management_settings (more than 1000 rows on GitLab.com) on project_id with sla_timer = TRUE to speed up lookups for Usage Ping.

It also drop unnecessary JOIN and removes some specs for the scope.

See #268094 (closed).

database

Partial index

Up

main: == 20221011162637 AddPartialIndexProjectIncidentManagementSettingsOnProjectIdAndSlaTimer: migrating
main: -- transaction_open?()
main:    -> 0.0000s
main: -- index_exists?(:project_incident_management_settings, :project_id, {:name=>"index_project_incident_management_settings_on_p_id_sla_timer", :where=>"sla_timer = TRUE", :algorithm=>:concurrently})
main:    -> 0.0034s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0003s
main: -- add_index(:project_incident_management_settings, :project_id, {:name=>"index_project_incident_management_settings_on_p_id_sla_timer", :where=>"sla_timer = TRUE", :algorithm=>:concurrently})
main:    -> 0.0015s
main: -- execute("RESET statement_timeout")
main:    -> 0.0002s
main: == 20221011162637 AddPartialIndexProjectIncidentManagementSettingsOnProjectIdAndSlaTimer: migrated (0.0119s)

Down

main: == 20221011162637 AddPartialIndexProjectIncidentManagementSettingsOnProjectIdAndSlaTimer: reverting
main: -- transaction_open?()
main:    -> 0.0000s
main: -- indexes(:project_incident_management_settings)
main:    -> 0.0047s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0004s
main: -- remove_index(:project_incident_management_settings, {:algorithm=>:concurrently, :name=>"index_project_incident_management_settings_on_p_id_sla_timer"})
main:    -> 0.0010s
main: -- execute("RESET statement_timeout")
main:    -> 0.0003s
main: == 20221011162637 AddPartialIndexProjectIncidentManagementSettingsOnProjectIdAndSlaTimer: reverted (0.0121s)

Queries

Before

-- Gitlab::UsageData.count(::Project.with_enabled_incident_sla)
SELECT MIN("projects"."id") FROM "projects" INNER JOIN "project_incident_management_settings" ON "project_incident_management_settings"."project_id" = "projects"."id" WHERE "project_incident_management_settings"."sla_timer" = TRUE

SELECT MAX("projects"."id") FROM "projects" INNER JOIN "project_incident_management_settings" ON "project_incident_management_settings"."project_id" = "projects"."id" WHERE "project_incident_management_settings"."sla_timer" = TRUE

SELECT COUNT("projects"."id") FROM "projects" INNER JOIN "project_incident_management_settings" ON "project_incident_management_settings"."project_id" = "projects"."id" WHERE "project_incident_management_settings"."sla_timer" = TRUE AND "projects"."id" >= 0 AND "projects"."id" < 1

After

-- Gitlab::UsageData.count(::IncidentManagement::ProjectIncidentManagementSetting.where(sla_timer: true), :project_id)

SELECT MIN("project_incident_management_settings"."project_id") FROM "project_incident_management_settings" WHERE "project_incident_management_settings"."sla_timer" = TRUE

SELECT MAX("project_incident_management_settings"."project_id") FROM "project_incident_management_settings" WHERE "project_incident_management_settings"."sla_timer" = TRUE

SELECT COUNT("project_incident_management_settings"."project_id") FROM "project_incident_management_settings" WHERE "project_incident_management_settings"."sla_timer" = TRUE AND "project_incident_management_settings"."project_id" >= 0 AND "project_incident_management_settings"."project_id" < 1

🔬 Query plans

Creating index

exec CREATE INDEX index_project_incident_management_settings_on_p_id_sla_timer ON project_incident_management_settings USING btree (project_id) WHERE (sla_timer = TRUE);
Session: 12583
The query has been executed. Duration: 32.263 ms

MIN

SELECT MIN("project_incident_management_settings"."project_id") FROM "project_incident_management_settings" WHERE "project_incident_management_settings"."sla_timer" = TRUE

 Result  (cost=0.24..0.25 rows=1 width=4) (actual time=0.141..0.142 rows=1 loops=1)
   Buffers: shared hit=1 read=1
   I/O Timings: read=0.044 write=0.000
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.14..0.24 rows=1 width=4) (actual time=0.136..0.137 rows=1 loops=1)
           Buffers: shared hit=1 read=1
           I/O Timings: read=0.044 write=0.000
           ->  Index Only Scan using index_project_incident_management_settings_on_p_id_sla_timer on public.project_incident_management_settings  (cost=0.14..7.44 rows=77 width=4) (actual time=0.134..0.135 rows=1 loops=1)
                 Index Cond: (project_incident_management_settings.project_id IS NOT NULL)
                 Heap Fetches: 0
                 Buffers: shared hit=1 read=1
                 I/O Timings: read=0.044 write=0.000

Recommendations: ✅ Looks good

See https://console.postgres.ai/shared/95e8e2ca-8704-484f-86ec-01b36d3d92d7

MAX

SELECT MAX("project_incident_management_settings"."project_id") FROM "project_incident_management_settings" WHERE "project_incident_management_settings"."sla_timer" = TRUE

 Result  (cost=0.24..0.25 rows=1 width=4) (actual time=0.043..0.044 rows=1 loops=1)
   Buffers: shared hit=3
   I/O Timings: read=0.000 write=0.000
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.14..0.24 rows=1 width=4) (actual time=0.038..0.038 rows=1 loops=1)
           Buffers: shared hit=3
           I/O Timings: read=0.000 write=0.000
           ->  Index Only Scan using index_project_incident_management_settings_on_p_id_sla_timer on public.project_incident_management_settings  (cost=0.14..7.44 rows=77 width=4) (actual time=0.035..0.035 rows=1 loops=1)
                 Index Cond: (project_incident_management_settings.project_id IS NOT NULL)
                 Heap Fetches: 1
                 Buffers: shared hit=3
                 I/O Timings: read=0.000 write=0.000

Recommendations: ❗ VACUUM ANALYZE needed – Visibility map(s) for the table(s) involved in the query are outdated. For better performance: 1) run VACUUM ANALYZE on them as a one-time solution, 2) perform autovacuum tuning as a long-term permanent solution (tuning "vacuum" part of autovacuum).

See https://console.postgres.ai/shared/fa07dd7e-fdeb-45b9-9ad1-1a724301be78

COUNT

⚠ 100000 is just an example.

SELECT COUNT("project_incident_management_settings"."project_id") FROM "project_incident_management_settings" WHERE "project_incident_management_settings"."sla_timer" = TRUE AND "project_incident_management_settings"."project_id" >= 0 AND "project_incident_management_settings"."project_id" < 100000

 Aggregate  (cost=3.16..3.17 rows=1 width=8) (actual time=0.012..0.013 rows=1 loops=1)
   Buffers: shared hit=1
   I/O Timings: read=0.000 write=0.000
   ->  Index Only Scan using index_project_incident_management_settings_on_p_id_sla_timer on public.project_incident_management_settings  (cost=0.14..3.16 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=1)
         Index Cond: ((project_incident_management_settings.project_id >= 0) AND (project_incident_management_settings.project_id < 100000))
         Heap Fetches: 0
         Buffers: shared hit=1
         I/O Timings: read=0.000 write=0.000

Recommendations: ✅ Looks good

See https://console.postgres.ai/shared/40523711-1ee8-4729-980c-3f5b063ea706

How to set up and validate locally

bundle exec rake db:migrate

# In rails console
Gitlab::UsageData.count(::IncidentManagement::ProjectIncidentManagementSetting.where(sla_timer: true), :project_id)

bundle exec rake db:rollback

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Edited Oct 11, 2022 by Peter Leitzen

Speed up usage ping count for projects with incident SLAs enabled