Skip to content

Speed up usage ping count for projects with incident SLAs enabled

Peter Leitzen requested to merge pl-incident-sla-usage-ping into master

What does this MR do and why?

This MR adds a partial index to project_incident_management_settings (more than 1000 rows on GitLab.com) on project_id with sla_timer = TRUE to speed up lookups for Usage Ping.

It also drop unnecessary JOIN and removes some specs for the scope.

See #268094 (closed).

database

Partial index

Up

main: == 20221011162637 AddPartialIndexProjectIncidentManagementSettingsOnProjectIdAndSlaTimer: migrating
main: -- transaction_open?()
main:    -> 0.0000s
main: -- index_exists?(:project_incident_management_settings, :project_id, {:name=>"index_project_incident_management_settings_on_p_id_sla_timer", :where=>"sla_timer = TRUE", :algorithm=>:concurrently})
main:    -> 0.0034s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0003s
main: -- add_index(:project_incident_management_settings, :project_id, {:name=>"index_project_incident_management_settings_on_p_id_sla_timer", :where=>"sla_timer = TRUE", :algorithm=>:concurrently})
main:    -> 0.0015s
main: -- execute("RESET statement_timeout")
main:    -> 0.0002s
main: == 20221011162637 AddPartialIndexProjectIncidentManagementSettingsOnProjectIdAndSlaTimer: migrated (0.0119s)

Down

main: == 20221011162637 AddPartialIndexProjectIncidentManagementSettingsOnProjectIdAndSlaTimer: reverting
main: -- transaction_open?()
main:    -> 0.0000s
main: -- indexes(:project_incident_management_settings)
main:    -> 0.0047s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0004s
main: -- remove_index(:project_incident_management_settings, {:algorithm=>:concurrently, :name=>"index_project_incident_management_settings_on_p_id_sla_timer"})
main:    -> 0.0010s
main: -- execute("RESET statement_timeout")
main:    -> 0.0003s
main: == 20221011162637 AddPartialIndexProjectIncidentManagementSettingsOnProjectIdAndSlaTimer: reverted (0.0121s)

Queries

Before

-- Gitlab::UsageData.count(::Project.with_enabled_incident_sla)
SELECT MIN("projects"."id") FROM "projects" INNER JOIN "project_incident_management_settings" ON "project_incident_management_settings"."project_id" = "projects"."id" WHERE "project_incident_management_settings"."sla_timer" = TRUE

SELECT MAX("projects"."id") FROM "projects" INNER JOIN "project_incident_management_settings" ON "project_incident_management_settings"."project_id" = "projects"."id" WHERE "project_incident_management_settings"."sla_timer" = TRUE

SELECT COUNT("projects"."id") FROM "projects" INNER JOIN "project_incident_management_settings" ON "project_incident_management_settings"."project_id" = "projects"."id" WHERE "project_incident_management_settings"."sla_timer" = TRUE AND "projects"."id" >= 0 AND "projects"."id" < 1

After

-- Gitlab::UsageData.count(::IncidentManagement::ProjectIncidentManagementSetting.where(sla_timer: true), :project_id)

SELECT MIN("project_incident_management_settings"."project_id") FROM "project_incident_management_settings" WHERE "project_incident_management_settings"."sla_timer" = TRUE

SELECT MAX("project_incident_management_settings"."project_id") FROM "project_incident_management_settings" WHERE "project_incident_management_settings"."sla_timer" = TRUE

SELECT COUNT("project_incident_management_settings"."project_id") FROM "project_incident_management_settings" WHERE "project_incident_management_settings"."sla_timer" = TRUE AND "project_incident_management_settings"."project_id" >= 0 AND "project_incident_management_settings"."project_id" < 1

🔬 Query plans

Creating index

exec CREATE INDEX index_project_incident_management_settings_on_p_id_sla_timer ON project_incident_management_settings USING btree (project_id) WHERE (sla_timer = TRUE);
Session: 12583
The query has been executed. Duration: 32.263 ms

MIN

SELECT MIN("project_incident_management_settings"."project_id") FROM "project_incident_management_settings" WHERE "project_incident_management_settings"."sla_timer" = TRUE
 Result  (cost=0.24..0.25 rows=1 width=4) (actual time=0.141..0.142 rows=1 loops=1)
   Buffers: shared hit=1 read=1
   I/O Timings: read=0.044 write=0.000
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.14..0.24 rows=1 width=4) (actual time=0.136..0.137 rows=1 loops=1)
           Buffers: shared hit=1 read=1
           I/O Timings: read=0.044 write=0.000
           ->  Index Only Scan using index_project_incident_management_settings_on_p_id_sla_timer on public.project_incident_management_settings  (cost=0.14..7.44 rows=77 width=4) (actual time=0.134..0.135 rows=1 loops=1)
                 Index Cond: (project_incident_management_settings.project_id IS NOT NULL)
                 Heap Fetches: 0
                 Buffers: shared hit=1 read=1
                 I/O Timings: read=0.044 write=0.000

Recommendations: Looks good

See https://console.postgres.ai/shared/95e8e2ca-8704-484f-86ec-01b36d3d92d7

MAX

SELECT MAX("project_incident_management_settings"."project_id") FROM "project_incident_management_settings" WHERE "project_incident_management_settings"."sla_timer" = TRUE
 Result  (cost=0.24..0.25 rows=1 width=4) (actual time=0.043..0.044 rows=1 loops=1)
   Buffers: shared hit=3
   I/O Timings: read=0.000 write=0.000
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.14..0.24 rows=1 width=4) (actual time=0.038..0.038 rows=1 loops=1)
           Buffers: shared hit=3
           I/O Timings: read=0.000 write=0.000
           ->  Index Only Scan using index_project_incident_management_settings_on_p_id_sla_timer on public.project_incident_management_settings  (cost=0.14..7.44 rows=77 width=4) (actual time=0.035..0.035 rows=1 loops=1)
                 Index Cond: (project_incident_management_settings.project_id IS NOT NULL)
                 Heap Fetches: 1
                 Buffers: shared hit=3
                 I/O Timings: read=0.000 write=0.000

Recommendations: VACUUM ANALYZE needed – Visibility map(s) for the table(s) involved in the query are outdated. For better performance: 1) run VACUUM ANALYZE on them as a one-time solution, 2) perform autovacuum tuning as a long-term permanent solution (tuning "vacuum" part of autovacuum).

See https://console.postgres.ai/shared/fa07dd7e-fdeb-45b9-9ad1-1a724301be78

COUNT

100000 is just an example.

SELECT COUNT("project_incident_management_settings"."project_id") FROM "project_incident_management_settings" WHERE "project_incident_management_settings"."sla_timer" = TRUE AND "project_incident_management_settings"."project_id" >= 0 AND "project_incident_management_settings"."project_id" < 100000
 Aggregate  (cost=3.16..3.17 rows=1 width=8) (actual time=0.012..0.013 rows=1 loops=1)
   Buffers: shared hit=1
   I/O Timings: read=0.000 write=0.000
   ->  Index Only Scan using index_project_incident_management_settings_on_p_id_sla_timer on public.project_incident_management_settings  (cost=0.14..3.16 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=1)
         Index Cond: ((project_incident_management_settings.project_id >= 0) AND (project_incident_management_settings.project_id < 100000))
         Heap Fetches: 0
         Buffers: shared hit=1
         I/O Timings: read=0.000 write=0.000

Recommendations: Looks good

See https://console.postgres.ai/shared/40523711-1ee8-4729-980c-3f5b063ea706

How to set up and validate locally

bundle exec rake db:migrate

# In rails console
Gitlab::UsageData.count(::IncidentManagement::ProjectIncidentManagementSetting.where(sla_timer: true), :project_id)

bundle exec rake db:rollback

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Peter Leitzen

Merge request reports