Skip to content

Added temporary index for uploads_size on project_statistics table

What does this MR do and why?

This MR adds a partial index on project_statistics table where uploads_size is not 0.

The index is required for the Backfill/trigger of storage_size of ProjectStatistics table, as it optimises the performance of the primary query used to filter ProjectStatistics to trigger storage_size_refresh!, as part of https://gitlab.com/gitlab-org/gitlab/-/issues/370313 issue.

Follow up MR that performs the Backfill: here

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Migration Output

Up

main: == 20221104074652 AddTempIndexForProjectStatisticsUploadSizeMigration: migrating
main: -- transaction_open?()
main:    -> 0.0000s
main: -- index_exists?(:project_statistics, [:project_id], {:name=>"tmp_index_project_statistics_uploads_size", :where=>"uploads_size <> 0", :algorithm=>:concurrently})
main:    -> 0.0037s
main: -- add_index(:project_statistics, [:project_id], {:name=>"tmp_index_project_statistics_uploads_size", :where=>"uploads_size <> 0", :algorithm=>:concurrently})
main:    -> 0.0016s
main: == 20221104074652 AddTempIndexForProjectStatisticsUploadSizeMigration: migrated (0.0070s)

Down

main: == 20221104074652 AddTempIndexForProjectStatisticsUploadSizeMigration: reverting
main: -- transaction_open?()
main:    -> 0.0000s
main: -- indexes(:project_statistics)
main:    -> 0.0060s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0002s
main: -- remove_index(:project_statistics, {:algorithm=>:concurrently, :name=>"tmp_index_project_statistics_uploads_size"})
main:    -> 0.0010s
main: -- execute("RESET statement_timeout")
main:    -> 0.0002s
main: == 20221104074652 AddTempIndexForProjectStatisticsUploadSizeMigration: reverted (0.0118s)

Query Plans

Query:

SELECT "project_statistics".project_id FROM "project_statistics" WHERE (uploads_size <> 0)

Number of Affected Rows: 1155675

Without the index the performance of the query is very slow.

Before Index

Link: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/12979/commands/45585

Plan

Gather  (cost=1000.00..2283590.20 rows=1146767 width=132) (actual time=0.797..49929.177 rows=1161472 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=27719 read=478940 dirtied=8105 written=7680
   I/O Timings: read=112683.792 write=335.706
   ->  Parallel Seq Scan on public.project_statistics  (cost=0.00..2167913.50 rows=477820 width=132) (actual time=0.402..49436.833 rows=387157 loops=3)
         Filter: (project_statistics.uploads_size <> 0)
         Rows Removed by Filter: 8724840
         Buffers: shared hit=27719 read=478940 dirtied=8105 written=7680
         I/O Timings: read=112683.792 write=335.706

Recommendations

SeqScan is used – Consider adding an index <https://docs.gitlab.com/ee/development/understanding_explain_plans.html#optimising-queries|Show details> ️ Query processes too much data to return a relatively small number of rows. – Reduce data cardinality as early as possible during the execution, using one or several of the following techniques: new indexes, partitioning, query rewriting, denormalization. See the visualization of the plan to understand which plan nodes are the main bottlenecks. <http://momjian.us/main/writings/pgsql/hw_performance/|Show details> ️ Add LIMIT – The number of rows in the result set is too big. Limit number of rows. <https://postgres.ai/#tip-add-limit|Show details>

After Index

CREATE INDEX index_project_statistics_nonzero_uploads_size ON project_statistics USING btree (project_id) WHERE (uploads_size <> 0);

Link: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/12979/commands/45587

Plan

Index Scan using index_project_statistics_nonzero_uploads_size on public.project_statistics  (cost=0.43..639099.47 rows=1155675 width=132) (actual time=0.386..15179.883 rows=1161472 loops=1)
   Buffers: shared hit=717388 read=446380 dirtied=12516 written=4908
   I/O Timings: read=12451.101 write=149.956

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Suraj Tripathi

Merge request reports