Add index to package files on package_id and created_at DESC
🍀 Context
The large SaaS customer experience db timeouts or cancellation when downloading Maven files:
GET /api/:version/projects/:id/packages/maven/*path/:file_name
The log entries with 500
errors: link (internal)
The endpoint itself has ~2.5K
violations over specified threshold in the last 7 days: link (internal)
The log entries with the long running SQL queries (> 3 seconds): link (internal)
🔨 What does this MR do and why?
Add a new index to the packages_package_files
table for package_id
and created_at
columns.
📸 Screenshots or screen recordings
None
How to set up and validate locally
None
💾 Database review
⤴ Migration up
$ rake db:migration:main
main: == 20230321153035 AddPackageIdCreatedAtDescIndexToPackageFiles: migrating =====
main: -- transaction_open?()
main: -> 0.0001s
main: -- view_exists?(:postgres_partitions)
main: -> 0.0677s
main: -- index_exists?(:packages_package_files, "package_id, created_at DESC", {:name=>"index_packages_package_files_on_package_id_and_created_at_desc", :algorithm=>:concurrently})
main: -> 0.0070s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0006s
main: -- add_index(:packages_package_files, "package_id, created_at DESC", {:name=>"index_packages_package_files_on_package_id_and_created_at_desc", :algorithm=>:concurrently})
main: -> 0.0062s
main: -- execute("RESET statement_timeout")
main: -> 0.0004s
main: == 20230321153035 AddPackageIdCreatedAtDescIndexToPackageFiles: migrated (0.1080s)
⤵ Migration down
$ rake db:rollback:main
main: == 20230321153035 AddPackageIdCreatedAtDescIndexToPackageFiles: reverting =====
main: -- transaction_open?()
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.1010s
main: -- indexes(:packages_package_files)
main: -> 0.0052s
main: -- current_schema()
main: -> 0.0001s
main: == 20230321153035 AddPackageIdCreatedAtDescIndexToPackageFiles: reverted (0.1154s)
🔬 Database analysis
🔥 Now
Query
EXPLAIN SELECT
"packages_packages".*
FROM
"packages_packages"
INNER JOIN "packages_maven_metadata" ON "packages_maven_metadata"."package_id" = "packages_packages"."id"
INNER JOIN "packages_package_files" ON "packages_package_files"."package_id" = "packages_packages"."id"
WHERE
"packages_packages"."project_id" = XXXX
AND "packages_packages"."status" IN (0, 1)
AND "packages_maven_metadata"."path" = XXXX
ORDER BY
packages_package_files.created_at DESC
LIMIT 1
Explain plan
Limit (cost=77.30..77.30 rows=1 width=100) (actual time=55294.393..55294.402 rows=1 loops=1)
Buffers: shared hit=1468 read=31083 dirtied=503
I/O Timings: read=54277.152 write=0.000
-> Sort (cost=77.30..77.31 rows=6 width=100) (actual time=55294.389..55294.397 rows=1 loops=1)
Sort Key: packages_package_files.created_at DESC
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=1468 read=31083 dirtied=503
I/O Timings: read=54277.152 write=0.000
-> Nested Loop (cost=1.56..77.27 rows=6 width=100) (actual time=51.976..55121.105 rows=33352 loops=1)
Buffers: shared hit=1465 read=31083 dirtied=503
I/O Timings: read=54277.152 write=0.000
-> Nested Loop (cost=0.99..7.03 rows=1 width=100) (actual time=39.594..39.623 rows=1 loops=1)
Buffers: shared hit=1 read=13
I/O Timings: read=39.285 write=0.000
-> Index Scan using index_packages_maven_metadata_on_path on public.packages_maven_metadata (cost=0.56..3.58 rows=1 width=8) (actual time=16.355..20.822 rows=2 loops=1)
Index Cond: ((packages_maven_metadata.path)::text = 'com/tmobile/cdp/templates-gradle'::text)
Buffers: shared read=6
I/O Timings: read=20.688 write=0.000
-> Index Scan using packages_packages_pkey on public.packages_packages (cost=0.43..3.46 rows=1 width=92) (actual time=9.380..9.381 rows=0 loops=2)
Index Cond: (packages_packages.id = packages_maven_metadata.package_id)
Filter: ((packages_packages.status = ANY ('{0,1}'::integer[])) AND (packages_packages.project_id = 11160640))
Rows Removed by Filter: 0
Buffers: shared hit=1 read=7
I/O Timings: read=18.597 write=0.000
-> Index Scan using index_packages_package_files_on_package_id_id on public.packages_package_files (cost=0.56..68.38 rows=148 width=16) (actual time=12.372..54979.124 rows=33352 loops=1)
Index Cond: (packages_package_files.package_id = packages_packages.id)
Buffers: shared hit=1464 read=31070 dirtied=503
I/O Timings: read=54237.867 write=0.000
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/16720/commands/57046
🚒 After adding the index
pg.ai setup
reset
exec CREATE INDEX foobar ON packages_package_files USING btree (package_id, created_at DESC)
exec VACUUM ANALYZE packages_package_files
Query
SELECT
"packages_packages".*
FROM
"packages_packages"
INNER JOIN "packages_maven_metadata" ON "packages_maven_metadata"."package_id" = "packages_packages"."id"
INNER JOIN "packages_package_files" ON "packages_package_files"."package_id" = "packages_packages"."id"
WHERE
"packages_packages"."project_id" = XXXX
AND "packages_packages"."status" IN (0, 1)
AND "packages_maven_metadata"."path" = XXXX
ORDER BY
packages_package_files.created_at DESC
LIMIT 1
Execution plan
Limit (cost=13.17..13.17 rows=1 width=101) (actual time=62.117..62.120 rows=1 loops=1)
Buffers: shared hit=1916 read=143
I/O Timings: read=39.432 write=0.000
-> Sort (cost=13.17..13.19 rows=6 width=101) (actual time=62.115..62.117 rows=1 loops=1)
Sort Key: packages_package_files.created_at DESC
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=1916 read=143
I/O Timings: read=39.432 write=0.000
-> Nested Loop (cost=1.56..13.14 rows=6 width=101) (actual time=31.408..51.274 rows=33352 loops=1)
Buffers: shared hit=1913 read=143
I/O Timings: read=39.432 write=0.000
-> Nested Loop (cost=0.99..7.03 rows=1 width=101) (actual time=31.204..31.208 rows=1 loops=1)
Buffers: shared hit=1 read=13
I/O Timings: read=30.968 write=0.000
-> Index Scan using index_packages_maven_metadata_on_path on public.packages_maven_metadata (cost=0.56..3.58 rows=1 width=8) (actual time=9.215..11.194 rows=2 loops=1)
Index Cond: ((packages_maven_metadata.path)::text = 'com/tmobile/cdp/templates-gradle'::text)
Buffers: shared read=6
I/O Timings: read=11.092 write=0.000
-> Index Scan using packages_packages_pkey on public.packages_packages (cost=0.43..3.46 rows=1 width=93) (actual time=9.996..9.996 rows=0 loops=2)
Index Cond: (packages_packages.id = packages_maven_metadata.package_id)
Filter: ((packages_packages.status = ANY ('{0,1}'::integer[])) AND (packages_packages.project_id = 11160640))
Rows Removed by Filter: 0
Buffers: shared hit=1 read=7
I/O Timings: read=19.876 write=0.000
-> Index Only Scan using foobar on public.packages_package_files (cost=0.56..4.64 rows=147 width=16) (actual time=0.196..14.235 rows=33352 loops=1)
Index Cond: (packages_package_files.package_id = packages_maven_metadata.package_id)
Heap Fetches: 0
Buffers: shared hit=1912 read=130
I/O Timings: read=8.463 write=0.000
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/16772/commands/57160
🔍 Observation
The usage Index-Only Scan
answers the query from index only without accessing the packages_package_files
table.
Shared buffers reads
went down: ~242 MiB
-> ~1.1 MiB
.
The index size is 1785 MB
. Well, the table has ~60M
rows.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #396650 (closed)