Add index to package files on package_id, status and file extension
requested to merge 462807-add-index-with-nupkg-file-extension-to-the-package-files-table into master
What does this MR do and why?
Add index to package files on package_id, status and file extension
Adding the index to package files table on package_id, status and file extension will improve the perfomance of NuGet metadata endpoints and can be used instead of existing GIN index.
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
No.
How to set up and validate locally
Migration up and down:
$ rails db:migrate:main
main: == [advisory_lock_connection] object_id: 125540, pg_backend_pid: 9735
main: == 20240523145415 AddPackageIdFileExtensionStatusIndexToPackageFiles: migrating
main: -- transaction_open?(nil)
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.0089s
main: -- index_exists?(:packages_package_files, :package_id, {:where=>"((status = 0) AND (reverse(split_part(reverse(file_name), '.', 1)) = 'nupkg'))", :name=>"index_packages_package_files_on_package_file_extension_status", :algorithm=>:concurrently})
main: -> 0.0055s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0002s
main: -- add_index(:packages_package_files, :package_id, {:where=>"((status = 0) AND (reverse(split_part(reverse(file_name), '.', 1)) = 'nupkg'))", :name=>"index_packages_package_files_on_package_file_extension_status", :algorithm=>:concurrently})
main: -> 0.0116s
main: -- execute("RESET statement_timeout")
main: -> 0.0003s
main: == 20240523145415 AddPackageIdFileExtensionStatusIndexToPackageFiles: migrated (0.0389s)
main: == [advisory_lock_connection] object_id: 125540, pg_backend_pid: 9735
$ rails db:rollback:main
main: == [advisory_lock_connection] object_id: 125140, pg_backend_pid: 10033
main: == 20240523145415 AddPackageIdFileExtensionStatusIndexToPackageFiles: reverting
main: -- transaction_open?(nil)
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.0088s
main: -- indexes(:packages_package_files)
main: -> 0.0055s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0001s
main: -- remove_index(:packages_package_files, {:algorithm=>:concurrently, :name=>"index_packages_package_files_on_package_file_extension_status"})
main: -> 0.0017s
main: -- execute("RESET statement_timeout")
main: -> 0.0003s
main: == 20240523145415 AddPackageIdFileExtensionStatusIndexToPackageFiles: reverted (0.0277s)
main: == [advisory_lock_connection] object_id: 125140, pg_backend_pid: 10033
Database analysis
The SQL query is triggered here.
Now
Query
SELECT "packages_package_files".*
FROM "packages_package_files"
WHERE "packages_package_files"."status" = 0
AND "packages_package_files"."file_name" ILIKE '%.nupkg'
AND "packages_package_files"."package_id" IN (XXX, XXX);
Explain plan
Bitmap Heap Scan on public.packages_package_files (cost=17012.57..18336.46 rows=847 width=987) (actual time=7805.145..8487.639 rows=525 loops=1)
Filter: (packages_package_files.status = 0)
Rows Removed by Filter: 0
Buffers: shared hit=15076 read=9588 dirtied=17
I/O Timings: read=6529.493 write=0.000
-> BitmapAnd (cost=17011.90..17011.90 rows=847 width=0) (actual time=7801.980..7802.003 rows=0 loops=1)
Buffers: shared hit=15076 read=9069
I/O Timings: read=6516.918 write=0.000
-> Bitmap Index Scan using index_packages_package_files_on_package_id_and_created_at_desc (cost=0.00..877.70 rows=43067 width=0) (actual time=375.270..375.275 rows=525 loops=1)
Index Cond: (packages_package_files.package_id = ANY ('{XXX, XXX}'::bigint[]))
Buffers: shared hit=698 read=376
I/O Timings: read=372.219 write=0.000
-> Bitmap Index Scan using index_packages_package_files_on_file_name (cost=0.00..16133.53 rows=1922537 width=0) (actual time=7417.837..7417.837 rows=2195530 loops=1)
Index Cond: ((packages_package_files.file_name)::text ~~* '%.nupkg'::text)
Buffers: shared hit=14378 read=8693
I/O Timings: read=6144.699 write=0.000
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/28353/commands/88608
After changing the query and adding the new index
postgres.ai setup
exec CREATE INDEX index_files_on_extension_status_package ON packages_package_files (package_id) WHERE ((status = 0) AND (split_part(reverse(file_name), '.', 1) = 'nupkg'));
exec VACUUM ANALYZE packages_package_files;
Query
SELECT "packages_package_files".*
FROM "packages_package_files"
WHERE "packages_package_files"."status" = 0
AND (reverse(split_part(reverse(packages_package_files.file_name), '.', 1)) = 'nupkg')
AND "packages_package_files"."package_id" IN (XXX, XXX);
Explain plan
Index Scan using index_packages_package_files_on_package_file_extension_status on public.packages_package_files (cost=0.43..654.62 rows=217 width=987) (actual time=4.256..524.642 rows=525 loops=1)
Index Cond: (packages_package_files.package_id = ANY ('{XXX, XXX}'::bigint[]))
Buffers: shared hit=640 read=687
I/O Timings: read=517.327 write=0.000
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/28574/commands/89119
Related to #462807 (closed)
Edited by Dzmitry Meshcharakou