Skip to content

Add index to package files on package_id, status and file extension

What does this MR do and why?

Add index to package files on package_id, status and file extension

Adding the index to package files table on package_id, status and file extension will improve the perfomance of NuGet metadata endpoints and can be used instead of existing GIN index.

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

No.

How to set up and validate locally

Migration up and down:

$ rails db:migrate:main
main: == [advisory_lock_connection] object_id: 125540, pg_backend_pid: 9735
main: == 20240523145415 AddPackageIdFileExtensionStatusIndexToPackageFiles: migrating 
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0089s
main: -- index_exists?(:packages_package_files, :package_id, {:where=>"((status = 0) AND (reverse(split_part(reverse(file_name), '.', 1)) = 'nupkg'))", :name=>"index_packages_package_files_on_package_file_extension_status", :algorithm=>:concurrently})
main:    -> 0.0055s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0002s
main: -- add_index(:packages_package_files, :package_id, {:where=>"((status = 0) AND (reverse(split_part(reverse(file_name), '.', 1)) = 'nupkg'))", :name=>"index_packages_package_files_on_package_file_extension_status", :algorithm=>:concurrently})
main:    -> 0.0116s
main: -- execute("RESET statement_timeout")
main:    -> 0.0003s
main: == 20240523145415 AddPackageIdFileExtensionStatusIndexToPackageFiles: migrated (0.0389s) 

main: == [advisory_lock_connection] object_id: 125540, pg_backend_pid: 9735
$ rails db:rollback:main                                                                           
main: == [advisory_lock_connection] object_id: 125140, pg_backend_pid: 10033
main: == 20240523145415 AddPackageIdFileExtensionStatusIndexToPackageFiles: reverting 
main: -- transaction_open?(nil)
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0088s
main: -- indexes(:packages_package_files)
main:    -> 0.0055s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0001s
main: -- remove_index(:packages_package_files, {:algorithm=>:concurrently, :name=>"index_packages_package_files_on_package_file_extension_status"})
main:    -> 0.0017s
main: -- execute("RESET statement_timeout")
main:    -> 0.0003s
main: == 20240523145415 AddPackageIdFileExtensionStatusIndexToPackageFiles: reverted (0.0277s) 

main: == [advisory_lock_connection] object_id: 125140, pg_backend_pid: 10033

Database analysis

The SQL query is triggered here.

Now

Query
SELECT "packages_package_files".*
FROM "packages_package_files"
WHERE "packages_package_files"."status" = 0
  AND "packages_package_files"."file_name" ILIKE '%.nupkg'
  AND "packages_package_files"."package_id" IN (XXX, XXX);
Explain plan
 Bitmap Heap Scan on public.packages_package_files  (cost=17012.57..18336.46 rows=847 width=987) (actual time=7805.145..8487.639 rows=525 loops=1)
   Filter: (packages_package_files.status = 0)
   Rows Removed by Filter: 0
   Buffers: shared hit=15076 read=9588 dirtied=17
   I/O Timings: read=6529.493 write=0.000
   ->  BitmapAnd  (cost=17011.90..17011.90 rows=847 width=0) (actual time=7801.980..7802.003 rows=0 loops=1)
         Buffers: shared hit=15076 read=9069
         I/O Timings: read=6516.918 write=0.000
         ->  Bitmap Index Scan using index_packages_package_files_on_package_id_and_created_at_desc  (cost=0.00..877.70 rows=43067 width=0) (actual time=375.270..375.275 rows=525 loops=1)
               Index Cond: (packages_package_files.package_id = ANY ('{XXX, XXX}'::bigint[]))
               Buffers: shared hit=698 read=376
               I/O Timings: read=372.219 write=0.000
         ->  Bitmap Index Scan using index_packages_package_files_on_file_name  (cost=0.00..16133.53 rows=1922537 width=0) (actual time=7417.837..7417.837 rows=2195530 loops=1)
               Index Cond: ((packages_package_files.file_name)::text ~~* '%.nupkg'::text)
               Buffers: shared hit=14378 read=8693
               I/O Timings: read=6144.699 write=0.000

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/28353/commands/88608

After changing the query and adding the new index

postgres.ai setup
exec CREATE INDEX index_files_on_extension_status_package ON packages_package_files (package_id) WHERE ((status = 0) AND (split_part(reverse(file_name), '.', 1) = 'nupkg'));
exec VACUUM ANALYZE packages_package_files;
Query
SELECT "packages_package_files".*
FROM "packages_package_files"
WHERE "packages_package_files"."status" = 0
  AND (reverse(split_part(reverse(packages_package_files.file_name), '.', 1)) = 'nupkg')
  AND "packages_package_files"."package_id" IN (XXX, XXX);
Explain plan
 Index Scan using index_packages_package_files_on_package_file_extension_status on public.packages_package_files  (cost=0.43..654.62 rows=217 width=987) (actual time=4.256..524.642 rows=525 loops=1)
   Index Cond: (packages_package_files.package_id = ANY ('{XXX, XXX}'::bigint[]))
   Buffers: shared hit=640 read=687
   I/O Timings: read=517.327 write=0.000

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/28574/commands/89119

Related to #462807 (closed)

Edited by Dzmitry Meshcharakou

Merge request reports