Skip to content

Add index to installable npm packages

🍀 What does this MR do and why?

This MR is part of the effort that contributes towards Slow SQL queries for the NPM metadata endpoint and improves the performance of SQL queries triggered by Packages::Npm::PackageFinder by introducing a new index.

💾 Database review

Migration up

$ rake db:migrate:main 
main: == 20230131125844 AddProjectIdNameIdVersionIndexToInstallableNpmPackages: migrating 
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.6433s
main: -- index_exists?(:packages_packages, [:project_id, :name, :id, :version], {:name=>"idx_packages_on_project_id_name_id_version_when_installable_npm", :where=>"package_type = 2 AND status IN (0, 1)", :algorithm=>:concurrently})
main:    -> 0.0165s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0007s
main: -- add_index(:packages_packages, [:project_id, :name, :id, :version], {:name=>"idx_packages_on_project_id_name_id_version_when_installable_npm", :where=>"package_type = 2 AND status IN (0, 1)", :algorithm=>:concurrently})
main:    -> 0.0026s
main: -- execute("RESET statement_timeout")
main:    -> 0.0020s
main: == 20230131125844 AddProjectIdNameIdVersionIndexToInstallableNpmPackages: migrated (0.6807s)

Migration down

$ rake db:rollback:main
main: == 20230131125844 AddProjectIdNameIdVersionIndexToInstallableNpmPackages: reverting 
main: -- transaction_open?()
main:    -> 0.0002s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.0854s
main: -- indexes(:packages_packages)
main:    -> 0.0108s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0003s
main: -- remove_index(:packages_packages, {:algorithm=>:concurrently, :name=>"idx_packages_on_project_id_name_id_version_when_installable_npm"})
main:    -> 0.0019s
main: -- execute("RESET statement_timeout")
main:    -> 0.0003s
main: == 20230131125844 AddProjectIdNameIdVersionIndexToInstallableNpmPackages: reverted (0.1091s) 

🔬 Database analysis

The query is triggered by Packages::Npm::PackageFinder here: https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/finders/packages/npm/package_finder.rb#L24

🔥 Current query

Query
SELECT "packages_packages". *
FROM
    "packages_packages"
WHERE
    "packages_packages". "project_id" = XXXX
    AND "packages_packages". "package_type" = 2
    AND "packages_packages". "name" = XXXX
    AND "packages_packages". "status" IN (0, 1)
    AND "packages_packages". "id" IN (
        SELECT
            MAX(id) AS id
        FROM
            "packages_packages"
        WHERE
            "packages_packages". "project_id" = XXXX
            AND "packages_packages". "package_type" = 2
            AND "packages_packages". "name" = XXXX
            AND "packages_packages". "status" IN (0, 1)
        GROUP BY
            "packages_packages". "version")
Explain plan
 Nested Loop Semi Join  (cost=1.11..7.20 rows=1 width=93) (actual time=26.752..12841.557 rows=2557 loops=1)
   Buffers: shared hit=3314775 read=2495 dirtied=274
   I/O Timings: read=6107.472 write=0.000
   ->  Index Scan using idx_packages_packages_on_project_id_name_version_package_type on public.packages_packages  (cost=0.56..3.58 rows=1 width=93) (actual time=24.588..36.432 rows=2557 loops=1)
         Index Cond: ((packages_packages.project_id = XXXX) AND ((packages_packages.name)::text = XXXX::text) AND (packages_packages.package_type = 2))
         Filter: (packages_packages.status = ANY ('{0,1}'::integer[]))
         Rows Removed by Filter: 0
         Buffers: shared hit=2579 read=6
         I/O Timings: read=24.500 write=0.000
   ->  Aggregate  (cost=0.56..3.60 rows=1 width=24) (actual time=0.057..4.851 rows=1279 loops=2557)
         Group Key: packages_packages_1.version
         Buffers: shared hit=3312196 read=2489 dirtied=274
         I/O Timings: read=6082.972 write=0.000
         ->  Index Scan using idx_packages_packages_on_project_id_name_version_package_type on public.packages_packages packages_packages_1  (cost=0.56..3.58 rows=1 width=24) (actual time=0.051..4.100 rows=1280 loops=2557)
               Index Cond: ((packages_packages_1.project_id = XXXX) AND ((packages_packages_1.name)::text = XXXX::text) AND (packages_packages_1.package_type = 2))
               Filter: (packages_packages_1.status = ANY ('{0,1}'::integer[]))
               Rows Removed by Filter: 0
               Buffers: shared hit=3312196 read=2489 dirtied=274
               I/O Timings: read=6082.972 write=0.000

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/15009/commands/52196

=> The execution time is quite high.

🚒 New query

pg.ai setup
exec CREATE INDEX idx_packages_on_project_id_name_id_version_when_installable_npm ON packages_packages USING btree (project_id, name, id, version) WHERE package_type = 2 AND status IN (0, 1)
EXEC VACUUM ANALYZE packages_packages
Query
SELECT "packages_packages". *
FROM
    "packages_packages"
WHERE
    "packages_packages". "project_id" = XXXX
    AND "packages_packages". "package_type" = 2
    AND "packages_packages". "name" = XXXX
    AND "packages_packages". "status" IN (0, 1)
    AND "packages_packages". "id" IN (
        SELECT
            MAX(id) AS id
        FROM
            "packages_packages"
        WHERE
            "packages_packages". "project_id" = XXXX
            AND "packages_packages". "package_type" = 2
            AND "packages_packages". "name" = XXXX
            AND "packages_packages". "status" IN (0, 1)
        GROUP BY
            "packages_packages". "version")
Explain plan
 Nested Loop Semi Join  (cost=2.63..5.70 rows=1 width=93) (actual time=30.642..2004.507 rows=2557 loops=1)
   Buffers: shared hit=2953 read=27
   I/O Timings: read=18.307 write=0.000
   ->  Index Scan using idx_packages_on_project_id_name_id_version_when_installable_npm on public.packages_packages  (cost=0.55..3.57 rows=1 width=93) (actual time=1.665..14.264 rows=2557 loops=1)
         Index Cond: ((packages_packages.project_id = XXXX) AND ((packages_packages.name)::text = XXXX::text))
         Buffers: shared hit=2564 read=4
         I/O Timings: read=1.560 write=0.000
   ->  Aggregate  (cost=2.08..2.10 rows=1 width=24) (actual time=0.012..0.661 rows=1279 loops=2557)
         Group Key: packages_packages_1.version
         Buffers: shared hit=389 read=23
         I/O Timings: read=16.747 write=0.000
         ->  Sort  (cost=2.08..2.09 rows=1 width=24) (actual time=0.011..0.088 rows=1280 loops=2557)
               Sort Key: packages_packages_1.version
               Sort Method: quicksort  Memory: 296kB
               Buffers: shared hit=389 read=23
               I/O Timings: read=16.747 write=0.000
               ->  Index Only Scan using idx_packages_on_project_id_name_id_version_when_installable_npm on public.packages_packages packages_packages_1  (cost=0.55..2.07 rows=1 width=24) (actual time=0.058..17.665 rows=2557 loops=1)
                     Index Cond: ((packages_packages_1.project_id = XXXX) AND (packages_packages_1.name = XXXX::text))
                     Heap Fetches: 0
                     Buffers: shared hit=386 read=23
                     I/O Timings: read=16.747 write=0.000

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/15009/commands/52201

=> The execution time went down since the query used Index Only Scan

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #388241 (closed)

Edited by Dzmitry (Dima) Meshcharakou

Merge request reports