Optimize EnvironmentsByDeploymentFinder (!72781) · Merge requests · GitLab.org / GitLab

Vladimir Shushlin requested to merge 342715-db-timeout-when-loading-environment-in-projects-blobcontroller-show-2 into master Oct 21, 2021

What does this MR do and why?

Projects::BlobController#show renders this:

It runs the following query to find out which environment the particular commit is deployed to:

SELECT "environments".*
FROM "environments"
WHERE "environments"."project_id" = $1
  AND ("environments"."state" IN ($2))
  AND "environments"."id" IN (
    SELECT "deployments"."environment_id"
    FROM "deployments"
    WHERE "deployments"."project_id" = $3 
      AND (ref = $4)
    GROUP BY "deployments"."environment_id"
  )
ORDER BY (
  SELECT MAX("deployments"."id")
  FROM "deployments"
  WHERE "deployments"."environment_id" = "environments"."id"
) DESC NULLS LAST

On a project with many deployments, this query can take 35 seconds to run: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/6885/commands/24330.

The slowest part of this query is:

                     ->  Index Scan using index_deployments_on_project_id_and_ref on public.deployments  (cost=0.57..5544.56 rows=3836 width=4) (actual time=7.002..35026.091 rows=221606 loops=1)
                           Index Cond: ((deployments.project_id = 15426071) AND ((deployments.ref)::text = 'main'::text))
                           Buffers: shared hit=49 read=72478 dirtied=1747
                           I/O Timings: read=34516.969 write=0.000

My original attempt was to add a new index: !72023 (closed), but request still was very slow.

@tigerwnz had another suggestion

The index will definitely speed this up, but I'm not sure we can solve this with indexing alone. We do get a benefit by enabling an index only scan, but we still have to look at every matching deployment row (over 200,000 in this case), which means we're only solving the problem temporarily - projects with lots of deployments will continue to get slower as we have to search through more records.

We should be able to help this to some degree by modifying the query itself. For example, if we change this IN subquery:
  AND "environments"."id" IN (
    SELECT "deployments"."environment_id"
    FROM "deployments"
    WHERE "deployments"."project_id" = $3 
      AND (ref = $4)
    GROUP BY "deployments"."environment_id"
  )
To an EXISTS:
  AND EXISTS (
    SELECT 1
    FROM "deployments"
    WHERE "deployments"."environment_id" = "environments"."id"
      AND (ref = $4)
  )
Or, in ruby in the finder:
      environments = project.environments.available
        .where('EXISTS (?)', deployments.where('environment_id = environments.id'))
The best I found while experimenting was https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/6937/commands/24580, which was the EXISTS query combined with an index on (environment_id, ref). This brought the (cached) query time down to about 20ms with 123.50MiB data read.

This MR just implements this suggestion: we add a new index on (environment_id, ref) to deployments and use exists query.

I think we should later go with archiving deployments option.

Database Migrations

vlad @ gdk1 ➜  gitlab git:(342715-db-timeout-when-loading-environment-in-projects-blobcontroller-show-2) ✗  ./bin/rails db:migrate                                              
== 20211021131217 AddDeploymentsEnvironmentIdAndRefIndex: migrating ===========
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:deployments, [:environment_id, :ref], {:name=>"index_deployments_on_environment_id_and_ref", :algorithm=>:concurrently})
   -> 0.0213s
-- execute("SET statement_timeout TO 0")
   -> 0.0015s
-- add_index(:deployments, [:environment_id, :ref], {:name=>"index_deployments_on_environment_id_and_ref", :algorithm=>:concurrently})
   -> 0.0208s
-- execute("RESET statement_timeout")
   -> 0.0026s
== 20211021131217 AddDeploymentsEnvironmentIdAndRefIndex: migrated (0.0654s) ==

vlad @ gdk1 ➜  gitlab git:(342715-db-timeout-when-loading-environment-in-projects-blobcontroller-show-2) ✗  ./bin/rails db:rollback
== 20211021131217 AddDeploymentsEnvironmentIdAndRefIndex: reverting ===========
-- transaction_open?()
   -> 0.0000s
-- indexes(:deployments)
   -> 0.0221s
-- execute("SET statement_timeout TO 0")
   -> 0.0012s
-- remove_index(:deployments, {:algorithm=>:concurrently, :name=>"index_deployments_on_environment_id_and_ref"})
   -> 0.0048s
-- execute("RESET statement_timeout")
   -> 0.0009s
== 20211021131217 AddDeploymentsEnvironmentIdAndRefIndex: reverted (0.0331s) ==

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Related to #342715 (closed)

Edited Oct 22, 2021 by Vladimir Shushlin

Optimize EnvironmentsByDeploymentFinder

What does this MR do and why?

Database Migrations

MR acceptance checklist

Merge request reports