Skip to content

Fix environment filter of Deployments Finder

Shinya Maeda requested to merge fix-deployment-finder-environment-in-query into master

What does this MR do?

During we were rolling out the feature flag, we noticed that the query was not actually improved as we expected. After we discussed with db folks, it turned out that the culprit was the environment filter in Deployments Finder.

Currently the problematic query looks like this:

explain SELECT
    "deployments".*
FROM
    "deployments"
WHERE
    "deployments"."project_id" = [REDACTED]
    AND "deployments"."updated_at" <= '2021-05-12T07:35:30Z'
    AND "deployments"."environment_id" IN (
        SELECT
            "environments"."id"
        FROM
            "environments"
        WHERE
            "environments"."project_id" = [REDACTED]
            AND "environments"."name" = '[REDACTED]'
    )
ORDER BY
    "deployments"."updated_at" DESC,
    "deployments"."id" DESC
LIMIT
    100 OFFSET 6

Timing

Time: 11.633 s
  - planning: 4.745 ms
  - execution: 11.629 s (estimated* for prod: 0.142...10.585 s)
    - I/O read: 11.538 s
    - I/O write: N/A

Shared buffers:
  - hits: 186 (~1.50 MiB) from the buffer pool
  - reads: 12193 (~95.30 MiB) from the OS file cache, including disk I/O
  - dirtied: 104 (~832.00 KiB)
  - writes: 0

Plan

 Limit  (cost=7.19..7.20 rows=1 width=138) (actual time=11628.386..11628.432 rows=100 loops=1)
   Buffers: shared hit=186 read=12193 dirtied=104
   I/O Timings: read=11538.013
   ->  Sort  (cost=7.19..7.19 rows=1 width=138) (actual time=11628.376..11628.403 rows=106 loops=1)
         Sort Key: deployments.updated_at DESC, deployments.id DESC
         Sort Method: top-N heapsort  Memory: 81kB
         Buffers: shared hit=186 read=12193 dirtied=104
         I/O Timings: read=11538.013
         ->  Nested Loop  (cost=1.12..7.18 rows=1 width=138) (actual time=17.653..11605.541 rows=12502 loops=1)
               Buffers: shared hit=180 read=12193 dirtied=104
               I/O Timings: read=11538.013
               ->  Index Scan using index_environments_on_project_id_and_name on public.environments  (cost=0.56..3.57 rows=1 width=4) (actual time=7.684..7.724 rows=1 loops=1)
                     Index Cond: ((environments.project_id = [REDACTED]) AND ((environments.name)::text = '[REDACTED]'::text))
                     Buffers: shared read=5
                     I/O Timings: read=7.645
               ->  Index Scan using index_deployments_on_project_and_environment_and_updated_at_id on public.deployments  (cost=0.57..3.59 rows=1 width=138) (actual time=9.961..11586.858 rows=12502 loops=1)
                     Index Cond: ((deployments.project_id = [REDACTED]) AND (deployments.environment_id = environments.id) AND (deployments.updated_at <= '2021-05-12 07:35:30'::timestamp without time zone))
                     Buffers: shared hit=180 read=12188 dirtied=104
                     I/O Timings: read=11530.368

The index_deployments_on_project_and_environment_and_updated_at_id index already sorts the rows by updated_at and id, so that there is no necessity to re-sort it again, but PostgreSQL currently does it. The problem is that the "environment_id" IN part makes PostgreSQL think the environment rows could be multiple, however, this will never happen because environment rows are unique on project_id and name (see index_environments_on_project_id_and_name).

In this MR, we'll re-write the query to explicitly tell to PostgreSQL that the number of environment rows is always one. Here is the improved version of query:

explain SELECT
    "deployments".*
FROM
    "deployments"
WHERE
    "deployments"."project_id" = [REDACTED]
    AND "deployments"."updated_at" <= '2021-05-12T07:35:30Z'
    AND (
        deployments.environment_id = (
            SELECT
                "environments"."id"
            FROM
                "environments"
            WHERE
                "environments"."project_id" = [REDACTED]
                AND "environments"."name" = '[REDACTED]'
            LIMIT
                1
        )
    )
ORDER BY
    "deployments"."updated_at" DESC,
    "deployments"."id" DESC
LIMIT
    100 OFFSET 6

Timing

Time: 1.073 ms
  - planning: 0.529 ms
  - execution: 0.544 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 114 (~912.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Plan

 Limit  (cost=7.17..10.19 rows=1 width=138) (actual time=0.122..0.477 rows=100 loops=1)
   Buffers: shared hit=114
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.56..3.57 rows=1 width=4) (actual time=0.037..0.038 rows=1 loops=1)
           Buffers: shared hit=5
           ->  Index Scan using index_environments_on_project_id_and_name on public.environments  (cost=0.56..3.57 rows=1 width=4) (actual time=0.037..0.037 rows=1 loops=1)
                 Index Cond: ((environments.project_id = [REDACTED]) AND ((environments.name)::text = '[REDACTED]'::text))
                 Buffers: shared hit=5
   ->  Index Scan using index_deployments_on_project_and_environment_and_updated_at_id_ on public.deployments  (cost=0.57..3.59 rows=1 width=138) (actual time=0.091..0.461 rows=106 loops=1)
         Index Cond: ((deployments.project_id = [REDACTED]) AND (deployments.environment_id = $0) AND (deployments.updated_at <= '2021-05-12 07:35:30'::timestamp without time zone))
         Buffers: shared hit=114

Related #325627 (closed) #329286 (closed)

Screenshots (strongly suggested)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Shinya Maeda

Merge request reports