Skip to content

Rewrite SnippetsFinder to improve performance

Yorick Peterse requested to merge refactor-snippets-finder into master

What does this MR do?

Summary: this MR rewrites SnippetsFinder from the ground up, improving its performance by a factor of about 1500 (in the best case scenario).

This completely rewrites the SnippetsFinder class from the ground up in order to improve its performance. The old code was beyond salvaging. It was complex, included various Rails 5 workarounds, comments that shouldn't be necessary, and most important of all: it produced a really poorly performing database query.

As a result, I opted for rewriting the finder from scratch, instead of trying to patch the existing code. Instead of trying to reuse as many existing methods as possible, I opted for defining new methods specifically meant for the SnippetsFinder. This requires some extra code here and there, but allows us to have much more control over the resulting SQL queries. It is these changes that then allow us to produce a much more efficient query.

To illustrate how bad the old query was, we will use my own snippets as an example. Currently I have 52 snippets, most of which are global ones. To retrieve these, you would run the following Ruby code:

user = User.find_by(username: 'yorickpeterse')

SnippetsFinder.new(user, author: user).execute

On GitLab.com the resulting query will take between 10 and 15 seconds to run, producing the query plan found at https://explain.depesz.com/s/Y5IX. Apart from the long execution time, the total number of buffers (the sum of all shared hits) is around 185 GB, though the real number is probably (hopefully) much lower as I doubt simply summing these numbers produces the true total number of buffers used.

The new query's plan can be found at https://explain.depesz.com/s/wHdN, and this query takes between 10 and 100-ish milliseconds to run. The total number of buffers used is only about 30 MB.

EE MR: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/8105

Query differences

Old SQL query for getting my snippets (ID 209240):

SELECT snippets.*
FROM snippets
WHERE snippets.project_id IN (
    SELECT id
    FROM
    (
        SELECT projects.*
        FROM projects
        LEFT JOIN project_features ON projects.id = project_features.project_id
        WHERE EXISTS (
            SELECT 1
            FROM project_authorizations
            WHERE project_authorizations.user_id = 209240
            AND project_authorizations.project_id = projects.id
        )
        AND (
            project_features.snippets_access_level IN (NULL, 20, 30)

            OR (
                project_features.snippets_access_level = 10

                AND EXISTS (
                    SELECT 1
                    FROM project_authorizations
                    WHERE project_authorizations.user_id = 209240
                    AND project_authorizations.project_id = projects.id
                )
            )
        )

        UNION

        SELECT projects.*
        FROM projects
        LEFT JOIN project_features ON projects.id = project_features.project_id
        WHERE projects.visibility_level IN (10, 20)
        AND (
            project_features.snippets_access_level IN (NULL, 20, 30)

            OR (
                project_features.snippets_access_level = 10

                AND EXISTS
                (
                    SELECT 1
                    FROM project_authorizations
                    WHERE project_authorizations.user_id = 209240
                    AND project_authorizations.project_id = projects.id
                )
            )
        )
    ) projects
)
OR snippets.project_id IS NULL
AND (
    EXISTS (
        SELECT 1
        FROM project_authorizations
        WHERE project_authorizations.user_id = 209240
        AND project_authorizations.project_id = snippets.project_id
    )
    OR snippets.visibility_level IN (10, 20)
    OR snippets.author_id = (209240)
)
AND snippets.author_id = 209240
ORDER BY created_at DESC

And the new SQL query:

SELECT snippets.*
FROM (
    SELECT snippets.*
    FROM snippets
    WHERE snippets.author_id = 209240
    AND snippets.project_id IS NULL

    UNION

    SELECT snippets.*
    FROM snippets
    INNER JOIN projects ON projects.id = snippets.project_id
    INNER JOIN project_features ON project_features.project_id = projects.id
    WHERE snippets.author_id = 209240
    AND projects.visibility_level IN ( 10,20)
    AND project_features.snippets_access_level IN (NULL,20,30)

    UNION

    SELECT snippets.*
    FROM snippets
    INNER JOIN projects ON projects.id = snippets.project_id
    INNER JOIN project_features ON project_features.project_id = projects.id
    WHERE snippets.author_id = 209240
    AND project_features.snippets_access_level IN (NULL,20,30,10)
    AND EXISTS (
        SELECT 1
        FROM project_authorizations
        WHERE project_id = snippets.project_id
        AND project_authorizations.user_id = 209240
    )
)
snippets
ORDER BY created_at DESC

Migration output

From staging:

[ gstg ] production> MigrateSnippetsAccessLevelDefaultValue.new.migrate(:up)
==  MigrateSnippetsAccessLevelDefaultValue: migrating =========================
-- change_column_default(:project_features, :snippets_access_level, 20)
   -> 0.0875s
-- change_column_null(:project_features, :snippets_access_level, false)
   -> 0.3264s
==  MigrateSnippetsAccessLevelDefaultValue: migrated (6.9050s) ================

=> nil
[ gstg ] production> MigrateSnippetsAccessLevelDefaultValue.new.migrate(:down)
==  MigrateSnippetsAccessLevelDefaultValue: reverting =========================
-- change_column_null(:project_features, :snippets_access_level, true)
   -> 1.4426s
-- change_column_default(:project_features, :snippets_access_level, nil)
   -> 0.0153s
==  MigrateSnippetsAccessLevelDefaultValue: reverted (1.4582s) ================

What are the relevant issue numbers?

https://gitlab.com/gitlab-org/gitlab-ce/issues/52639

Does this MR meet the acceptance criteria?

Edited by Yorick Peterse

Merge request reports