Skip to content

WIP: Snippet count search performance improvements

What does this MR do?

Improves performance of snippets search by modifying the code surrounding the count query behavior.

Original Count Query
SELECT COUNT(*)
FROM
  (SELECT "snippets".*
  FROM "snippets"
  WHERE (snippets.visibility_level IN (10,
                                        20)
          OR snippets.author_id = 1675774)
    AND "snippets"."project_id" IS NULL
  UNION SELECT "snippets".*
  FROM "snippets"
  INNER JOIN "projects" ON "projects"."id" = "snippets"."project_id"
  INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
  WHERE (snippets.visibility_level IN (10,
                                        20)
          OR snippets.author_id = 1675774)
    AND (projects.visibility_level IN (10,
                                        20))
    AND "project_features"."snippets_access_level" IN (20,
                                                        30)
  UNION SELECT "snippets".*
  FROM "snippets"
  INNER JOIN "projects" ON "projects"."id" = "snippets"."project_id"
  INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
  WHERE "project_features"."snippets_access_level" IN (20,
                                                        30,
                                                        10)
    AND (EXISTS
            (SELECT 1
            FROM "project_authorizations"
            WHERE (project_id = snippets.project_id)
              AND "project_authorizations"."user_id" = 1675774))) snippets
WHERE "snippets"."content" ILIKE '%foo%'

Code improvements

Limiting the count

At the moment, the count query doesn't limit the returned results as we do with other resources. In the new plan we can see that, once we add the limit, compared to the original one the improvement has been 7s; from 37s down to 30s (but still too long).

Avoid reordering twice

By default, the snippets returned are ordered by created_at but, the search returns them sorted by updated_at. Nevertheless, the setting of that order is not done properly which means that we sort for both fields, while we only need the last one (in order to get the more frequent or actual ones). The plans are https://explain.depesz.com/s/WWjx vs https://explain.depesz.com/s/BLCv. The improvement is 1s so it's not very representative but at least means we're making improvements.

Split the count query

A regular snippet search will involve two operations in the same request: one count and the fetching of the records. The count is approximately 30s (with the other changes) and the fetch 52s (82s in total). The count operation basically counts public and internal snippets (in both global and in projects scope with open access) and finally public, internal, and private snippets in private projects.

The global scope (personal snippets) is maybe the one with more records (in GitLab.com we have 44827 snippets aprox.) and the conditions aren't complex. Therefore, we can use the same approach we took with issues. Instead of counting using the complex query, we can use a more simple one (with only the global snippets) and only we get fewer records than the limit we have for search results (101) we perform the complex query.

The query plan with only the count for global snippets is https://explain.depesz.com/s/d7NL, whose execution time is 1.5s. That means, in the best-case scenario where we have enough global snippets the time will be 1.5s + 52s (53.5s). With this improvement, we go from 84s to 53.5s.

However, in the worst-case scenario, the time will be 1.5s(simple count) + 30s(complex count) + 52s(record fetch) = 83.5s (not too bad compared to the benefit we can get in the other scenario).

Refs https://gitlab.com/gitlab-org/gitlab-ee/issues/26123

Does this MR meet the acceptance criteria?

Conformity

Edited by 🤖 GitLab Bot 🤖

Merge request reports