Skip to content

Create optimal indexes for created_at order

Andreas Brandl requested to merge ab/projects-api-created-at-indexes into master

What does this MR do?

The context for this is #36390 (comment 250204180).

I'm going to add a few indexes to projects that optimally support the projects API requests. That is mostly driven by the fact that we allow for various order_by and sort options:

order_by: Return projects ordered by id, name, path, created_at, updated_at, or last_activity_at fields. Default is created_at sort: Return projects sorted in asc or desc order. Default is desc

See https://docs.gitlab.com/ee/api/projects.html.

In this MR, we concentrate on the created_at order.

Now, for unauthenticated requests, we always have the visibility_level=20 filter in place to match public projects. We have an existing index on visibility_level, created_at, id.

Let's look at a query for the first page of GET /api/v4/projects?order_by=created_at&sort=desc

SELECT
    "projects".*
FROM
    "projects"
WHERE
    "projects"."visibility_level" = 20
    AND "projects"."pending_delete" = FALSE
ORDER BY
    "projects"."created_at" DESC,
    "projects"."id" DESC
LIMIT 20 OFFSET 0;                                                                                                                                                                                                                                       

This nicely uses the existing index with a backward scan. Note we order by DESC, DESC and the existing index is ordered by ASC, ASC, so this is the exact opposite. This allows for the backwards scan: https://explain.depesz.com/s/760Tc

Let's try this for sort=asc:

SELECT
    "projects".*
FROM
    "projects"
WHERE
    "projects"."visibility_level" = 20
    AND "projects"."pending_delete" = FALSE
ORDER BY
    "projects"."created_at" ASC,
    "projects"."id" DESC
LIMIT 20 OFFSET 0;                                                                                                                                                                                                                                       

Note that now we order by ASC, DESC and that is incompatible with any (forward/backward) scan on the index with order asc, asc. Hence this results in reading all the data from the index and sorting it on the fly: https://explain.depesz.com/s/H1rE

I suspect that this is the primary reason we see these really high response times for some parameter combinations for the project API endpoint.

My aim is to one-by-one create optimal indexes for the sorting options we have. Measure the impact, rinse and repeat.

The alternative option is to adjust the tie breaker in https://gitlab.com/gitlab-org/gitlab/blob/ab/projects-api-created-at-indexes/lib/api/helpers.rb#L317 to match the sort-order of the actual order_by field. In this case, we would only need one index in asc, asc order and we would be able to scan it backwards in desc, desc order.

The issue for the alternative solution is #195881 (closed). I would still like to go forward with this MR, as this is a low hanging fruit anyways. If we decide to change the tie breaker order, we can simply drop one of the indexes created here.

Does this MR meet the acceptance criteria?

Conformity

Edited by 🤖 GitLab Bot 🤖

Merge request reports