Resolving cross joins on GroupUsersFinder (!134915) · Merge requests · GitLab.org / GitLab

Omar Qunsul requested to merge 420387-fixing-cross-joins-in-autocomplete-finder into master Oct 23, 2023

What does this MR do and why?

Resolving cross joins when Autocomplete::UsersFinder calls Autocomplete::GroupUsersFinder.

This MR only addresses 1 of the 2 scenarios in the issue. There is still another scenario where Autocomplete::GroupUsersFinder is called, that will be addressed in a separate MR. We might create a separate issue for that.

Related issue: #420387 (closed)

IMPORTANT NOTE: This MR only addresses the Autocomplete in the Search bar on the top. For example, on the Group Merge Requests page. For example: https://gitlab.com/groups/gitlab-org/-/merge_requests. It doesn't affect the autocomplete while writing an issue description for example

What's going on in this MR?

As part of the Cells project, we are removing database cross joins between different database tables that should belong to different databases in the future. For this issue, we are removing a relatively complex autocomplete related query that involves both tables users and members. Both of these tables belong to different database schemas, which means they will end up on two different database tables in the future.

The old query used to issue a query that searches for users based on some query, that matches users emails, username or name, and also belong to some group directly, or indirectly. By looking up the relations between users and groups via the members table.

We have looked into different ways into resolving this, but we decided to go for the 4th suggested solution (see the thread), and releasing behind a new feature flag, so that we can measure the impact of the new change.

Answering some anticipated questions from the reviews:

What's the plan to rollout this FF?

We will start by rolling out to a few developers in the grouptenant scale team so that they can test it for a while, then we release to the rest of the team later if no complaints. But this will be monitored closely via Kibana Logs, to see if the endpoints are responding slow when the feature flag is true. This is the rollout issue: #430268 (closed)

What do we proceed if it turns out that the new changes introduce a performance regression?

We will rollout the MR, and look into other ways to solve this cross joins. Please see this thread

What do we proceed if it turns out that the new changes worked out as expected without performance regression? 🤞

We can consider extracting the logic included in the Autocomplete::UsersFinder into a new module, so that this idea of batch reduction can be reused on any other finders, to solve the cross join problem.
We will progress into resolving the other cross join in the addressed issue.
We continue with the rollout of the FF, by removing the old code. Obviously.

How does these new queries solve the problem?

The original idea to solve the problem was to pluck all the user_id' from the Member relation, and inject them into a new query to search for the users on the users table. But doing so might risk sending a very big query to the database, in case the group has a very large number of users. Using unlimited pluck is not advisable to solve these cross join problems. See here https://docs.gitlab.com/ee/development/database/multiple_databases.html#limited-pluck-followed-by-a-find

How this MR solves the problem is by splitting the range of users that belong to a group, directly or indirectly into several batches of size 1000. We read the user_ids from the members table via each_batch, then do n queries on the users table with the old query, where n is number of batches. For each query we only return the first 20 users that match the query. The current query only returns the first 20 users, so we have a guarantee that from each batch, the users that don't make it to the top 20, will be excluded for sure.

We gather the top 20 users from each batch, then do 1 final query on them to sort them using the first original query.

New database queries

Notes

Please don't consider the times of the queries, as much as the plans and the number of Shared Buffers hits/reads instead. As times can be affected by cached pages.

For comparison, this is the old query that used to query users in `Autocomplete::Usersfinder`, which is currently on master.

Existing query on master https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/23764/commands/76254

New queries introduced in this new MR.

Getting user_ids in batches using pluck_user_ids

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/23764/commands/76251

Getting users that match some query, within a lot of user_ids that are plucked from the previous query:

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/23764/commands/76252

How to set up and validate locally

Check out this branch 420387-fixing-cross-joins-in-autocomplete-finder
Make sure to enable the feature flag via Rails Console Feature.enable(:group_users_autocomplete_using_batch_reduction)
Create or add a lot of users to a Group. To avoid adding more than 1000 of users, feel free to temporarily reduce the size of newly introduced constant Autocomplete::UsersFinder::BATCH_SIZE to a smaller size while testing.
Go to the merge requests page of the Group.
In the top search bar, filter by the author. Try different users by email, username or nickname, and see that the autocomplete returns the expected results.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Related to #420387 (closed)

Edited Nov 15, 2023 by Omar Qunsul

Resolving cross joins on GroupUsersFinder