Stop searching project descriptions and namespaces when searching for projects
https://gitlab.com/gitlab-org/gitlab-ce/issues/40510 describes a query that is extremely slow. Part of this is because we JOIN namespaces
onto the list of projects, then search through the project and namespace descriptions (on top of the names). This means we perform a wildcard search on:
- projects.path
- projects.name
- projects.description
- namespaces.name
This is simply too much and we need to reduce the amount of data we search through. If we want to support more, we need to offload that to Elasticsearch.
While in the described query we can solve most of the problem by not joining namespaces, this is a recurring pattern. For example, when searching issues we search in multiple fields as well.
To ensure good search performance we need to make the following changes:
- When searching for projects we only search for matches in projects.name and projects.path. Searching in additional columns and/or tables should be forbidden because the dataset is simply too large.
- When searching for issues, merge requests, etc, the same rules apply except you can also search the description (since this is useful here). Joining additional tables to search is not allowed.
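To make the rules above concrete, here is a minimal sketch of an allowlist-based search, written in Python against an in-memory SQLite database purely for illustration. It is not GitLab's code: the `SEARCHABLE_COLUMNS` mapping, the helper names, the sample rows, and the `issues` column names are all assumptions. The idea is that a search query only ever touches one table and only the columns explicitly allowed for it.

```python
import sqlite3

# Hypothetical allowlist: the only columns a search may touch, per table.
SEARCHABLE_COLUMNS = {
    "projects": ("name", "path"),
    "issues": ("title", "description"),  # column names assumed for illustration
}


def like_pattern(term):
    """Wrap the term in wildcards, escaping LIKE metacharacters in user input."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def build_search_query(table, columns=None):
    """Build a single-table wildcard query restricted to allowlisted columns."""
    allowed = SEARCHABLE_COLUMNS[table]
    columns = tuple(columns) if columns is not None else allowed
    if not columns:
        raise ValueError("at least one column is required")
    for column in columns:
        if column not in allowed:
            raise ValueError(f"searching {table}.{column} is not allowed")
    # No JOIN: every condition refers to the searched table itself.
    where = " OR ".join(f"{table}.{c} LIKE :pattern ESCAPE '\\'" for c in columns)
    return f"SELECT {table}.id FROM {table} WHERE {where} ORDER BY {table}.id"


def search(conn, table, term, columns=None):
    """Run the restricted search and return the matching ids."""
    sql = build_search_query(table, columns)
    return [row[0] for row in conn.execute(sql, {"pattern": like_pattern(term)})]


# Made-up sample data to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects "
    "(id INTEGER PRIMARY KEY, name TEXT, path TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?, ?)",
    [
        (1, "GitLab CE", "gitlab-ce", "Community Edition"),
        (2, "Runner", "gitlab-runner", "Runs CI jobs"),
        (3, "Docs", "docs", "GitLab documentation"),
    ],
)

# Project 3 only mentions "GitLab" in its description, so it is not returned.
print(search(conn, "projects", "gitlab"))  # prints [1, 2]
```

Asking for a column outside the allowlist, for example `search(conn, "projects", "gitlab", columns=["description"])`, raises a `ValueError` instead of silently widening the query. Failing loudly is one way to keep new call sites from reintroducing the expensive pattern.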
The first step in this process is finding out where we allow searching of projects, then reducing that set to the cases where we search for more than just the path and name. Once we have that list we can start adjusting the queries.
I'm tagging this as AP1 since the performance improvements for searching can be massive (we're talking queries going from 60 seconds to a few milliseconds).