
Slow searches on large groups

Problem

Searches appear to be quite slow in general (see https://dashboards.gitlab.net/d/rPsQXrImk/rails-controller?orgId=1&refresh=1m&var-env=gprd&var-type=web&var-stage=main&var-controller=SearchController&var-action=show), but from what I can tell they are even slower for large groups.

[Screenshot: Screen_Shot_2020-03-30_at_2.18.12_pm]

I suspect, though I haven't verified, that this is due to how our queries look when sent to Elasticsearch. If you query a large group, we end up sending all of the project IDs in that group as part of the query to be matched against. This likely scales linearly with the number of projects, as Elasticsearch does an indexed lookup for each ID in the list.
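To make the suspected shape concrete, here is a minimal sketch of a query body that filters on an explicit list of project IDs. It only builds the JSON; it does not talk to Elasticsearch. The field names (`project_id`, `title`, `description`) and the helper name are assumptions for illustration, not the actual query GitLab sends.

```python
import json


def build_group_issue_query(search_term, project_ids):
    """Build a search body that filters on an explicit list of project IDs.

    Hypothetical helper: field names are assumed, not taken from GitLab.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "simple_query_string": {
                            "query": search_term,
                            "fields": ["title", "description"],
                        }
                    }
                ],
                # One term per project in the group: the list, and the
                # request body, grow linearly with the size of the group.
                "filter": [{"terms": {"project_id": list(project_ids)}}],
            }
        }
    }


small = build_group_issue_query("hello world", range(1, 11))
large = build_group_issue_query("hello world", range(1, 5001))

# Serialized size of each request body, in characters.
small_bytes = len(json.dumps(small))
large_bytes = len(json.dumps(large))
```

Comparing `small_bytes` and `large_bytes` shows how the request grows with the number of projects. Whether the lookup cost inside Elasticsearch grows the same way would still need to be measured.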

I believe I've already been seeing this in some searches. For example, the TTFB on https://gitlab.com/search?group_id=9970&project_id=&repository_ref=&scope=issues&search=hello+world is ~3.9s for me, while a comparatively more expensive query that returns more results for a group with fewer projects, https://gitlab.com/search?group_id=6416655&project_id=&repository_ref=&scope=issues&search=%2A, returns in 1.5s. Obviously both are slow, but perhaps we're exceeding a sensible limit on how many project IDs we send to Elasticsearch in our queries.

Solution

For group-level searches we could use a dedicated indexed field. One way is to index the full project path and do a prefix query match on it. A group-level search would then match a single prefix rather than thousands of IDs, which could be massively faster.
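A minimal sketch of what such a prefix-based query body could look like, assuming a hypothetical keyword field called `full_path` that stores each document's full project path (for example `my-group/sub-group/project`). The helper name and the other field names are also assumptions.

```python
def build_group_prefix_query(search_term, group_full_path):
    """Build a search body that filters by a single path prefix.

    Hypothetical helper: `full_path` is an assumed keyword field.
    """
    # The trailing slash keeps "my-group" from also matching projects
    # under a differently named group such as "my-group-other".
    prefix = group_full_path.rstrip("/") + "/"
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "simple_query_string": {
                            "query": search_term,
                            "fields": ["title", "description"],
                        }
                    }
                ],
                # A single prefix clause, regardless of group size.
                "filter": [{"prefix": {"full_path": {"value": prefix}}}],
            }
        }
    }


prefix_query = build_group_prefix_query("hello world", "my-group")
```

The filter here stays the same size however many projects the group contains. One design consideration is that project and group paths can change on rename or transfer, so the indexed path would have to be kept up to date.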

Assuming this works, it would immediately address the group-level search issue, but global search (across the entire instance) would still need to match multiple values. We could still use group prefixes there, though: if we assume that users are generally members of far fewer groups than projects, it should be a significant improvement anyway.
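For the global case, one possible shape is a filter with one prefix clause per group the user belongs to, plus an ID list for any projects they can access outside those groups. This is a sketch under assumptions: the `full_path` and `project_id` field names, the helper name, and the idea of a separate list of directly accessible projects are all hypothetical.

```python
def build_global_access_filter(group_paths, extra_project_ids=()):
    """Build a filter matching any of the user's groups or extra projects.

    Hypothetical helper: field names are assumed, not taken from GitLab.
    """
    # One prefix clause per distinct group path.
    should = [
        {"prefix": {"full_path": {"value": path.rstrip("/") + "/"}}}
        for path in sorted(set(group_paths))
    ]
    # Projects the user can reach outside of any of their groups
    # (an assumption for this sketch) still need explicit IDs.
    if extra_project_ids:
        should.append({"terms": {"project_id": sorted(set(extra_project_ids))}})
    return {"bool": {"should": should, "minimum_should_match": 1}}


access_filter = build_global_access_filter(
    ["my-group", "another-group/team"],
    extra_project_ids=[42, 7],
)
```

The number of clauses now tracks the number of groups rather than the number of projects, which only helps if the assumption about group membership counts holds in practice.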

Edited by Dylan Griffith