Group search with traversal_id is not performant in some cases
Summary
This was discovered while debugging SLI alerts for Advanced search over the past few days.
Advanced group search uses traversal_ids in some cases to reduce the number of project_ids being sent to Elasticsearch. This is done by querying for documents with these authorization-specific filters:
- match the traversal_id for the group being searched
- do not match the list of projects the user does NOT have access to (rejected_project_ids)
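The two filters above can be sketched as an Elasticsearch bool query. This is only an illustration, not the actual GitLab implementation: the function name is hypothetical, and the index field names `traversal_ids` and `project_id`, as well as the use of a prefix match on the traversal_id, are assumptions.

```python
def build_group_auth_filter(group_traversal_id, rejected_project_ids):
    """Build an illustrative bool query for a group search.

    Assumptions (not confirmed by the issue): documents carry a
    `traversal_ids` string field and a `project_id` field.
    """
    query = {
        "bool": {
            # Keep documents that belong to the group hierarchy being searched.
            "filter": [
                {"prefix": {"traversal_ids": {"value": group_traversal_id}}}
            ]
        }
    }
    if rejected_project_ids:
        # Drop documents from projects the user does NOT have access to.
        query["bool"]["must_not"] = [
            {"terms": {"project_id": sorted(rejected_project_ids)}}
        ]
    return query


example = build_group_auth_filter("9970-", [12, 7])
```

The size of the `must_not` clause grows with the number of rejected projects, and the list itself has to be computed from the database before the query can be sent.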
In some cases, the database queries that build the list of rejected projects take a long time.
For example:
- namespace has a lot of projects (example: 20_000)
- user has access to a small set of projects (example: 100)
- the rejected_project_ids will contain the majority of the projects
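To make the example numbers concrete, the arithmetic can be written out as a small sketch. The helper name is hypothetical and it assumes every project the user cannot access ends up in rejected_project_ids.

```python
def rejected_share(total_projects, accessible_projects):
    """Return (rejected_count, rejected_fraction) for a namespace.

    Assumes every project the user cannot access is rejected.
    """
    rejected = total_projects - accessible_projects
    return rejected, rejected / total_projects


rejected, fraction = rejected_share(20_000, 100)
print(rejected)             # 19900
print(f"{fraction:.1%}")    # 99.5%
```

With these numbers, the list of rejected projects is roughly 199 times larger than the list of projects the user can actually access.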
Steps to reproduce
too hard to reproduce locally
What is the expected correct behavior?
using traversal_id for group queries should always result in better performance
Relevant logs and/or screenshots
Kibana graph with slow durations
Possible fixes
This affects any search scope that uses traversal_ids for group searches.
Some ideas from a pair session with Dmitry:
- set a limit on the number of projects that can be rejected, and do not use the traversal_id optimization when that limit is exceeded
- do not use traversal_ids if the allowlist of projects is small (thinking < 1000)
- redesign traversal_ids optimization to get traversal_ids for each sub group the user has access to and create an array of filters for each subgroup traversal_ids/rejected_project_ids
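The ideas above could be sketched roughly as follows. This is a hedged illustration, not a proposed implementation: all names are hypothetical, `MAX_REJECTED_PROJECTS` is an arbitrary placeholder value, the field names `traversal_ids` and `project_id` are assumptions, and only the 1000 allowlist threshold comes from the list above.

```python
from enum import Enum


class AuthStrategy(Enum):
    TRAVERSAL_IDS = "traversal_ids"
    PROJECT_ALLOWLIST = "project_allowlist"


MAX_REJECTED_PROJECTS = 1_000      # hypothetical limit (idea 1)
SMALL_ALLOWLIST_THRESHOLD = 1_000  # "thinking < 1000" (idea 2)


def choose_auth_strategy(allowed_count, rejected_count):
    """Pick how to authorize a group search (ideas 1 and 2)."""
    if allowed_count < SMALL_ALLOWLIST_THRESHOLD:
        # A small allowlist is cheap to send as plain project IDs.
        return AuthStrategy.PROJECT_ALLOWLIST
    if rejected_count > MAX_REJECTED_PROJECTS:
        # Too many rejected projects: skip the optimization.
        return AuthStrategy.PROJECT_ALLOWLIST
    return AuthStrategy.TRAVERSAL_IDS


def build_subgroup_filters(rejected_by_subgroup):
    """Build one filter per accessible subgroup (idea 3).

    `rejected_by_subgroup` maps a subgroup traversal_id prefix to the
    project IDs rejected inside that subgroup.
    """
    clauses = []
    for prefix, rejected_ids in rejected_by_subgroup.items():
        clause = {
            "bool": {
                "filter": [{"prefix": {"traversal_ids": {"value": prefix}}}]
            }
        }
        if rejected_ids:
            clause["bool"]["must_not"] = [
                {"terms": {"project_id": sorted(rejected_ids)}}
            ]
        clauses.append(clause)
    # A document matches if it matches at least one subgroup clause.
    return {"bool": {"should": clauses, "minimum_should_match": 1}}


strategy = choose_auth_strategy(allowed_count=100, rejected_count=19_900)
```

For the example in the summary (100 accessible projects out of 20_000), this sketch would fall back to the project allowlist rather than the traversal_ids optimization.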
Edited by Terri Chu