Group search with traversal_id is not performant in some cases

Summary

This was discovered while debugging SLI alerts for Advanced search over the past few days.

Advanced group search uses traversal_ids in some cases to reduce the number of project_ids sent to Elasticsearch. It does this by querying for documents with these authorization-specific filters:

  • match the traversal_id for the group being searched
  • do not match the list of projects the user does NOT have access to (rejected_project_ids)
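As a rough illustration of the two filters above, here is a minimal Python sketch of the query fragment. It assumes traversal_ids is indexed as a string that can be matched with a prefix query and that documents carry a project_id field; the function name and the example values are hypothetical, and the real query is built in the GitLab Rails code.

```python
from typing import Dict, Iterable


def build_group_auth_filter(group_traversal_id: str,
                            rejected_project_ids: Iterable[int]) -> Dict:
    """Build an Elasticsearch bool filter for a group search (illustrative only).

    Assumptions: `traversal_ids` is a prefix-matchable string field and
    `project_id` is a numeric field on each document.
    """
    return {
        "bool": {
            # Keep documents that live under the searched group.
            "filter": [
                {"prefix": {"traversal_ids": {"value": group_traversal_id}}}
            ],
            # Drop documents from projects the user cannot access.
            "must_not": [
                {"terms": {"project_id": sorted(rejected_project_ids)}}
            ],
        }
    }


query_filter = build_group_auth_filter("9970-", [12, 7, 31])
```

The size of the must_not list grows with the number of rejected projects, so the cost of computing it in the database is what matters here.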

In some cases, the database queries that build the list of rejected projects take a long time.

For example:

  1. the namespace has a lot of projects (example: 20_000)
  2. the user has access to a small set of projects (example: 100)
  3. rejected_project_ids then contains the majority of the projects
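To make the numbers in this example concrete, the following Python sketch computes the rejected set with plain sets. The ID ranges are made up, and it assumes all 100 accessible projects belong to the namespace.

```python
# Hypothetical IDs: 20_000 projects in the namespace, 100 of them accessible.
namespace_project_ids = set(range(1, 20_001))
accessible_project_ids = set(range(1, 101))

# Everything in the namespace that the user cannot access is rejected.
rejected_project_ids = namespace_project_ids - accessible_project_ids

rejected_share = len(rejected_project_ids) / len(namespace_project_ids)
print(len(rejected_project_ids))   # 19900
print(f"{rejected_share:.1%}")     # 99.5%
```

In this case the rejected list is roughly 200 times larger than the list of projects the user can access.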

Steps to reproduce

Too hard to reproduce locally.

What is the expected correct behavior?

Using traversal_id for group queries should always result in better performance.

Relevant logs and/or screenshots

Kibana graph with slow durations

Possible fixes

This applies to any search scope that uses traversal_ids for group searches.

Some ideas from a pair session with Dmitry:

  • set a limit on the number of projects that can be rejected, and do not use the traversal_id optimization when that limit is exceeded
  • do not use traversal_ids if the allowlist of projects is small (thinking < 1000)
  • redesign the traversal_ids optimization to get the traversal_ids for each subgroup the user has access to, and create an array of filters, one per subgroup, each combining that subgroup's traversal_ids with its rejected_project_ids
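
The ideas above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function names and the max_rejected default are hypothetical (only the < 1000 allowlist threshold comes from the discussion), and the query shape assumes a prefix-matchable traversal_ids field and a project_id field.

```python
from typing import Dict, List


def choose_filter_strategy(allowed_count: int,
                           rejected_count: int,
                           *,
                           small_allowlist_threshold: int = 1000,
                           max_rejected: int = 10_000) -> str:
    """Decide between a plain project_id allowlist and traversal_ids.

    max_rejected is a placeholder value, not a measured limit.
    """
    # Idea 2: a small allowlist is cheap to send directly.
    if allowed_count < small_allowlist_threshold:
        return "project_ids"
    # Idea 1: too many rejected projects makes the optimization expensive.
    if rejected_count > max_rejected:
        return "project_ids"
    return "traversal_ids"


def build_subgroup_filters(rejected_by_subgroup: Dict[str, List[int]]) -> Dict:
    """Idea 3: one filter per accessible subgroup, OR-ed together.

    Keys are subgroup traversal_id prefixes; values are the rejected
    project IDs inside that subgroup (possibly empty).
    """
    should = []
    for traversal_id, rejected in rejected_by_subgroup.items():
        clause = {
            "bool": {
                "filter": [
                    {"prefix": {"traversal_ids": {"value": traversal_id}}}
                ]
            }
        }
        if rejected:
            clause["bool"]["must_not"] = [
                {"terms": {"project_id": sorted(rejected)}}
            ]
        should.append(clause)
    return {"bool": {"should": should, "minimum_should_match": 1}}


strategy = choose_filter_strategy(allowed_count=100, rejected_count=19_900)
subgroup_query = build_subgroup_filters({"9970-12-": [44], "9970-15-": []})
```

With the example numbers from the summary (100 accessible projects, 19_900 rejected), the chooser falls back to the plain project_ids allowlist. The per-subgroup variant keeps each rejected list scoped to one subgroup instead of the whole namespace.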
Edited by Terri Chu