Advanced search: global search for projects in a cells architecture

Background

Global search already indexes all project data. The plan was to explore replacing Projects API search with Advanced search, once available, to improve the reliability, performance, and functionality of project search through the Projects API.

Currently, the Projects API supports search with several sort options. The frontend uses the Projects API for project search, with one of these sorts:

  • By similarity
    • Results are relevant
    • Searches all groups and projects the user is authorized for
    • Does not include public projects
  • By last_activity_at, then id
    • Results may not be relevant; busy projects float to the top
    • Searches all groups and projects the user is authorized for
    • Includes public projects
  • By created_at, then id (default)
    • Results are not relevant unless the returned set is small
    • Searches all groups and projects the user is authorized for
    • Includes public projects

This creates one of two experiences (with potentially slow performance on large instances):

  • Project search returns relevant results, but users cannot find public projects
  • Project search returns public projects, but the results are often not relevant (for example, searching for gitlab-org returns recently created public forks at the top)
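The trade-off between the two experiences can be reproduced with a toy model. This is a minimal sketch, not the Projects API implementation: the `Project` fields, the `similarity` score, and the sample data are hypothetical, every sample project is assumed to already match the search term, and the default sort is assumed to be descending.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Project:
    id: int
    path: str
    created_at: date
    similarity: float  # hypothetical relevance score for the search term
    public: bool


def search_by_similarity(projects, member_of):
    # Relevant ordering, but only projects the user is authorized for.
    visible = [p for p in projects if p.id in member_of]
    return sorted(visible, key=lambda p: p.similarity, reverse=True)


def search_by_created_at(projects, member_of):
    # Includes public projects, but the ordering ignores relevance:
    # newest first, ties broken by id (descending order is assumed).
    visible = [p for p in projects if p.public or p.id in member_of]
    return sorted(visible, key=lambda p: (p.created_at, p.id), reverse=True)


# Hypothetical data: every project is assumed to match the query already.
projects = [
    Project(1, "gitlab-org/gitlab", date(2014, 1, 1), 1.0, True),
    Project(2, "alice/gitlab", date(2024, 5, 1), 0.4, True),  # recent public fork
    Project(3, "bob/gitlab", date(2024, 5, 2), 0.4, True),    # recent public fork
]

print([p.path for p in search_by_similarity(projects, member_of=set())])
# []
print([p.path for p in search_by_created_at(projects, member_of=set())])
# ['bob/gitlab', 'alice/gitlab', 'gitlab-org/gitlab']
```

For a user who is not a member of any of these projects, the similarity sort finds nothing, while the default sort finds everything but ranks the original project last.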

With the introduction of Cells, global search will no longer be instance-wide for public projects. Each cell will have a local Elasticsearch cluster that houses the data of all organizations stored on that cell, regardless of project or group visibility. For Cells 1.0 this is not an issue, because users will be in private organizations.

For Cells 1.5, this poses two issues:

  1. Loss of functionality for users who want to search for public projects
  2. Difficulty finding projects at any visibility level when a user belongs to multiple organizations housed on different cells
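Both issues follow from each cell indexing only its own organizations. The toy model below is a sketch under stated assumptions: `Cell.index` is a plain list standing in for the cell-local Elasticsearch cluster, and the cell names, organizations, and projects are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Project:
    id: int
    name: str
    organization: str
    public: bool


@dataclass
class Cell:
    name: str
    organizations: set
    # Stands in for the cell-local Elasticsearch cluster.
    index: list = field(default_factory=list)


def index_project(cells, project):
    # A project is indexed only on the cell that hosts its organization,
    # regardless of its visibility.
    for cell in cells:
        if project.organization in cell.organizations:
            cell.index.append(project)
            return cell
    raise ValueError(f"no cell hosts organization {project.organization!r}")


def search(cell, term, user_orgs):
    # A search served by a cell only sees that cell's index.
    return [
        p for p in cell.index
        if term in p.name and (p.public or p.organization in user_orgs)
    ]


cell_a = Cell("cell-a", {"acme"})
cell_b = Cell("cell-b", {"gitlab-org", "beta"})
for project in [
    Project(1, "widgets", "acme", False),
    Project(2, "gitlab", "gitlab-org", True),
    Project(3, "beta-tools", "beta", False),
]:
    index_project([cell_a, cell_b], project)

# Issue 1: a public project hosted on another cell is invisible.
print(search(cell_a, "gitlab", {"acme"}))  # []
# Issue 2: a member of acme and beta only sees acme's projects from cell-a.
print([p.name for p in search(cell_a, "s", {"acme", "beta"})])  # ['widgets']
```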

Proposal

Introduce a method to search for public projects globally in a cells architecture.

Project search should default to the organization level.

Search should offer a "search globally for public projects" option.
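The two scopes could map onto an Elasticsearch `bool` query that differs only in its filter clause. This is a hedged sketch: the index field names (`name`, `organization_id`, `visibility_level`) are hypothetical, authorization filtering is omitted, and the value 20 assumes the index stores GitLab's numeric visibility level for public projects. Placing the scope in `filter` context keeps it out of relevance scoring.

```python
PUBLIC_VISIBILITY_LEVEL = 20  # GitLab's numeric visibility level for public


def build_project_query(term, organization_id, global_public=False):
    """Build an Elasticsearch bool query for project search.

    Field names are hypothetical. Authorization filtering for private
    and internal projects is intentionally omitted from this sketch.
    """
    text_match = {"match": {"name": term}}
    if global_public:
        # Opt-in scope: public projects from every organization.
        scope = {"term": {"visibility_level": PUBLIC_VISIBILITY_LEVEL}}
    else:
        # Default scope: projects in the current organization.
        scope = {"term": {"organization_id": organization_id}}
    return {"query": {"bool": {"must": [text_match], "filter": [scope]}}}


print(build_project_query("gitlab", 42))
print(build_project_query("gitlab", 42, global_public=True))
```

The query body is the same for both scopes; what Cells changes is which cluster or clusters the global query must run against, which is the open problem the solutions to explore address.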

Questions

  1. Do users expect to be able to find all of their projects in one search regardless of organization?
  2. How can public projects be searchable regardless of cell location?

Solutions to explore

  • Split public projects into a separate cluster/index
    • Pro: Not affected by cell moves
    • Con: Searching must change to use multiple indices
    • Con: Project visibility must be kept in sync if projects are stored in different Elasticsearch clusters depending on their visibility setting
    • Con: Projects may become briefly unsearchable when their visibility changes
  • Search across Elasticsearch clusters (cross-cluster search)
    • Pro: No additional work when visibility changes
    • Con: Scoring must be refactored to account for differences between scores from clusters with different data sets and sizes
  • Funnel public projects into each cell's local Elasticsearch cluster
    • Pro: Searches stay within the same cluster
    • Con: Higher storage costs
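For the cross-cluster option, raw relevance scores are not directly comparable because each cluster scores against its own data set. One simple approach, shown here as a sketch rather than a decision, is per-cluster min-max normalization before merging. The `Hit` type, cluster names, and scores are hypothetical. Min-max normalization is lossy: it always maps each cluster's best hit to 1.0, even when that hit is a weak match, so rank-based alternatives such as reciprocal rank fusion are worth comparing.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    cluster: str
    project_id: int
    score: float


def normalize(hits):
    """Min-max normalize one cluster's scores into [0, 1]."""
    if not hits:
        return []
    lo = min(h.score for h in hits)
    hi = max(h.score for h in hits)
    if hi == lo:
        # All hits scored the same; treat them as equally good.
        return [h._replace(score=1.0) for h in hits]
    return [h._replace(score=(h.score - lo) / (hi - lo)) for h in hits]


def merge(results_by_cluster, size=10):
    """Merge per-cluster hits into one ranked list of at most `size` hits."""
    merged = []
    for hits in results_by_cluster.values():
        merged.extend(normalize(hits))
    # Highest normalized score first; ties broken by cluster, then project
    # id, so the output order is deterministic.
    merged.sort(key=lambda h: (-h.score, h.cluster, h.project_id))
    return merged[:size]


# Hypothetical raw scores: cell-a's larger data set yields larger scores.
results = {
    "cell-a": [Hit("cell-a", 1, 12.0), Hit("cell-a", 2, 6.0)],
    "cell-b": [Hit("cell-b", 7, 3.0), Hit("cell-b", 8, 1.5)],
}
print([(h.cluster, h.project_id) for h in merge(results)])
# [('cell-a', 1), ('cell-b', 7), ('cell-a', 2), ('cell-b', 8)]
```

Sorting on raw scores would rank both cell-a hits above both cell-b hits; after normalization, each cluster's best match is interleaved at the top.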