
Document and expand Advanced Finders architecture

Background

The Advanced Finders concept was originally proposed in this design document MR to create a unified interface for accessing data from multiple backends (PostgreSQL, Elasticsearch/OpenSearch, and potentially ClickHouse). The original MR was closed because the concept has since been partially implemented through GLQL and other initiatives, but there is still a need to formalize this architecture and make it available as a broader engineering tool.

What Advanced Finders Aimed to Accomplish

The original design document proposed:

Core Objectives

  1. Unified Interface: Create a consistent API for accessing data from either PostgreSQL or Advanced Search (Elasticsearch/OpenSearch)
  2. Performance Optimization: Enable filtered searches to leverage Advanced Search when available, improving performance for complex queries
  3. Backend Selection: Intelligent routing between data sources (a minimal sketch follows this list) based on:
    • Advanced Search availability
    • Query complexity
    • Parameter support allowlists
    • Data freshness and indexing lag
  4. Consistent Results: Return paginated collections with metadata rather than ActiveRecord relations to support multi-backend results
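
The routing decision in objective 3 could look roughly like the sketch below. Every name in it (BackendSelector, advanced_search_available?, the Backend constants) is a hypothetical illustration rather than an existing GitLab API:

    module AdvancedFinder
      module Backend
        Postgres = :postgres
        AdvancedSearch = :advanced_search
      end

      class BackendSelector
        def initialize(current_user, params, supported_params:)
          @current_user = current_user
          @params = params
          @supported_params = supported_params
        end

        # Route to Advanced Search only when it is available and every
        # requested parameter is on that backend's allowlist.
        def select
          return Backend::Postgres unless advanced_search_available?
          return Backend::Postgres unless (@params.keys - @supported_params).empty?

          Backend::AdvancedSearch
        end

        private

        # Availability would depend on licensing, indexing status, and
        # freshness/indexing-lag guarantees; stubbed for the sketch.
        def advanced_search_available?
          true
        end
      end
    end

    # Example: both parameters are allowlisted, so Advanced Search is chosen.
    selector = AdvancedFinder::BackendSelector.new(nil, { state: 'opened', search: 'flaky' },
                                                   supported_params: [:state, :search])
    selector.select # => :advanced_search

Defaulting to PostgreSQL whenever a check fails keeps behaviour identical to today's finders for any query the newer backend cannot satisfy.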

Key Features

  • Parameter Support Allowlisting: Enable gradual migration by maintaining, per backend, an allowlist of supported query parameters
  • Transparent Backend Selection: Automatic selection with option for explicit backend specification
  • Unified Pagination: Support for offset-based, keyset, and scroll-based pagination across different backends using opaque page tokens (see the token sketch after this list)
  • Permission Safety: Redaction logic as a final safety net to ensure no unauthorized data is returned
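
As an illustration of the opaque page token idea, one possible encoding (purely hypothetical, not the current implementation) wraps the backend-specific cursor so callers never need to know whether it is an offset, a keyset value, or a scroll ID:

    require 'base64'
    require 'json'

    module AdvancedFinder
      class PageToken
        # Wrap a backend-specific cursor (offset, keyset values, scroll ID)
        # in a single opaque, URL-safe string.
        def self.encode(backend:, cursor:)
          Base64.urlsafe_encode64({ backend: backend, cursor: cursor }.to_json)
        end

        def self.decode(token)
          JSON.parse(Base64.urlsafe_decode64(token), symbolize_names: true)
        end
      end
    end

    token = AdvancedFinder::PageToken.encode(backend: :advanced_search, cursor: { scroll_id: 'abc123' })
    AdvancedFinder::PageToken.decode(token)
    # => { backend: "advanced_search", cursor: { scroll_id: "abc123" } }

Because the token records which backend produced it, a follow-up page request can be routed back to the same data source even if automatic selection would otherwise choose a different one.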

Architecture Changes

  • Replace ActiveRecord relation returns with FinderResult objects (sketched below) containing:
    • Collection of model instances
    • Pagination metadata
    • Backend information (which data source was used)
  • Support for both automatic and explicit backend selection:
    # Automatic selection
    result = AdvancedFinder::Issues.new(current_user, params).execute
    
    # Explicit backend
    result = AdvancedFinder::Issues.new(
      current_user, 
      params.merge(backend: AdvancedFinder::Backend::AdvancedSearch)
    ).execute
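
A FinderResult along the lines described above might look like the sketch below; the field names are assumptions made for illustration, not the shape of any existing class:

    module AdvancedFinder
      class FinderResult
        attr_reader :records, :backend, :next_page_token, :total_count

        def initialize(records:, backend:, next_page_token: nil, total_count: nil)
          @records = records                 # model instances, already permission-redacted
          @backend = backend                 # e.g. :postgres or :advanced_search
          @next_page_token = next_page_token # opaque cursor for the next page, if any
          @total_count = total_count         # may be approximate or absent for some backends
        end

        def has_next_page?
          !next_page_token.nil?
        end
      end
    end

Callers would treat backend as diagnostic information for logging and instrumentation rather than branching business logic on it.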

Current State and Need

As noted in the final comment, Advanced Finders is becoming increasingly important as GitLab expands its use of different data stores (PostgreSQL, Elasticsearch, ClickHouse, Knowledge graph, etc.).

Current implementations include:

  • GLQL (GitLab Query Language) work items API
  • Various search improvements leveraging multiple backends

Missing pieces:

  1. Documentation: No formal documentation on how to leverage Advanced Finders
  2. Selection Criteria: No documented criteria for appropriate backend selection logic
  3. Engineering Guidelines: No guidance for engineers on when and how to use this pattern
  4. Self-managed Considerations: No guidance on how this architecture supports the different data store configurations of self-managed instances

Proposed Next Steps

  1. Document Current Implementation

    • Create developer documentation for existing Advanced Finders patterns
    • Document the GLQL implementation as a reference example
    • Provide guidelines on backend selection criteria
  2. Expand Architecture Guidelines

    • Define when Advanced Finders should be used vs traditional finders
    • Document performance considerations and trade-offs
    • Create patterns for new data store integrations (ClickHouse, Knowledge graph)
  3. Self-managed Strategy

    • Define how Advanced Finders work when certain backends aren't available (see the degradation sketch after this list)
    • Document graceful degradation patterns
    • Consider feature flag strategies for progressive rollout
  4. Engineering Toolset

    • Make Advanced Finders a standard part of the engineering toolkit
    • Provide templates/generators for new finder implementations
    • Create testing patterns for multi-backend scenarios
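
To make the self-managed and rollout points in step 3 concrete, the sketch below shows one possible degradation path: try Advanced Search behind a feature flag, and fall back to PostgreSQL whenever the backend is disabled, unavailable, or failing. The flag name and helper methods are assumptions, not the current implementation:

    module AdvancedFinder
      class Issues
        def initialize(current_user, params)
          @current_user = current_user
          @params = params
        end

        def execute
          return postgres_execute unless advanced_search_enabled?

          begin
            advanced_search_execute
          rescue StandardError
            # A real implementation would rescue specific search-client errors.
            # Falling back keeps the feature working on instances without a
            # healthy (or any) Advanced Search cluster.
            postgres_execute
          end
        end

        private

        def advanced_search_enabled?
          # A feature flag gives a progressive rollout path and a kill switch;
          # licensing and indexing-status checks would also live here.
          feature_flag_enabled? && advanced_search_available?
        end

        # Stubs standing in for real checks and queries.
        def feature_flag_enabled? = true
        def advanced_search_available? = false
        def advanced_search_execute = []
        def postgres_execute = []
      end
    end

The same pattern generalises to other backends: each new data store gets an availability check, a parameter allowlist, and a fallback, so a store that only exists on GitLab.com never blocks self-managed instances.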

Questions for Discussion

  1. Should we formalize Advanced Finders as a standard architectural pattern?
  2. What documentation do we need to make this accessible to all engineering teams?
  3. How do we handle backend selection criteria to avoid performance issues?
  4. What's the strategy for self-managed instances with limited data store availability?
  5. How do we integrate this with emerging data stores like ClickHouse and Knowledge graph?

References