Advanced Search: Explore Siphon for tracking updates


Background

Currently, Advanced Search relies on model callbacks to trigger updates to indexed data. This approach has gaps because not every code path that changes data runs callbacks (e.g., DB migrations, update_all method calls).

Current Challenges

  1. The current callback-based system doesn't capture all data changes

    • Project imports don't respect callbacks
    • DB migrations bypass callbacks
    • update_all method calls aren't tracked
  2. Alternative approaches considered:

    • updated_at-based tracking was evaluated but ruled out because:

      • Indexing updated_at would prevent HOT (heap-only tuple) updates in PostgreSQL
      • Replication lag could cause consistency issues
      • Queries would have to run on the primary DB (they can't be offloaded to replicas)
      • An updated_at index isn't always present
    • PostgreSQL LISTEN/NOTIFY was considered, but it may hit single-threaded performance limits at GitLab.com scale
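To make the HOT argument concrete, here is a simplified Python model of when PostgreSQL can perform a heap-only tuple (HOT) update. It is a sketch, not PostgreSQL's implementation: the function name is hypothetical, page free space is reduced to a single flag, and the PostgreSQL 16+ exception for summarizing indexes such as BRIN is ignored. It shows why indexing updated_at, a column that almost every row update touches, removes HOT eligibility.

```python
# Simplified model of PostgreSQL's heap-only tuple (HOT) eligibility rule.
# Not PostgreSQL code: free space on the page (influenced by fillfactor) is a
# single flag, and the PostgreSQL 16+ exception for summarizing indexes such
# as BRIN is ignored.


def is_hot_eligible(changed_columns: set, indexed_columns: set,
                    page_has_free_space: bool) -> bool:
    """Return True if an UPDATE could be performed as a HOT update.

    HOT requires that no indexed column changes and that the new row
    version fits on the same heap page as the old one.
    """
    touches_index = bool(changed_columns & indexed_columns)
    return page_has_free_space and not touches_index


# Hypothetical table where only id and project_id are indexed.
indexed = {"id", "project_id"}
# A typical row update: Rails bumps updated_at along with the edited column.
changed = {"title", "updated_at"}

print(is_hot_eligible(changed, indexed, page_has_free_space=True))  # True
# Adding the updated_at index that timestamp-based tracking would need:
print(is_hot_eligible(changed, indexed | {"updated_at"}, page_has_free_space=True))  # False
```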

Proposed Solution

Investigate using Siphon, which reads changes from PostgreSQL's logical replication stream, to track database updates.

Benefits

  • No additional query load on the PostgreSQL primary
  • More reliable change detection than callbacks
  • Could allow more searches to move off PostgreSQL
  • A flexible event subscription system that other teams could use

Implementation Considerations

  • We only need to receive change events; Elasticsearch/OpenSearch updates would not be performed directly from Siphon
  • We can ignore the payload data and track only INSERT, UPDATE, and DELETE events
  • An example consumer for the ClickHouse integration is available as a reference
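A minimal sketch of a payload-free consumer, assuming a hypothetical ChangeEvent shape (table, op, primary_key, payload) rather than Siphon's actual message format. It skips everything except INSERT/UPDATE/DELETE and collapses a batch into unique (table, primary_key) references. It also assumes, as a design choice rather than a description of the current indexer, that whatever processes these refs reloads the row from the database, so the payload is never needed.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Operations the search consumer reacts to; anything else is skipped.
TRACKED_OPS = {"INSERT", "UPDATE", "DELETE"}


@dataclass
class ChangeEvent:
    # Hypothetical event shape: Siphon's real message format may differ.
    table: str
    op: str
    primary_key: int
    payload: Optional[Dict[str, Any]] = None  # never read by this consumer


def pending_refs(events: Iterable[ChangeEvent]) -> List[Tuple[str, int]]:
    """Collapse change events into unique (table, primary_key) refs.

    Assumption: the indexer reloads each row from the database when it
    processes a ref (treating a missing row as a delete), so the payload
    is not needed and repeated changes to one row collapse into one ref.
    """
    seen: Dict[Tuple[str, int], None] = {}  # dict keeps first-seen order
    for event in events:
        if event.op not in TRACKED_OPS:
            continue
        seen.setdefault((event.table, event.primary_key), None)
    return list(seen)


events = [
    ChangeEvent("issues", "INSERT", 1, {"title": "a"}),
    ChangeEvent("issues", "UPDATE", 1, {"title": "b"}),
    ChangeEvent("merge_requests", "DELETE", 7),
    ChangeEvent("issues", "TRUNCATE", 0),
]
print(pending_refs(events))  # [('issues', 1), ('merge_requests', 7)]
```

Collapsing repeated changes to the same row keeps indexing work proportional to the number of distinct rows touched rather than the number of events received.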

Open Questions

  • Timeline for Siphon availability in production
  • Performance testing to confirm Siphon processes changes fast enough
  • Handling WAL accumulation when Siphon is down or processing slowly
  • An architecture that makes the event system flexible enough for other teams to use
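For the WAL accumulation question, one signal to watch is how far a slot's restart_lsn trails the server's current WAL position. A sketch of that arithmetic in Python follows; the two LSNs would come from pg_current_wal_lsn() and the restart_lsn column of pg_replication_slots, while the helper names and the 10 GiB threshold are hypothetical.

```python
# Helpers for watching how much WAL a replication slot forces PostgreSQL to
# keep. Inputs are textual pg_lsn values such as '16/B374D848'.


def parse_lsn(lsn: str) -> int:
    """Convert 'X/Y' (upper and lower 32 bits, in hex) to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL held back by a slot; the same value as
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)."""
    return parse_lsn(current_lsn) - parse_lsn(restart_lsn)


# Hypothetical alert threshold; a real value depends on disk headroom.
ALERT_THRESHOLD_BYTES = 10 * 1024**3  # 10 GiB


def slot_needs_attention(current_lsn: str, restart_lsn: str) -> bool:
    return retained_wal_bytes(current_lsn, restart_lsn) > ALERT_THRESHOLD_BYTES


print(retained_wal_bytes("1/0", "0/F0000000"))  # 268435456 (256 MiB)
print(slot_needs_attention("1/0", "0/F0000000"))  # False
```

PostgreSQL 13 and later can also cap retained WAL with max_slot_wal_keep_size; a slot that exceeds it is invalidated, which for this use case would likely mean missed change events that need a separate recovery path.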

Next Steps

  1. Explore a PoC that writes data from Siphon to our data store
  2. Investigate performance implications for GitLab.com
  3. Design a flexible event subscription system
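One possible shape for a flexible event subscription system, sketched in Python under stated assumptions: EventRouter, its subscribe/dispatch methods, and the audit subscriber are all hypothetical, and a production version would sit behind whatever transport Siphon uses rather than in-process callbacks. Each subscriber declares the tables and operations it cares about, so teams only receive the events they asked for.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, FrozenSet, Iterable, List, Tuple

# Hypothetical handler signature: (table, op, primary_key) -> None.
Handler = Callable[[str, str, int], None]


class EventRouter:
    """Fan change events out to subscribers that filter by table and operation."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Tuple[FrozenSet[str], Handler]]] = defaultdict(list)

    def subscribe(self, table: str, ops: Iterable[str], handler: Handler) -> None:
        self._subs[table].append((frozenset(ops), handler))

    def dispatch(self, table: str, op: str, primary_key: int) -> int:
        """Deliver one event to every matching subscriber; return the match count."""
        delivered = 0
        # .get() avoids creating empty entries for tables nobody subscribed to.
        for ops, handler in self._subs.get(table, []):
            if op in ops:
                handler(table, op, primary_key)
                delivered += 1
        return delivered


# Two hypothetical subscribers: the search indexer and another team's audit feed.
search_queue: List[Tuple[str, int]] = []
audit_log: List[int] = []

router = EventRouter()
router.subscribe("issues", {"INSERT", "UPDATE", "DELETE"},
                 lambda table, op, pk: search_queue.append((table, pk)))
router.subscribe("issues", {"DELETE"}, lambda table, op, pk: audit_log.append(pk))

router.dispatch("issues", "UPDATE", 42)   # search indexer only
router.dispatch("issues", "DELETE", 42)   # both subscribers
router.dispatch("projects", "UPDATE", 1)  # no subscribers
print(search_queue, audit_log)  # [('issues', 42), ('issues', 42)] [42]
```

Filtering by table at subscription time keeps dispatch cheap: an event for a table nobody subscribed to is dropped after a single dictionary lookup.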