Evaluate hybrid Sidekiq + Solid Queue approach for critical job reliability

Summary

Instead of adding sidekiq-reliable-fetch to handle job loss during deployments, evaluate running Solid Queue alongside Sidekiq for critical jobs only.

Problem

Job loss during deployments is currently being addressed in !13833 by adding the sidekiq-reliable-fetch gem. That approach requires:

  • Vendored gem (Sidekiq 6.x compatibility)
  • Background cleanup processes
  • Working queue management
  • Manual recovery procedures

Proposed Solution

Run Solid Queue and Sidekiq in parallel:

  1. Add Solid Queue gem to the project
  2. Identify critical jobs that need reliability (e.g., ZuoraCallbackJob, UpdateGitlabPlanInfoJob)
  3. Move only those jobs to Solid Queue queues
  4. Keep remaining jobs on Sidekiq
  5. No migration of existing job data needed
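
A minimal sketch of how steps 2 to 4 could look, assuming the jobs are Active Job classes (if they are plain Sidekiq workers they would first need to be converted, which this sketch does not cover). The `CRITICAL_JOBS` constant and the `adapter_for` helper are hypothetical names introduced for illustration; only the two job names come from this issue.

```ruby
# Hypothetical list of critical job class names (the two examples named above).
CRITICAL_JOBS = %w[ZuoraCallbackJob UpdateGitlabPlanInfoJob].freeze

# Returns the Active Job adapter symbol that a job class should use.
# Anything not listed as critical stays on Sidekiq.
def adapter_for(job_class_name)
  CRITICAL_JOBS.include?(job_class_name.to_s) ? :solid_queue : :sidekiq
end

# In a Rails app this could be applied per class, for example:
#
#   class ZuoraCallbackJob < ApplicationJob
#     self.queue_adapter = adapter_for(name)
#   end
```

Keeping the list in one place means the set of critical jobs can be reviewed and extended without touching each job class.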

Advantages

  • Lower risk: Only critical jobs use new system
  • Gradual migration path: Move more jobs over time if successful
  • No job data migration required
  • Easy rollback if issues arise
  • Simpler than full Solid Queue migration
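
The rollback point could be made concrete with a single switch. The sketch below is one possible shape, not a decided design: the `CRITICAL_JOBS_BACKEND` variable name and the `critical_jobs_backend` helper are assumptions made for illustration.

```ruby
# Hypothetical environment variable name; not part of the original proposal.
BACKEND_ENV_KEY = "CRITICAL_JOBS_BACKEND"
ALLOWED_BACKENDS = %i[sidekiq solid_queue].freeze

# Reads the backend for critical jobs from an env-like hash.
# Defaults to :sidekiq, so unsetting the variable acts as the rollback.
def critical_jobs_backend(env = ENV)
  backend = env.fetch(BACKEND_ENV_KEY, "sidekiq").to_sym
  unless ALLOWED_BACKENDS.include?(backend)
    raise ArgumentError, "unknown backend: #{backend}"
  end
  backend
end
```

Switching the adapter only affects newly enqueued jobs. Jobs already stored in Solid Queue would stay there, so its workers would likely need to keep running until that backlog drains.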

Disadvantages

  • Operational complexity: Managing two job systems
  • Still requires Redis for Sidekiq
  • Temporary state: We eventually need to either fully migrate to Solid Queue or commit to the reliable-fetch approach

Next Steps

  1. Evaluate Solid Queue's reliability guarantees vs sidekiq-reliable-fetch
  2. Identify which jobs are critical and need reliability
  3. Prototype running both systems in parallel
  4. Compare operational overhead and reliability outcomes
  5. Decide on long-term strategy (full Solid Queue migration vs reliable-fetch)