Fix PauseControl sidekiq middleware

Dmitry Gruzd requested to merge fix-pause-control-middleware into master

What does this MR do and why?

During the Zoekt incident gitlab-com/gl-infra/production#17266 (closed) we noticed that the PauseControl Sidekiq middleware didn't pause all Zoekt indexing operations as we expected. After some debugging I noticed that the client part of the middleware receives worker_class (a class), but the server part receives a worker instance, which caused this bug in the server part.

Because of this mismatch, the server part wasn't working properly and was also returning strategy: :none.

The simplest way to test it is:

Gitlab::SidekiqMiddleware::PauseControl::WorkersMap.strategy_for(worker: Zoekt::IndexerWorker.new)

On master it's:

[1] pry(main)> Gitlab::SidekiqMiddleware::PauseControl::WorkersMap.strategy_for(worker: Zoekt::IndexerWorker.new)
=> nil
[2] pry(main)> Gitlab::SidekiqMiddleware::PauseControl::WorkersMap.strategy_for(worker: Zoekt::IndexerWorker)
=> :zoekt

And on this branch:

[1] pry(main)> Gitlab::SidekiqMiddleware::PauseControl::WorkersMap.strategy_for(worker: Zoekt::IndexerWorker.new)
=> :zoekt
[2] pry(main)> Gitlab::SidekiqMiddleware::PauseControl::WorkersMap.strategy_for(worker: Zoekt::IndexerWorker)
=> :zoekt
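
The class-versus-instance mismatch shown above can be illustrated with a minimal, self-contained sketch. This is not the actual WorkersMap implementation: IndexerWorker, OtherWorker, WorkersMapSketch and naive_strategy_for are hypothetical names, and the normalization shown is just one plausible way to accept both a class and an instance.

```ruby
# Hypothetical stand-ins for worker classes; not GitLab's real classes.
class IndexerWorker; end
class OtherWorker; end

module WorkersMapSketch
  # Strategies are registered against worker classes.
  STRATEGIES = { IndexerWorker => :zoekt }.freeze

  # Naive lookup: finds a class key, but returns nil for an instance,
  # because an instance is never equal to its class as a hash key.
  def self.naive_strategy_for(worker:)
    STRATEGIES[worker]
  end

  # Normalizing lookup: accepts either a class or an instance.
  def self.strategy_for(worker:)
    klass = worker.is_a?(Class) ? worker : worker.class
    STRATEGIES[klass]
  end
end

p WorkersMapSketch.naive_strategy_for(worker: IndexerWorker)     # class: found
p WorkersMapSketch.naive_strategy_for(worker: IndexerWorker.new) # instance: not found
p WorkersMapSketch.strategy_for(worker: IndexerWorker.new)       # instance: found after normalizing
```

Normalizing inside the lookup keeps both callers (client and server) unchanged, which is why the sketch does it there rather than at each call site.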

How to set up and validate locally

  1. Set up Zoekt
  2. Check out master
    git checkout master
  3. Restart Sidekiq
    gdk restart rails-background-jobs
  4. Truncate the Zoekt index
    Gitlab::Search::Zoekt::Client.instance.truncate
  5. Ensure that you don't have files in $GDK_DIR/zoekt-data/development/index
  6. Restart the Rails console and run these commands in this order, so that the job is scheduled for the future before indexing is paused
    Zoekt::IndexerWorker.perform_in(65, 7, { "foo" => Time.now.to_i })
    ::Feature.enable(:zoekt_pause_indexing)
  7. Monitor $GDK_DIR/zoekt-data/development/index and tail -f log/sidekiq.log | fgrep Zoekt
  8. After 65 seconds you should see that the job has been executed and that index files have appeared in $GDK_DIR/zoekt-data/development/index
  9. Check out this branch (fix-pause-control-middleware)
  10. Restart Sidekiq
    gdk restart rails-background-jobs
  11. Disable the feature flag, clear the ZSET, and truncate the Zoekt index
    ::Feature.disable(:zoekt_pause_indexing)
    PauseControl::ResumeWorker.new.perform
    # Wait a few seconds so that existing jobs complete
    Gitlab::Search::Zoekt::Client.instance.truncate
  12. Ensure that you don't have files in $GDK_DIR/zoekt-data/development/index
  13. Run these commands in this order, so that the job is scheduled for the future before indexing is paused
    Zoekt::IndexerWorker.perform_in(65, 7, { "foo" => Time.now.to_i })
    ::Feature.enable(:zoekt_pause_indexing)
  14. Monitor $GDK_DIR/zoekt-data/development/index and tail -f log/sidekiq.log | fgrep Zoekt
  15. Wait for log records with "job_status":"paused"
  16. After 65 seconds the job should not have been executed, and the $GDK_DIR/zoekt-data/development/index directory should still be empty
  17. You can also check that this command returns 1
    Gitlab::SidekiqMiddleware::PauseControl::PauseControlService.queue_size('Zoekt::IndexerWorker')
  18. Do not forget to unpause indexing
    ::Feature.disable(:zoekt_pause_indexing)
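
The behavior that steps 13 to 18 validate (a paused job is parked instead of executed, the queue size becomes 1, and the job runs after resuming) can be modeled with a small in-memory sketch. It assumes nothing about the real middleware internals: PauseControlSketch and its methods are hypothetical, and a plain Ruby array stands in for the Redis ZSET.

```ruby
# Hypothetical in-memory model of pausing; the real middleware relies on
# Redis and a feature flag rather than instance state.
class PauseControlSketch
  def initialize
    @paused = false
    # worker name => array of [score, args] pairs (stand-in for a ZSET)
    @queues = Hash.new { |hash, key| hash[key] = [] }
  end

  def pause!
    @paused = true
  end

  # Server-side hook: run the job, or park it while paused.
  def call(worker_name, args, now: Time.now.to_f)
    if @paused
      @queues[worker_name] << [now, args]
      return :paused
    end
    yield(args)
    :done
  end

  def queue_size(worker_name)
    @queues[worker_name].size
  end

  # Unpause and replay parked jobs in score order.
  def resume!
    @paused = false
    @queues.each_value do |jobs|
      jobs.sort_by!(&:first)
      yield(jobs.shift.last) until jobs.empty?
    end
  end
end

executed = []
control = PauseControlSketch.new
control.pause!
status = control.call("Zoekt::IndexerWorker", [7, { "foo" => 1 }]) { |args| executed << args }
size_while_paused = control.queue_size("Zoekt::IndexerWorker")
executed_while_paused = executed.dup
control.resume! { |args| executed << args }
```

In this model, status is :paused and size_while_paused is 1 while nothing has executed, mirroring the "job_status":"paused" log record and the queue_size check in the steps above.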

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Dmitry Gruzd