Skip to content

Fix Geo verification state backfill job can exceed batch size

Michael Kozono requested to merge mk/fix-base-batcher into master

What does this MR do and why?

In certain cases, Geo::VerificationStateBackfillWorker can operate on a batch of rows which exceeds the specified batch size. Causing or additionally contributing to memory spikes #429242 (closed).

Fixes #430913 (closed)

The edge case is when there are a lot of unused IDs to skip over.

This MR fixes the underlying code so that it does not return a batch that exceeds the specified batch size.

It also fixes Gitlab::Geo::RegistryConsistencyWorker, but it may be that we didn't notice a performance issue with that worker because it does a bulk insert rather than iterating over each row and instantiating an ActiveRecord object and saving it.

How to set up and validate locally

This is a bug in a background job that runs every minute. It's hard to test as a black box. The following validates the fix in Rails console.

Demonstrate bug on master branch:

  1. Already have GDKs with Geo
  2. Already have 5 or more projects which are checksummed on the primary
  3. Change to the primary GDK's gitlab directory: cd /path/to/primary/gdk/gitlab
  4. Stop GDK gdk stop
  5. Start only the backend services gdk start gitaly postgresql postgresql-geo redis
  6. In Rails console, Geo::ProjectState.count should return at least 5
  7. In Rails console, delete 4 project_states rows with the lowest project_ids. Run: Geo::ProjectState.order(:project_id).limit(4).delete_all
  8. In Rails console, Geo::ProjectState.count should now return 4 fewer than before
  9. In Rails console, run VerificationStateBackfillService once for a batch of 2 projects, starting from ID 1: s = Geo::VerificationStateBackfillService.new(Project, batch_size: 2); b = Gitlab::Geo::BaseBatcher.new(s.replicable_model, s.verification_state_table_class, s.verification_state_model_key, key: s.send(:batcher_key)); b.send(:reset); s.execute
  10. Notice that all 4 project_states rows got inserted, even though batch size is 2

Demonstrate fix on this MR's branch:

  1. Already have GDKs with Geo
  2. Already have 5 or more projects which are checksummed on the primary
  3. Change to the primary GDK's gitlab directory: cd /path/to/primary/gdk/gitlab
  4. Stop GDK gdk stop
  5. Start only the backend services gdk start gitaly postgresql postgresql-geo redis
  6. In Rails console, Geo::ProjectState.count should return at least 5
  7. In Rails console, delete 4 project_states rows with the lowest project_ids. Run: Geo::ProjectState.order(:project_id).limit(4).delete_all
  8. In Rails console, Geo::ProjectState.count should now return 4 fewer than before
  9. In Rails console, run VerificationStateBackfillService once for a batch of 2 projects, starting from ID 1: s = Geo::VerificationStateBackfillService.new(Project, batch_size: 2); s.send(:batcher).send(:reset); s.execute
  10. Notice that 2 project_states rows got inserted, as expected

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Michael Kozono

Merge request reports