
Resolve "Fail the execution of a batched background migration when too many jobs fail"

What does this MR do and why?

Pause the execution of a batched background migration when too many jobs fail

  • The BatchedMigration model now uses state_machine to manage its states and events.
  • BatchedMigration has a new column, starts_at. When the execute event fires, the system sets starts_at to the current time.
  • BatchedMigrationRunner stops the migration once too many jobs have failed (*)
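The state handling described in the list above can be sketched in plain Ruby. This is only an illustration and not the MR's code: the class name `MigrationSketch`, the state names (`:paused`, `:active`, `:failed`), and the `failure!` event are assumptions; only the `execute` event and the `starts_at` column come from the description.

```ruby
# Minimal sketch of the state handling; not the actual BatchedMigration model.
# State names and the failure! event are hypothetical.
class MigrationSketch
  attr_reader :state, :starts_at

  def initialize
    @state = :paused
    @starts_at = nil
  end

  # execute event: activate the migration and record when this run started.
  def execute!(now: Time.now)
    raise "cannot execute from #{@state}" unless %i[paused failed].include?(@state)

    @state = :active
    @starts_at = now
  end

  # failure event (hypothetical name): mark an active migration as failed.
  def failure!
    raise "cannot fail from #{@state}" unless @state == :active

    @state = :failed
  end
end
```

Because `execute!` overwrites `starts_at` each time, a re-run gets a fresh starting point that later checks can use to ignore older jobs.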

Note: (*)

  • If the migration is re-run, the system does not count jobs from earlier runs
  • The system fails the migration when (failed_jobs / total_jobs) > threshold. Currently, the threshold is 5%
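The failure check in the note can be sketched in plain Ruby, assuming the migration fails once the share of failed jobs exceeds the threshold and that "old jobs" means jobs created before starts_at. The `Job` struct, the `too_many_failures?` helper, and the `FAILED_JOBS_THRESHOLD` constant are hypothetical names used for illustration; only the 5% value comes from the description.

```ruby
# Hypothetical constant; the description only states that the threshold is 5%.
FAILED_JOBS_THRESHOLD = 0.05

# Hypothetical stand-in for a batched job record.
Job = Struct.new(:status, :created_at)

# Returns true when the share of failed jobs created at or after starts_at
# is strictly greater than the threshold. Jobs from earlier runs are ignored.
def too_many_failures?(jobs, starts_at, threshold: FAILED_JOBS_THRESHOLD)
  recent = jobs.select { |job| job.created_at >= starts_at }
  return false if recent.empty? # nothing to judge yet

  failed = recent.count { |job| job.status == :failed }
  failed.fdiv(recent.size) > threshold
end
```

With this sketch, 1 failed job out of 20 (exactly 5%) does not trip the check, while 2 failed out of 21 does. Whether the real comparison is strict or inclusive is not stated in the description.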

up:

== 20220321150028 AddStartedAtToBatchedBackgroundMigrationsTable: migrating ===
-- add_column(:batched_background_migrations, :starts_at, :datetime_with_timezone)
   -> 0.0025s
== 20220321150028 AddStartedAtToBatchedBackgroundMigrationsTable: migrated (0.0025s)

down:

== 20220321150028 AddStartedAtToBatchedBackgroundMigrationsTable: reverting ===
-- remove_column(:batched_background_migrations, :starts_at, :datetime_with_timezone)
   -> 0.0103s
== 20220321150028 AddStartedAtToBatchedBackgroundMigrationsTable: reverted (0.0129s)

Related to #355975 (closed)

Edited by Diogo Frazão
