feat(bbm): add bbm first iteration

Related to #1265 (closed)

NOTE: This feature is disabled by both a feature flag and a config parameter.

What?

The code changes implement a first pass of the background migration mechanism in the registry. A background migration entry results in migration jobs that need to be executed by a worker. The worker periodically checks for available jobs and executes them, using a distributed (Postgres advisory) lock to ensure that only one worker executes a job at a time.
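A minimal sketch of the single-executor guarantee described above. The names `Locker`, `memLocker`, `runExclusive`, and `errLockInUse` are hypothetical and not taken from this MR. In the registry the lock would be a Postgres advisory lock (for example acquired with `pg_try_advisory_lock`); an in-memory stand-in is assumed here so that the sketch runs without a database.

```go
package main

import (
	"errors"
	"sync"
)

// errLockInUse is a hypothetical sentinel returned when another worker holds the lock.
var errLockInUse = errors.New("lock in use")

// Locker abstracts a non-blocking lock shared between workers (assumed interface).
type Locker interface {
	TryLock(key int64) bool
	Unlock(key int64)
}

// memLocker is an in-memory stand-in for a Postgres advisory lock.
type memLocker struct {
	mu   sync.Mutex
	held map[int64]bool
}

func newMemLocker() *memLocker { return &memLocker{held: make(map[int64]bool)} }

// TryLock reports whether the lock for key was acquired without blocking.
func (l *memLocker) TryLock(key int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.held[key] {
		return false
	}
	l.held[key] = true
	return true
}

// Unlock releases the lock for key.
func (l *memLocker) Unlock(key int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.held, key)
}

// runExclusive runs job only if the lock for key can be acquired,
// and releases the lock when the job returns.
func runExclusive(l Locker, key int64, job func() error) error {
	if !l.TryLock(key) {
		return errLockInUse
	}
	defer l.Unlock(key)
	return job()
}
```

A worker that receives `errLockInUse` would typically skip this round and try again on its next poll, since another worker is already executing a job.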

The migration jobs are created based on the contents of the batched_background_migrations table and are executed according to constraints in the configuration (e.g., if a job fails, it is retried up to a maximum number of attempts before the background migration is marked as failed). More details are in the How section below.
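The retry constraint mentioned above could look roughly like the following sketch. The `job` type, `executeWithRetry`, and `errExhausted` are hypothetical names used only for illustration; `maxJobAttempt` mirrors the configuration value named in this description. The attempts are made in a tight loop here for brevity, whereas the actual worker may spread them across polling intervals.

```go
package main

import (
	"errors"
	"fmt"
)

// errExhausted is a hypothetical sentinel for a job that used up all attempts.
var errExhausted = errors.New("max job attempts reached")

// job is a simplified stand-in for a migration job record.
type job struct {
	attempts int
	failed   bool
}

// executeWithRetry calls do until it succeeds or the job has been attempted
// maxJobAttempt times. On exhaustion the job is flagged as failed, which is
// the point where the caller would mark the background migration as failed.
func executeWithRetry(j *job, maxJobAttempt int, do func() error) error {
	if maxJobAttempt < 1 {
		maxJobAttempt = 1 // assumption: always allow at least one attempt
	}
	var lastErr error
	for j.attempts < maxJobAttempt {
		j.attempts++
		if lastErr = do(); lastErr == nil {
			return nil
		}
	}
	j.failed = true
	return fmt.Errorf("%w after %d attempts: %v", errExhausted, j.attempts, lastErr)
}
```

Keeping the attempt counter on the job record (rather than in a local variable) is what would allow a different worker to resume counting after a restart.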

How

  • Background Migration Worker Setup: Introduced a new worker (JobWorker) to manage background migration tasks. It handles job scheduling with customizable intervals and retry attempts for robust execution.

  • Job Handling Enhancements: Defined clear status constants (Active, Running, Failed, Finished, Paused) and error definitions (ErrlockInUse, ErrJobEndPointNotFound) to streamline job management and error handling.

  • Work Representation and Registration: Implemented a structured approach (Work struct) to represent and register various background migration tasks. This ensures each task is uniquely identified and properly initialized.

  • Execution Pattern and Monitoring:

    • Execution Flow: Implemented a periodic job scanning mechanism (ListenForWork) that detects and executes pending background migration tasks.
    • Job Search Pattern:
      • Initial Job Detection: Implemented functions (FindWork) to detect pending jobs and manage their lifecycle, including creation, retry, and status updates.
      • Job Creation and Retry: Efficiently manages job creation and retry mechanisms based on database state and configured retry limits (maxJobAttempt).
      • Error Handling: Handles potential errors like lock contention and job endpoint calculation failures with comprehensive logging and error propagation strategies.

Next steps

  • Implement BackgroundMigrationStore database queries
  • Add tests for BackgroundMigrationStore and integration tests for the entire background migration process
  • Surface critical errors to Sentry

Author checklist

  • Feature flags
    • Added feature flag:
    • This feature does not require a feature flag
  • I added unit tests or they are not required
  • I added documentation (or it's not required)
  • I followed code review guidelines
  • I followed Go Style guidelines
  • For database changes including schema migrations:
    • Manually run up and down migrations in a postgres.ai production database clone and post a screenshot of the result here.
    • If adding new queries, extract a query plan from postgres.ai and post the link here. If changing existing queries, also extract a query plan for the current version for comparison.
      • I do not have access to postgres.ai and have made a comment on this MR asking for these to be run on my behalf.
    • Do not include code that depends on the schema migrations in the same commit. Split the MR into two or more.
  • Ensured this change is safe to deploy to individual stages in the same environment (cny -> prod). State-related changes can be troublesome due to having parts of the fleet processing (possibly related) requests in different ways.

Reviewer checklist

  • Ensure the commit and MR title are still accurate.
  • If the change contains a breaking change, apply the breaking change label.
  • If the change is considered high risk, apply the label high-risk-change
  • Identify if the change can be rolled back safely. (note: all other reasons for not being able to rollback will be sufficiently captured by major version changes).

If the MR introduces database schema migrations:

  • Ensure the commit and MR title start with fix:, feat:, or perf: so that the change appears on the Changelog

If the changes cannot be rolled back, follow these steps:

  • Apply the label cannot-rollback.
  • Add a section to the MR description that includes the following details:
    • The reasoning behind why a release containing the presented MR can not be rolled back (e.g. schema migrations or changes to the FS structure)
    • Detailed steps to revert/disable a feature introduced by the same change where a migration cannot be rolled back. (note: ideally MRs containing schema migrations should not contain feature changes.)
    • Ensure this MR does not add code that depends on these changes that cannot be rolled back.

Edited by Hayley Swimelar
