feat(bbm): add bbm first iteration
Related to #1265 (closed)
NOTE: This feature is disabled by both a feature flag and a config parameter.
What?
The code changes implement a first pass at the background migration mechanism in the registry. A background migration entry results in migration jobs that need to be executed by a worker. The worker periodically checks for available jobs and executes them, using a distributed lock (a Postgres advisory lock) to ensure that only one worker executes a job at a time.
The migration jobs are created based on the contents of the batched_background_migrations table and are executed according to constraints in the configuration (e.g., if a job fails, it is retried up to a maximum number of attempts before the background migration is marked as failed). More details are in the How section below:
How
- Background Migration Worker Setup: Introduced a new worker (JobWorker) to manage background migration tasks. It handles job scheduling with customizable intervals and retry attempts for robust execution.
- Job Handling Enhancements: Defined clear status constants (Active, Running, Failed, Finished, Paused) and error definitions (ErrlockInUse, ErrJobEndPointNotFound) to streamline job management and error handling.
- Work Representation and Registration: Implemented a structured approach (Work struct) to represent and register various background migration tasks. This ensures each task is uniquely identified and properly initialized.
- Execution Pattern and Monitoring:
  - Execution Flow: Implemented a periodic job scanning mechanism (ListenForWork) that detects and executes pending background migration tasks.
  - Job Search Pattern:
    - Initial Job Detection: Implemented functions (FindWork) to detect pending jobs and manage their lifecycle, including creation, retry, and status updates.
    - Job Creation and Retry: Efficiently manages job creation and retry mechanisms based on database state and configured retry limits (maxJobAttempt).
    - Error Handling: Handles potential errors like lock contention and job endpoint calculation failures with comprehensive logging and error propagation strategies.
Next steps
- Implement BackgroundMigrationStore database queries
- Add tests for BackgroundMigrationStore and integration tests for the entire background migration process
- Surface critical errors to Sentry
Author checklist
- Feature flags
  - Added feature flag:
  - This feature does not require a feature flag
- I added unit tests or they are not required
- I added documentation (or it's not required)
- I followed code review guidelines
- I followed Go Style guidelines
- For database changes including schema migrations:
  - Manually run up and down migrations in a postgres.ai production database clone and post a screenshot of the result here.
  - If adding new queries, extract a query plan from postgres.ai and post the link here. If changing existing queries, also extract a query plan for the current version for comparison.
    - I do not have access to postgres.ai and have made a comment on this MR asking for these to be run on my behalf.
  - Do not include code that depends on the schema migrations in the same commit. Split the MR into two or more.
- Ensured this change is safe to deploy to individual stages in the same environment (cny -> prod). State-related changes can be troublesome due to having parts of the fleet processing (possibly related) requests in different ways.
Reviewer checklist
- Ensure the commit and MR title are still accurate.
- If the change contains a breaking change, apply the breaking change label.
- If the change is considered high risk, apply the label high-risk-change.
- Identify if the change can be rolled back safely. (Note: all other reasons for not being able to roll back will be sufficiently captured by major version changes.)
If the MR introduces database schema migrations:
- Ensure the commit and MR title start with fix:, feat:, or perf: so that the change appears on the Changelog
If the changes cannot be rolled back follow these steps:
- If not, apply the label cannot-rollback.
- Add a section to the MR description that includes the following details:
  - The reasoning behind why a release containing the presented MR cannot be rolled back (e.g. schema migrations or changes to the FS structure)
  - Detailed steps to revert/disable a feature introduced by the same change where a migration cannot be rolled back. (Note: ideally MRs containing schema migrations should not contain feature changes.)
  - Ensure this MR does not add code that depends on these changes that cannot be rolled back.