Make GeoLogCursor Highly Available
Description
With the introduction of Geo LogCursor we also introduced a single point of failure in the replication propagation task: even in a multi-machine HA setup, only one LogCursor can run at a time.
Proposal
The single-instance restriction is enforced with the ExclusiveLease, together with the newly introduced renew mechanism. The same mechanism can easily be adapted to support dormant processes: when the ExclusiveLease expires, the dormant processes elect a new "active" one that takes over the task, giving us an HA solution.
So with some small changes we can allow multiple instances of the LogCursor to run (on different machines, for example). Only one will be the "active"; the others will idle until the active instance's lease expires, which then prompts all dormant instances to try to become the new "active".
When a process can't get the lease, it asks for the expiration time of the current lease, "sleeps" for that amount of time plus a few extra random milliseconds, and then loops again to try to acquire the lease. These are the "dormant" processes. When one of them acquires the lease, it becomes the new "active" and starts processing the data.
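The loop described above could look roughly like the following sketch. It is written in Python for illustration only and does not use the real ExclusiveLease API: `InMemoryLease`, `step`, `max_jitter_ms` and the injected `clock` are hypothetical names, and a real lease store would have to perform obtain and renew atomically.

```python
import random


class InMemoryLease:
    """Hypothetical stand-in for an exclusive lease store (not GitLab's API)."""

    def __init__(self, timeout, clock):
        self.timeout = timeout      # lease duration in seconds
        self.clock = clock          # callable returning the current time in seconds
        self.holder = None
        self.expires_at = 0.0

    def try_obtain(self, owner):
        """Take the lease if it is free or expired; return True on success."""
        now = self.clock()
        if self.holder is None or now >= self.expires_at:
            self.holder = owner
            self.expires_at = now + self.timeout
            return True
        return False

    def renew(self, owner):
        """Extend the lease, but only for the current, unexpired holder."""
        now = self.clock()
        if self.holder == owner and now < self.expires_at:
            self.expires_at = now + self.timeout
            return True
        return False

    def ttl(self):
        """Seconds until the current lease expires (0 if free or expired)."""
        return max(0.0, self.expires_at - self.clock())


def step(lease, owner, process_batch, max_jitter_ms=500, rng=random):
    """Run one loop iteration; return the seconds to sleep before the next one."""
    if lease.renew(owner) or lease.try_obtain(owner):
        process_batch()  # active: do the replication work
        return 0.0
    # Dormant: wait out the current lease plus random jitter, so that the
    # dormant processes do not all retry at exactly the same instant.
    return lease.ttl() + rng.uniform(0, max_jitter_ms) / 1000.0
```

Each process would call `step` in a loop and sleep for the returned number of seconds. The random jitter only spreads out the retries; correctness still depends on the lease store granting the lease to exactly one caller.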
Links / references
Feature checklist
Make sure these are completed before closing the issue, with a link to the relevant commit.
- Feature assurance
- Documentation
- Added to features.yml