Geo: Incorporate event processing lag in all replication lag measurements that users see
Problem to solve
We tell users about replication lag:
- in the banner on the secondary UI.
- in Git push/pull messages.
- for each secondary node in Admin Area > Geo > Nodes.
- in rake geo:status.
For the last three items above, the reported number covers only database replication lag, which is inaccurate on its own.
On secondaries, the log cursor processes events that happen on the primary. This can have its own lag.
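As a rough illustration of what "event processing lag" could mean in code, here is a minimal sketch. It assumes the lag is measured as the age of the oldest event the log cursor has not yet processed. The module name `EventLagSketch` and the method `seconds_behind` are hypothetical and are not taken from the GitLab codebase.

```ruby
# Hypothetical helper; the module and method names are illustrative,
# not GitLab's actual API.
module EventLagSketch
  # oldest_unprocessed_at: creation Time of the oldest event the log cursor
  # has not handled yet, or nil when the cursor is fully caught up.
  def self.seconds_behind(oldest_unprocessed_at, now: Time.now)
    return 0 if oldest_unprocessed_at.nil?

    # Clamp at zero so clock skew between nodes never yields a negative lag.
    [(now - oldest_unprocessed_at).round, 0].max
  end
end
```

Passing `now:` explicitly keeps the calculation easy to test; a real implementation would read the timestamp from wherever the log cursor tracks its position.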
Intended users
Further details
Proposal
- Extract event processing lag logic from ApplicationHelper
- Add event processing lag to GeoNodeStatus (this is where most of the work is, since we have to modify the schema)
- Incorporate event processing lag wherever DB replication lag alone is reported
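One possible shape for the last step, sketched under assumptions: a small value object that carries both lag figures, so that any place that reports lag shows them together. `LagReport` and its fields are hypothetical names for illustration. It does not mirror the real GeoNodeStatus schema, and whether the two values should be shown separately or merged into one number is left open here.

```ruby
# Hypothetical value object; it does not mirror the real GeoNodeStatus schema.
LagReport = Struct.new(:db_replication_lag, :event_processing_lag,
                       keyword_init: true) do
  # Human-readable summary that always mentions both sources of lag.
  def to_s
    "DB replication lag: #{format_lag(db_replication_lag)}, " \
      "event processing lag: #{format_lag(event_processing_lag)}"
  end

  private

  # nil means the value could not be measured.
  def format_lag(seconds)
    seconds.nil? ? "unknown" : "#{seconds}s"
  end
end
```

Usage might look like `puts LagReport.new(db_replication_lag: 3, event_processing_lag: 42)`, with the same object feeding the Git push/pull message, the Admin Area page, and the rake task.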
Permissions and Security
Documentation
Testing
What does success look like, and how can we measure that?
What is the type of buyer?
Links / references
Edited by Michael Kozono