Geo: Incorporate event processing lag in all replication lag measurements that users see

Problem to solve

We tell users about replication lag:

  • in the banner on the secondary UI.
  • in Git push/pull messages.
  • for each secondary node in Admin Area > Geo > Nodes.
  • in rake geo:status.

For the last three items above, the reported number is database replication lag only. On its own, that figure is inaccurate.

On secondaries, the log cursor processes events that happened on the primary. That processing can have its own lag, separate from database replication lag.
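One way to picture event processing lag is as the age of the oldest event the log cursor has not yet handled. The sketch below is only an illustration under that assumption; `Event`, `event_processing_lag_seconds`, and the `last_processed_id` parameter are hypothetical names, not the existing GitLab implementation.

```ruby
# Hypothetical event record; stands in for a row of the replicated event log.
Event = Struct.new(:id, :created_at)

# Assumption: lag is the age of the oldest event with an id greater than the
# last one the cursor processed. Returns 0 when the cursor is caught up.
def event_processing_lag_seconds(events, last_processed_id, now: Time.now)
  pending = events.select { |event| event.id > last_processed_id }
  return 0 if pending.empty?

  oldest_pending = pending.map(&:created_at).min
  (now - oldest_pending).to_i
end
```

With this definition, a secondary whose database is fully replicated can still report a non-zero lag if the cursor has events left to process.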

Intended users

Further details

Proposal

  • Extract the event processing lag logic from ApplicationHelper.
  • Add event processing lag to GeoNodeStatus. This is where most of the work is, since we have to modify the schema.
  • Incorporate event processing lag wherever DB replication lag alone is reported.
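The steps above could look roughly like the following sketch. It is not the actual GeoNodeStatus code: `LagReport` and its methods are hypothetical, and adding the two lags together is an assumption (the issue does not say whether they should be summed or shown side by side).

```ruby
# Hypothetical value object; a stand-in for the lag fields a node status
# might carry. Not GitLab's actual GeoNodeStatus.
LagReport = Struct.new(:db_replication_lag_seconds, :event_processing_lag_seconds) do
  # Assumption: a change is only visible on the secondary after it has been
  # replicated and then processed, so the two delays are added.
  def total_lag_seconds
    return nil if db_replication_lag_seconds.nil? || event_processing_lag_seconds.nil?

    db_replication_lag_seconds + event_processing_lag_seconds
  end

  # One line of text that any of the reporting surfaces could reuse.
  def summary
    total = total_lag_seconds
    return "Replication lag: unknown" if total.nil?

    "Replication lag: #{total}s " \
      "(database: #{db_replication_lag_seconds}s, " \
      "event processing: #{event_processing_lag_seconds}s)"
  end
end
```

Keeping both components in one object means the admin page, the Git push/pull messages, and rake geo:status could all render the same number instead of each computing its own.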

Permissions and Security

Documentation

Testing

What does success look like, and how can we measure that?

What is the type of buyer?

Links / references

Edited Sep 05, 2019 by Michael Kozono