DB replication lag not decreasing

The PostgreSQL server and Wal-E stopped running for a bit, but after getting them back up we see replication lag hasn't really decreased (https://performance.gprd.gitlab.com/dashboard/db/geo-status?orgId=1&panelId=21&fullscreen&from=now-3h&to=now):

image

I'm wondering:

  • If we need to tune Wal-E somehow
  • If we need to tune other PostgreSQL settings
  • Whether the replication lag metric is accurate (e.g. https://github.com/DataDog/dd-agent/issues/1312)

/cc: @_stark

Assignee Loading
Time tracking Loading