Unexplained sudden growth of Postgres
Summary
Sudden, unexplained growth of disk usage on our Postgres database, accompanied by elevated error rates and a delayed-backup alert.
Timeline
All times UTC.
2020-05-13
- 10:45 - Database begins growing faster than usual; error rate starts rising
- 10:55 - Database growth back to normal; errors peaked and are slowly falling
- 11:00 - Backup delay alert fired
- 11:18 - Backup delay resolved itself
- 11:20 - Errors peaked a second time, at roughly 2x the earlier peak
- 11:28 - Errors back down
- 12:03 - Incident declared via Slack
Details
We saw a sudden increase in used disk space on our Postgres DB, accompanied by a trail of errors and high load. This increase caused our backup alert to fire: the backup was still in progress, but it took too long to finish, which triggered the alert.
No further slowness was observed. Marking as severity S3 for now.
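For reference, below is a minimal Python sketch of one way to triage this kind of growth, assuming direct read access to the affected database. The connection string, the read-only role, and the approach itself are assumptions for illustration, not part of our actual tooling.

    # Triage sketch: list the relations that account for the most on-disk
    # space, to narrow down what grew. Connection parameters are
    # placeholders and must be adapted to the real environment.
    import psycopg2

    conn = psycopg2.connect("host=db.internal dbname=app user=readonly")
    with conn, conn.cursor() as cur:
        # Total relation size includes the table itself, its indexes,
        # and any TOAST data.
        cur.execute("""
            SELECT relname,
                   pg_size_pretty(pg_total_relation_size(oid))
            FROM pg_class
            WHERE relkind = 'r'
            ORDER BY pg_total_relation_size(oid) DESC
            LIMIT 10
        """)
        for relname, size in cur.fetchall():
            print(f"{relname}: {size}")
    conn.close()

Running a query like this during and after the spike, and comparing the two snapshots, would show which relation the growth came from.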
Source
Incident declared by t4cc0re in Slack via the /incident declare command.
Resources
- If the Situation Zoom room was utilised, the recording will be uploaded automatically to the Incident room Google Drive folder (private)
Edited by Hendrik Meyer (xLabber)