Downtime due to git NFS server going down

On June 15th we had an outage caused by nfs-file-02 going dark for 17 minutes.

The event resolved itself when the server came back up, so this issue exists only to record that it happened.

nfs-file-02

Screen_Shot_2017-06-28_at_2.33.49_PM

Screen_Shot_2017-06-28_at_2.34.01_PM

Screen_Shot_2017-06-28_at_2.34.10_PM

Fleet overview

Screen_Shot_2017-06-28_at_2.39.57_PM

Screen_Shot_2017-06-28_at_2.40.05_PM

Screen_Shot_2017-06-28_at_2.40.14_PM

Screen_Shot_2017-06-28_at_2.40.43_PM

Screen_Shot_2017-06-28_at_2.40.51_PM

Follow-up actions

We had to restart a couple of servers that remained in an inconsistent state after the NFS host came back up.

Screen_Shot_2017-06-28_at_2.40.26_PM

Concerns

We did not get a clear alert when one of the NFS servers went down, so we need to review why we were not paged for it.
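
As a starting point for that review, here is a minimal sketch of the kind of check that could back such a page. It is not our current monitoring setup. The function names `probe_tcp` and `should_page`, the 3-second timeout, and the three-failure threshold are all assumptions made for illustration. A plain TCP probe only catches the host going dark, as happened here; it would not detect a reachable server with hung exports.

```python
import socket

NFS_PORT = 2049  # standard NFS port; adjust if the deployment differs


def probe_tcp(host, port=NFS_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def should_page(history, threshold=3):
    """Page only when the most recent `threshold` probes all failed.

    `history` is a list of booleans, oldest first, one per probe.
    Requiring consecutive failures (an assumed policy) avoids paging
    on a single dropped probe.
    """
    if len(history) < threshold:
        return False
    return not any(history[-threshold:])
```

A scheduler would call `probe_tcp("nfs-file-02")` at a fixed interval, append each result to a list, and page when `should_page` returns True. With a 17-minute outage, any probe interval of a few minutes or less would have crossed the threshold well before the server recovered.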