Improve alerts on NFS servers

During testing of gitaly cgroups (#2511 (closed)), we found a few ways our monitoring can be improved (https://gitlab.com/gitlab-com/infrastructure/issues/2511#note_39614774). This issue tracks those steps:

  • Add a "number of processes" graph, including running, sleeping and zombie processes.
  • Define a sane alert threshold for the number of running and zombie processes.
  • Set memory alerts for when gitaly uses 30G of RAM (the cgroup hard limit is currently 32G).
  • Disable CPU alerts (or make them less noisy). Cgroup limits now take care of CPU contention, so the system will always stay responsive while gitaly uses all the compute power it wants.
  • (possibly) Alert on OOM killer invocations.
  • (possibly) Decide whether we need to export cgroups_* metrics.
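To make the process-count and memory items above more concrete, here is a minimal Python sketch of the underlying checks. It assumes a Linux host with the standard `/proc/<pid>/stat` layout and a cgroup v1 `memory.usage_in_bytes` file for gitaly; the cgroup path and the `MAX_RUNNING` / `MAX_ZOMBIES` values are hypothetical placeholders, not agreed thresholds. In practice these would more likely be alerting rules on exported metrics, so treat this only as an illustration of the logic.

```python
"""Sketch of process-state and gitaly memory checks on a Linux host.

Thresholds and the cgroup path below are hypothetical placeholders.
"""
import os
from collections import Counter

# Hypothetical thresholds; this issue asks for real values to be defined.
MAX_RUNNING = 64
MAX_ZOMBIES = 10
# Alert at 30 GiB, just under the 32G cgroup hard limit mentioned above.
GITALY_MEMORY_ALERT_BYTES = 30 * 1024**3


def read_state(stat_line):
    """Return the one-letter state (R, S, D, Z, ...) from a /proc/<pid>/stat line."""
    # The format is "pid (comm) state ...". comm may contain spaces or
    # parentheses, so parse from the last ')' onwards.
    return stat_line[stat_line.rfind(")") + 1:].split()[0]


def count_process_states(proc_root="/proc"):
    """Count processes per state by scanning numeric entries under proc_root."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                counts[read_state(f.read())] += 1
        except OSError:
            # The process exited between listdir() and open().
            continue
    return counts


def read_cgroup_memory_bytes(path):
    """Read current memory usage (bytes) from a cgroup v1 usage file."""
    with open(path) as f:
        return int(f.read().strip())


def check_alerts(counts, gitaly_memory_bytes):
    """Return a list of alert messages for the given state counts and memory usage."""
    alerts = []
    if counts.get("R", 0) > MAX_RUNNING:
        alerts.append("too many running processes")
    if counts.get("Z", 0) > MAX_ZOMBIES:
        alerts.append("too many zombie processes")
    if gitaly_memory_bytes >= GITALY_MEMORY_ALERT_BYTES:
        alerts.append("gitaly memory usage at or above 30G")
    return alerts


if __name__ == "__main__":
    counts = count_process_states()
    print(dict(counts))
    # Hypothetical cgroup v1 path for gitaly; adjust to the real hierarchy.
    path = "/sys/fs/cgroup/memory/gitaly/memory.usage_in_bytes"
    memory = read_cgroup_memory_bytes(path) if os.path.exists(path) else 0
    for alert in check_alerts(counts, memory):
        print("ALERT:", alert)
```

Running and zombie counts are checked separately because a zombie pile-up usually points to a parent not reaping children, while many running processes points to CPU saturation; the two likely warrant different thresholds.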

@gl-infra, is there anything else I have missed? /cc @bjk-gitlab @andrewn