Investigate application behaviour during involuntary disruptions (e.g. a pod getting OOM-killed)

This issue should be considered together with: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10625

This kind of review could become part of the production readiness review performed before moving workloads to k8s.

see also: https://kubernetes.io/docs/concepts/workloads/pods/disruptions/

Consider doing "Chaos Monkey" style testing:
- kill a node abruptly (e.g. trigger a kernel panic)
- kill a pod (e.g. delete a pod)
- use chaos engineering tools/frameworks, for example:
  - https://github.com/litmuschaos/litmus
  - https://www.gremlin.com/docs/
  - chaoskube
  - kube-monkey
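
As a starting point before adopting one of the tools above, the "kill a pod" experiment could be sketched as a small script. This is only a minimal sketch under stated assumptions: it shells out to `kubectl` (assumed to be installed and pointed at a non-production cluster), and the function names, the namespace and the label selector are all hypothetical, not taken from any existing tooling. Note that force-deleting a pod only approximates an involuntary disruption; an OOM kill terminates a container, which the kubelet then restarts in place, so it may exercise different code paths.

```python
import json
import random
import subprocess
from typing import List, Optional


def list_pod_names(namespace: str, selector: str) -> List[str]:
    """Return the names of pods matching a label selector (needs kubectl access)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-l", selector, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [item["metadata"]["name"] for item in json.loads(result.stdout)["items"]]


def pick_victim(pod_names: List[str], rng: random.Random) -> Optional[str]:
    """Pick one pod at random, or None when there is nothing to kill."""
    if not pod_names:
        return None
    # Sort first so a seeded rng gives a reproducible choice.
    return rng.choice(sorted(pod_names))


def delete_command(namespace: str, pod: str, abrupt: bool) -> List[str]:
    """Build the kubectl command; abrupt=True skips graceful termination."""
    cmd = ["kubectl", "delete", "pod", pod, "-n", namespace]
    if abrupt:
        cmd += ["--grace-period=0", "--force"]
    return cmd


def kill_random_pod(
    namespace: str,
    selector: str,
    abrupt: bool = True,
    dry_run: bool = True,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Delete one random matching pod and return its name (None if no pods)."""
    victim = pick_victim(list_pod_names(namespace, selector), rng or random.Random())
    if victim is None:
        return None
    cmd = delete_command(namespace, victim, abrupt)
    if dry_run:
        # Hypothetical safety default: only print what would be executed.
        print("would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return victim
```

A call such as `kill_random_pod("staging", "app=web")` (both values are placeholders) prints the command it would run; passing `dry_run=False` actually deletes the pod. While the experiment runs, the interesting part is observing the application: error rates, in-flight requests, and how long it takes for a replacement pod to become ready.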

Edited Jun 26, 2020 by Michal Wasilewski