Create Runbook for Handling Capacity Warnings
We often get a variety of capacity warnings that need to be addressed. I would like to create a runbook to help decide how to handle any given capacity alert. Some of the items below overlap, but I'd like to see something like the following:
- How to address different capacity warnings
- How to decide whether we should increase resources, whether something else (like a code or config change) is the source of the warning, or whether no action should be taken
- How to determine what an acceptable and safe increase to resources is, and which dashboards/graphs/metrics we should examine to make sure we don't start getting evictions
I will probably think of other things later, but here is my initial list.
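To make the "acceptable and safe increase" item more concrete, here is a rough sketch of the kind of check the runbook could describe. It assumes the evictions in question come from node memory pressure and that the runbook would compare a proposed increase against the memory still unclaimed on a node. The names `NodeCapacity`, `safe_increase_mib`, `is_increase_safe`, and the 20% headroom default are all hypothetical and only for illustration; real values would come from whatever dashboards we settle on.

```python
from dataclasses import dataclass


@dataclass
class NodeCapacity:
    """Hypothetical snapshot of one node, as read from a dashboard."""
    allocatable_mib: int  # memory available for workloads on the node
    requested_mib: int    # memory already requested by workloads on the node


def safe_increase_mib(node: NodeCapacity, headroom_percent: int = 20) -> int:
    """Largest increase that still leaves headroom_percent of memory unclaimed.

    The 20% default is an assumed safety margin, not a recommended value.
    """
    reserved = node.allocatable_mib * headroom_percent // 100
    return max(0, node.allocatable_mib - reserved - node.requested_mib)


def is_increase_safe(node: NodeCapacity, increase_mib: int,
                     headroom_percent: int = 20) -> bool:
    """True if the proposed increase fits within the remaining safe budget."""
    return increase_mib <= safe_increase_mib(node, headroom_percent)


if __name__ == "__main__":
    node = NodeCapacity(allocatable_mib=16000, requested_mib=10000)
    print(safe_increase_mib(node))       # remaining safe budget in MiB
    print(is_increase_safe(node, 2000))  # proposed 2000 MiB increase
```

A check like this only covers requests versus capacity; the runbook would still need to say which graphs show actual usage, since usage above requests is what typically triggers evictions.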
Edited by Alex Hanselka