Increase Prometheus node resources

Summary

Prometheus pods are frequently OOMKilled due to resource constraints. WAL replays after each restart take too long, and because both pods are often affected at the same time, metrics go missing.

Related Incident(s)

Originating issue(s): production#6408 (closed)

Desired Outcome/Acceptance criteria

Add new highmem nodepool to gprd
Configure a Pod Disruption Budget for Prometheus

Upgrade the nodes that Prometheus is scheduled on.
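
As a rough illustration of the Pod Disruption Budget item, the sketch below builds a `policy/v1` PodDisruptionBudget manifest that keeps at least one Prometheus replica running during voluntary disruptions such as node drains or node pool upgrades. The name `prometheus-pdb`, the `monitoring` namespace, and the `app.kubernetes.io/name: prometheus` label are assumptions made for the example, not values taken from this issue. A PDB only limits voluntary evictions; it does not prevent OOM kills, which is what the larger nodes are meant to address.

```python
import json


def build_pdb(name, namespace, match_labels, min_available=1):
    """Return a policy/v1 PodDisruptionBudget manifest as a plain dict."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Evictions are refused if they would leave fewer than this many pods ready.
            "minAvailable": min_available,
            "selector": {"matchLabels": dict(match_labels)},
        },
    }


# Hypothetical name, namespace and label selector; adjust to the real deployment.
pdb = build_pdb(
    name="prometheus-pdb",
    namespace="monitoring",
    match_labels={"app.kubernetes.io/name": "prometheus"},
)

if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(pdb, indent=2))
```

The printed JSON could be applied with `kubectl apply -f -`. With two replicas, `minAvailable: 1` lets a drain evict one pod at a time while the other keeps scraping.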

Associated Services

Prometheus

Corrective Action Issue Checklist

link the incident(s) this corrective action arose out of
give context for what problem this corrective action is trying to prevent from re-occurring
assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
assign a priority (this will default to 'priority::4')

Edited Mar 01, 2022 by Steve Xuereb