Gitter MongoDB high CPU usage
Problem
We are getting PagerDuty alerts like this:
mongo-replica-01: loadavg(5min) of 2.3 matches resource limit [loadavg(5min)<2.0]
https://gitter.pagerduty.com/incidents/PEPXZE7
At first these warnings were sparse (once every few days), but over the last few weeks they have become more frequent (roughly a dozen a day).
Last Friday the site stopped responding for a minute (https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10181), which suggests that the load behind these alerts can lead to production outages.
Analysis
CPU usage grew slightly around the 22nd of April, and it seems to have increased even further today:
(light purple is CPU cycles spent on iowait)
Last 3 months
All CPU stats
IOWait
https://app.datadoghq.com/dash/host/52505732?from_ts=1591155757659&to_ts=1591159357659&live=true
IOWait appears to be the main contributor to the increased load over the last month.
Today 2020-05-19
Next steps
- Investigate whether there are deployments that correlate with the increase in CPU usage
- Profile the DB to find out which queries are causing the increased load
- Could we send some read queries to the replica, which is using almost no CPU?
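
For the profiling step, one possible starting point is to enable the MongoDB profiler for slow operations and rank the recorded entries by total time per collection and operation type. The sketch below is only an illustration: the helper name `summarize_profile`, the sample entries, and the collection names in them are hypothetical, and the field names (`ns`, `op`, `millis`, `docsExamined`) are assumed to match the documents stored in `system.profile`.

```python
from collections import defaultdict


def summarize_profile(entries, top=5):
    """Group profiler entries by (namespace, operation), ranked by total millis."""
    totals = defaultdict(lambda: {"count": 0, "millis": 0, "docsExamined": 0})
    for entry in entries:
        key = (entry.get("ns", "?"), entry.get("op", "?"))
        bucket = totals[key]
        bucket["count"] += 1
        bucket["millis"] += entry.get("millis", 0)
        bucket["docsExamined"] += entry.get("docsExamined", 0)
    # Most expensive groups first, by cumulative execution time.
    ranked = sorted(totals.items(), key=lambda item: item[1]["millis"], reverse=True)
    return [{"ns": ns, "op": op, **stats} for (ns, op), stats in ranked[:top]]


# Hypothetical sample data shaped like system.profile documents.
# With pymongo, real entries could be collected roughly like this (assumption,
# not tested against the Gitter setup):
#   db.command("profile", 1, slowms=100)   # record operations slower than 100 ms
#   entries = db["system.profile"].find()
sample = [
    {"ns": "gitter.chatmessages", "op": "query", "millis": 120, "docsExamined": 50000},
    {"ns": "gitter.chatmessages", "op": "query", "millis": 80, "docsExamined": 30000},
    {"ns": "gitter.users", "op": "update", "millis": 15, "docsExamined": 1},
]

for row in summarize_profile(sample):
    print(row)
```

A high `docsExamined` total relative to the number of operations would hint at collection scans, which could explain the IOWait. If reads are later routed to the replica (for example with the `secondaryPreferred` read preference), note that secondary reads can return slightly stale data, so this would only suit queries that tolerate replication lag.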