CPU slowly climbing to 100%
Hello! We have been migrating from mod_wsgi to pyruvate over the last few months. On Monday, two days ago, we went live with our largest application. It runs under Python 3.9.
By Wednesday morning, the client was reporting page load times of 10+ seconds. On a c5.2xlarge instance running 10 processes (Docker containers), each with 15 worker threads (the 2nd argument to pyruvate), the server was completely maxed out on CPU across all cores while serving only about 25 requests per second.
After restarting the Docker containers, CPU usage immediately dropped to about 1/100th of that.
Over the next hour or two, one process at a time jumped back to 100%.
Out of curiosity, I went to look at another application we had deployed on pyruvate. Indeed, see these images before and after restart:
Normally I wouldn't be starting here, but these factors lead me in the pyruvate direction:
- Before Monday, with the same code and the same traffic, we had no CPU issues.
- Our other Docker containers running background tasks (exact same image, same application, same Python imports except pyruvate) do not see a CPU increase.
I am at a loss as to how to dig deeper. It seems as if something spawns a thread that spins (perhaps in a tight loop) and never exits; the server keeps running correctly, just much, much slower.
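One thing I could wire into the application to get more data is a thread stack dump. This is only a minimal sketch using the standard library; the function names are my own and not part of pyruvate, and it assumes the spinning thread is visible to the Python interpreter. If the loop is inside native code, this would at best show where Python was last executing on that thread.

```python
import faulthandler
import signal
import sys
import threading
import traceback


def install_stack_dump(signum=signal.SIGUSR1):
    """Dump the traceback of every Python thread to stderr when signum arrives.

    Hypothetical helper: call it once at application start-up (Unix only).
    """
    faulthandler.register(signum, file=sys.stderr, all_threads=True)


def snapshot_thread_stacks():
    """Return {thread_id: formatted stack} for all threads the interpreter knows."""
    # Threads started outside the threading module have no name here.
    names = {t.ident: t.name for t in threading.enumerate()}
    stacks = {}
    for ident, frame in sys._current_frames().items():
        label = names.get(ident, "thread not started via threading")
        stacks[ident] = label + "\n" + "".join(traceback.format_stack(frame))
    return stacks
```

With `install_stack_dump()` called at start-up, sending SIGUSR1 to a process that is pegged at 100% (for example with `kill -USR1 <pid>` inside the container) should write the stacks to stderr, which ends up in the container logs. `snapshot_thread_stacks()` could instead be called from a debug-only endpoint to compare a fresh process against a spinning one.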
At this point I have mitigated the issue with an hourly container restart, which is not ideal.
Please let me know if anything comes to mind that I could check, try, or debug to isolate this issue.
Also, if someone on the team wishes to engage with me on it, we would pay commercial rates for the help.
Thank you!

