CI/CD Jobs seem to run out of memory and use disk swap
Problem
While tracking timeouts for CI/CD jobs in the gitlab-org/gitlab project, we noticed that some RSpec CI/CD jobs became progressively slower until they timed out:
- https://gitlab.com/gitlab-org/gitlab/-/jobs/5722368875
- https://gitlab.com/gitlab-org/gitlab/-/jobs/5721930364
The pattern suggests that the CI/CD runner runs out of memory and starts swapping to disk (see comment):
"This pattern looks like running out of memories, hitting into swap so everything following up became extremely slow. Should we also log the memory?"
Goal
- Add the necessary monitoring/profiling to diagnose the root cause of this issue
- Mitigate the issue
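As a starting point for the monitoring goal, here is a minimal sketch of a memory logger that could run in the background of a job. It assumes a Linux runner where `/proc/meminfo` is readable; the function names, the `[memlog]` prefix, and the 30-second interval are hypothetical choices for illustration, not part of the existing CI configuration.

```python
import time
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> integer value.

    Most fields are reported in kB; lines that do not match the
    "Name: value" shape are skipped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Return (available memory, swap in use), both in MiB."""
    available_mib = fields.get("MemAvailable", 0) // 1024
    swap_used_mib = (fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)) // 1024
    return available_mib, swap_used_mib


def log_memory(interval_seconds=30, samples=None):
    """Print one line per sample; loops until interrupted when samples is None."""
    taken = 0
    while samples is None or taken < samples:
        fields = parse_meminfo(Path("/proc/meminfo").read_text())
        available_mib, swap_used_mib = summarize(fields)
        print(
            f"[memlog] available={available_mib}MiB swap_used={swap_used_mib}MiB",
            flush=True,
        )
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval_seconds)
```

Calling `log_memory()` from a background process started before the test suite would interleave these lines with the job log, which should make it possible to see whether swap usage starts growing shortly before a job slows down.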
Edited by David Dieulivol