Record container RSS, as well as working set, for memory saturation
The container's working set memory (as reported by cgroups; this is not necessarily the same as the kernel's workingset_size metric) is not a reliable indicator of memory saturation, because it includes filesystem cache pages that the kernel is permitted to evict rather than OOM kill the cgroup.
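
As a rough illustration of why the working set still contains evictable cache, here is a minimal sketch of the usual derivation: memory usage minus inactive file cache, floored at zero. The sketch assumes this is how the reported working set is computed (it is the convention cAdvisor follows, as far as we know); the helper names and the sample numbers are hypothetical.

```python
def parse_memory_stat(text):
    """Parse the 'key value' lines of a cgroup memory.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def working_set_bytes(usage_bytes, stats):
    """Assumed derivation: usage minus inactive file cache, never below zero.

    cgroup v1 exposes 'total_inactive_file'; cgroup v2 exposes 'inactive_file'.
    """
    inactive_file = stats.get("total_inactive_file", stats.get("inactive_file", 0))
    return max(usage_bytes - inactive_file, 0)


# Hypothetical memory.stat contents (cgroup v2 key names), in bytes.
SAMPLE_STAT = """\
anon 600
file 400
inactive_file 250
active_file 150
"""

sample_stats = parse_memory_stat(SAMPLE_STAT)
# Hypothetical total usage of 1000 bytes (anon + file).
sample_wss = working_set_bytes(1000, sample_stats)
print(sample_wss)
```

In this made-up sample, only the inactive file pages are subtracted, so the result still includes the 150 bytes of active file cache, which the kernel could evict under pressure.
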
This change also permits the use of resident set size (RSS). RSS is still not exactly what is used by the OOM killer - the thing that we're ultimately trying to approximate - but it's a lot clearer what we're measuring, and how our application can influence it. In practice, it's also a more stable metric than the working set size.
However, RSS does have a problem when combined with the use of memory marked as MADV_FREE by an madvise call (lazy-free memory). Lazy-free memory is memory that the process no longer needs, but wishes to be able to reclaim without a page fault. Lazy-free memory is included in RSS, but not in the working set size (WSS), so processes that use lazy-free memory may report an RSS that is a dramatic overestimate compared to WSS.
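
To get a feel for how large the lazy-free share of RSS is for a single process, one option is to compare the Rss and LazyFree fields that Linux reports in /proc/<pid>/smaps_rollup. The following is a minimal sketch under that assumption; the helper names and the sample values are hypothetical, and this is not necessarily how the metric in this change is recorded.

```python
def parse_kb_fields(text):
    """Parse 'Name:   <number> kB' lines (smaps_rollup style) into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed '<number> kB' entries; the address-range
        # header line of smaps_rollup is skipped by this check.
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def rss_without_lazyfree_kb(fields):
    """RSS with lazy-free pages subtracted, never below zero."""
    return max(fields.get("Rss", 0) - fields.get("LazyFree", 0), 0)


# Hypothetical excerpt of /proc/<pid>/smaps_rollup.
SAMPLE_ROLLUP = """\
Rss:              900000 kB
Pss:              850000 kB
Anonymous:        800000 kB
LazyFree:         300000 kB
"""

rollup_fields = parse_kb_fields(SAMPLE_ROLLUP)
adjusted_rss_kb = rss_without_lazyfree_kb(rollup_fields)
print(adjusted_rss_kb)
```

In this made-up sample, a third of the reported RSS is lazy-free memory that the kernel could take back without the process noticing, which is the kind of gap between RSS and WSS described above.
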
For now, to trial this approach, we opt in only the GitLab Rails deployments (api, git, internal-api, sidekiq, and web) to using RSS alongside WSS.
Much, much more detail is available in the issue below: gitlab-com/gl-infra/scalability#2024