mm/mglru: Revert "don't sync disk for each aging cycle"
JIRA: https://issues.redhat.com/browse/RHEL-43371
Upstream Status: RHEL only
Since the 9.4 mm update to upstream v6.1, premature OOM kills have been observed much more frequently for tasks in a memory-constrained cgroup. As shown in the Jira ticket, one easy way to reproduce the problem is to write a large amount of random data to an NFS-mounted filesystem.
Bisection identified the culprit as commit 14aa8b2d5c2e ("mm/mglru: don't sync disk for each aging cycle"). Upstream also had some discussion of this premature OOM problem in [1]. The purpose of that commit is to prevent SSD wearout, since flushing on every aging cycle may breach the writeback rate limit a system wants to impose.
However, this causes a serious problem in the OCP environment, where most containers are under the control of a memory cgroup. Premature OOM kills greatly reduce OCP's reliability and stability. Revert this problematic commit for now so that wakeup_flusher_threads() is called again on every generation bump. This revert can be dropped once upstream comes up with a better fix.
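In effect, the revert restores a call along the following lines in the MGLRU aging path in mm/vmscan.c (a sketch of the reverted hunk, not the verbatim diff; the exact placement is as in commit 14aa8b2d5c2e):

	/*
	 * Kick the flusher threads on each generation bump so that
	 * dirty pages are written back before reclaim runs out of
	 * clean pages and triggers a premature OOM kill.
	 */
	wakeup_flusher_threads(WB_REASON_VMSCAN);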
Before this patch, the reproducer shown in the Jira ticket runs once successfully but gets OOM-killed on the second run. The write data rate on a certain test system was:
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 57.5474 s, 37.3 MB/s
After applying this patch, the reproducer can be run multiple times without triggering an OOM kill. The new write data rate was:
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 25.694 s, 83.6 MB/s
The write throughput more than doubled.
By disabling MGLRU (CONFIG_LRU_GEN=n), the write data rate was:
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 21.184 s, 101 MB/s
This is better still, so some improvement may yet be needed in the MGLRU code to match the non-MGLRU performance.
[1] https://lore.kernel.org/lkml/ZcWOh9u3uqZjNFMa@chrisdown.name/
Signed-off-by: Waiman Long <longman@redhat.com>