
debug a case of CPU memory leak during training #36

When training on huge datasets, some users have observed a bug due to a memory leak and very high CPU RAM consumption. See issue #36 (closed) for details. On a binary segmentation case this bug has been successfully reproduced and observed with a minimal test script, test_dataloader_perf.py. This code has been added to the Odeon tests to help check and analyze other issues with the training loop.

It seems the bug is due to the Rotation90 transform and the fact that it uses numpy.rot90, which returns a view and not a new array. The use of a view seems to lead to a memory leak during training, perhaps because the view keeps the original buffer alive and makes garbage collection harder.
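
A minimal check (a sketch, assuming a float32 image array; not the Odeon code) illustrates the view behaviour of numpy.rot90:

```python
import numpy as np

image = np.zeros((512, 512, 3), dtype=np.float32)
rotated = np.rot90(image, k=1, axes=(0, 1))

# np.rot90 returns a view: it shares memory with the original array,
# so the original buffer cannot be freed while the view is alive.
print(np.shares_memory(image, rotated))   # True  -> view, not a copy
print(rotated.flags['OWNDATA'])           # False -> does not own its data

# An explicit copy owns its own buffer and drops the reference to `image`.
rotated_copy = np.rot90(image, k=1, axes=(0, 1)).copy()
print(rotated_copy.flags['OWNDATA'])      # True  -> independent array
```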

So when the data is explicitly copied inside the transform function, the memory leak disappears. This may depend on the torch/numpy versions, but it has not been tested intensively with different torch/numpy configurations.
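
A rough sketch of the fix is shown below. The class name mirrors the transform discussed above, but the sample keys ("image", "mask"), the probability parameter and the dict layout are assumptions for illustration, not the actual Odeon implementation:

```python
import numpy as np

class Rotation90:
    """Hypothetical sketch: a rot90 transform that returns owned copies.

    The key point is that .copy() materialises the rotated array into a
    new buffer, so the source arrays can be garbage-collected.
    """

    def __init__(self, p: float = 0.5):
        self.p = p  # probability of applying the rotation (assumed)

    def __call__(self, sample: dict) -> dict:
        if np.random.rand() < self.p:
            k = np.random.randint(1, 4)
            # .copy() forces a new, contiguous buffer instead of a view.
            sample = {
                "image": np.rot90(sample["image"], k, axes=(0, 1)).copy(),
                "mask": np.rot90(sample["mask"], k, axes=(0, 1)).copy(),
            }
        return sample
```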
