
Fix fork issue with py-lmdb 1.7.0+ / fix for PyTorch DataLoader with multiple workers

If a Python process uses an aselmdb backend and then forks, any child process that reuses the same aselmdb/lmdb_env object can trigger a segmentation fault. This makes the segfault unavoidable when training models with PyTorch dataloaders with num_workers>0, since each worker process is a fork of the main process.

Aside from the current segmentation fault, this may also have been causing a memory leak during training with PyTorch dataloaders + aselmdb.

LMDB environments are not compatible with fork(); per http://www.lmdb.tech/doc/: "Use an MDB_env* in the process which opened it, without fork()ing."

However, we had been able to get away with this for readonly databases until this line was changed, and it now causes a segmentation fault: https://github.com/jnwatson/py-lmdb/blob/master/lmdb/cpython.c#L3223
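
A minimal sketch of the failure mode (the `data.lmdb` path and the use of a raw `os.fork()` are illustrative; in practice the fork happens inside the PyTorch DataLoader when num_workers>0):

```python
import os
import lmdb

# Parent process opens a readonly environment.
env = lmdb.open("data.lmdb", readonly=True, lock=False)

pid = os.fork()
if pid == 0:
    # Child reuses the parent's MDB_env handle. This is undefined behaviour
    # per the LMDB docs and can segfault with py-lmdb 1.7.0+.
    with env.begin() as txn:
        txn.get(b"some_key")
    os._exit(0)
else:
    os.waitpid(pid, 0)
```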

This PR fixes the issue by adding a gatekeeper property around the lmdb_env. On each access, the getter compares PIDs to make sure the current process actually owns the lmdb_env; if not, it opens a new one. As a result, any fork() of the Python process, such as the worker processes spawned by PyTorch dataloaders, will never share LMDB environments between processes. A sketch of the pattern is shown below.
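
A minimal sketch of the gatekeeper pattern, not the PR's actual implementation (the `SafeLMDB` class name and the open arguments are placeholders for illustration):

```python
import os
import lmdb


class SafeLMDB:
    """Holds an LMDB environment behind a PID-checked property so that a
    forked child never reuses the parent's MDB_env handle."""

    def __init__(self, path, **open_kwargs):
        self._path = path
        self._open_kwargs = open_kwargs
        self._env = None
        self._owner_pid = None

    @property
    def lmdb_env(self):
        # Reopen the environment whenever the current process is not the one
        # that created the handle (e.g. after fork() in a DataLoader worker).
        if self._env is None or self._owner_pid != os.getpid():
            self._env = lmdb.open(self._path, **self._open_kwargs)
            self._owner_pid = os.getpid()
        return self._env
```

Because the check runs on every access, each forked worker lazily opens its own environment on first use, at the cost of one os.getpid() call per access.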
