ENH: Fix `MPI` class RecursionError with multiprocessing
Hi, using the `multiprocessing` module with `NEB`s leads to a `RecursionError`, as shown in the minimal example below. Using `pickle` on `NEB`s also gives the same error; I added another minimal example below for this. These problems when using `multiprocessing` and `pickle` seem to be due to `__getattr__` being called on an empty instance of `MPI`; see, for example, https://stackoverflow.com/a/34950256 and https://stackoverflow.com/a/50888571.
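To illustrate the mechanism (this is a generic toy class, not ASE's actual code): when `__getattr__` itself reads an instance attribute, any attribute access on a bare instance, such as the one `pickle` creates without calling `__init__`, re-enters `__getattr__` and recurses forever. `pickle`'s probe for `__setstate__` triggers exactly this.

```python
class LazyProxy:
    """Toy stand-in for a class whose __getattr__ delegates to an
    attribute that is normally set in __init__ (like MPI.comm)."""

    def __init__(self, target):
        self.target = target

    def __getattr__(self, name):
        # Only runs when normal lookup fails.  On a bare instance
        # (as pickle creates during loads, skipping __init__),
        # self.target is itself missing, so the lookup below
        # re-enters __getattr__ and recurses until RecursionError.
        return getattr(self.target, name)


proxy = LazyProxy([1, 2, 3])
print(proxy.count(2))  # delegation works on an initialised instance -> 1

bare = LazyProxy.__new__(LazyProxy)  # bare instance, as pickle creates it
try:
    bare.some_attribute  # pickle's __setstate__ probe behaves the same
except RecursionError:
    print('RecursionError reproduced')
```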
The `RecursionError` can be solved simply by checking whether `__getattr__` is trying to access the `__setstate__` attribute, and raising an `AttributeError` if so. This merge request implements this fix.
Thanks for considering this!
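For reference, a minimal sketch of the fix (simplified: the real `ase.parallel.MPI` lazily sets up an actual communicator, whereas a plain `object()` stands in for it here):

```python
import pickle


class MPI:
    """Simplified stand-in for ase.parallel.MPI, not ASE's real code."""

    def __init__(self):
        self.comm = None

    def __getattr__(self, name):
        # The fix: pickle probes a bare instance (created without
        # calling __init__) for __setstate__.  Answering that probe
        # normally would look up self.comm, which is also missing on
        # a bare instance, and recurse forever.  Refuse explicitly.
        if name == '__setstate__':
            raise AttributeError(name)
        if self.comm is None:
            self.comm = object()  # placeholder for lazy communicator setup
        return getattr(self.comm, name)


restored = pickle.loads(pickle.dumps(MPI()))  # no RecursionError
print(type(restored).__name__)
```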
Minimal example - RecursionError with multiprocessing
#!/usr/bin/env python
from ase import Atoms
from ase.neb import NEB
from multiprocessing import Pool
def print_neb_images(neb):
    print(neb.images)

images = [Atoms('H')] * 3
nebs = [NEB(images)] * 3
with Pool(2) as p:
    p.map(print_neb_images, nebs)  # raises RecursionError
Output
Process ForkPoolWorker-2:
Traceback (most recent call last):
File "/home/benjamin/miniconda3/envs/test/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/benjamin/miniconda3/envs/test/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/benjamin/miniconda3/envs/test/lib/python3.7/multiprocessing/pool.py", line 110, in worker
task = get()
File "/home/benjamin/miniconda3/envs/test/lib/python3.7/multiprocessing/queues.py", line 354, in get
return _ForkingPickler.loads(res)
File "/home/benjamin/miniconda3/envs/test/lib/python3.7/site-packages/ase/parallel.py", line 92, in __getattr__
if self.comm is None:
File "/home/benjamin/miniconda3/envs/test/lib/python3.7/site-packages/ase/parallel.py", line 92, in __getattr__
if self.comm is None:
File "/home/benjamin/miniconda3/envs/test/lib/python3.7/site-packages/ase/parallel.py", line 92, in __getattr__
if self.comm is None:
[Previous line repeated 489 more times]
RecursionError: maximum recursion depth exceeded
Minimal example - RecursionError with pickle
#!/usr/bin/env python
import pickle
from ase import Atoms
from ase.neb import NEB
images = [Atoms('H')] * 3
nebs = [NEB(images)] * 3
pickled_nebs = pickle.dumps(nebs)
pickle.loads(pickled_nebs) # raises RecursionError
Thanks @chenb for carefully documenting and fixing this. While this fix is fine, I thought it even better to remove the code that causes the problem since we don't need these warnings forever (it has been there for some years), and it's always nice to reduce the amount of code we have. See !2698 (merged). I'll close this MR since the problem was resolved in the other merge request. Thanks again.
Hi @askhl, thank you very much for going through this MR and for proposing an alternate fix (agree that reducing code is always good!). I tried !2698 (merged) out, but it didn't seem to work for me, still producing the `RecursionError` with the examples above. I believe the error comes from the `__getattr__` method of the `MPI` class rather than the `ParallelModuleWrapper` class that was removed. I might be missing something, so please feel free to correct me if I'm wrong. Thanks!

added 1 commit
- a94df367 - Add comment explaining changes to MPI.__getattr__()
Thanks @chenb!
mentioned in commit 0dc88e5b
By the way @chenb, I would not in general rely on transferring complex objects to subprocesses since many things can go wrong. NEB can contain one or more file handles, for example, which are fundamentally unpicklable, as is a real-life MPI communicator.
I recommend relying on passing the inputs to the subprocesses, and those inputs are then turned into actual objects inside the subprocess:
def myfunction(simple_args):
    calc = ...
    neb = NEB(...)

pool.map(myfunction, ...)
That way, the code will function correctly no matter whether NEB objects and calculators are picklable now or in the future.
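A runnable sketch of this pattern with a generic stand-in: an open file handle (unpicklable, like a real MPI communicator) is built inside the worker from a plain path string, so only simple inputs cross the process boundary. The names here are illustrative, not ASE API:

```python
import os
import tempfile
from multiprocessing import Pool


def worker(path):
    # Receive only a simple, picklable input (a path string) and
    # construct the complex, unpicklable object (an open file handle)
    # inside the subprocess.
    with open(path) as fh:
        return fh.read().strip()


if __name__ == '__main__':
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'w') as fh:
        fh.write('hello\n')
    with Pool(2) as pool:
        print(pool.map(worker, [path, path]))  # ['hello', 'hello']
    os.remove(path)
```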
Thanks for sharing this useful design pattern, @askhl! I agree that this would be a more robust solution; unfortunately, the NEBs are buried quite deep in my code, e.g. I want to use multiprocessing on a class A containing a class B containing NEBs, so it would require a long list of arguments to reconstruct everything. I'll keep an eye out to see if I can apply this in the future.