
ENH: Fix `MPI` class RecursionError with multiprocessing

Merged Benjamin requested to merge (removed):multiprocess_fix into master

Hi, using the `multiprocessing` module with NEBs leads to a RecursionError, as shown in the minimal example below. Using `pickle` on NEBs gives the same error; I added another minimal example for this below. Both problems seem to be due to `__getattr__` being called on an empty instance of `MPI`, i.e. one created by unpickling without `__init__` having run; see for example https://stackoverflow.com/a/34950256 and https://stackoverflow.com/a/50888571.
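To see the mechanism in isolation, here is a distilled, non-ASE sketch of the same trap; the class Lazy and its lazily created backend attribute are illustrative stand-ins for `MPI` and its `comm`:

import pickle

class Lazy:
    def __init__(self):
        self.backend = None

    def __getattr__(self, name):
        # __getattr__ runs only for attributes not found normally.
        # pickle re-creates instances *without* calling __init__, so
        # 'backend' is missing on the new instance and 'self.backend'
        # re-enters __getattr__, recursing until the interpreter gives up.
        if self.backend is None:
            self.backend = object()
        return getattr(self.backend, name)

pickle.loads(pickle.dumps(Lazy()))  # raises RecursionError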

The RecursionError can be fixed simply by checking whether `__getattr__` is being asked for the `__setstate__` attribute and raising an AttributeError if so; pickle then falls back to its default behaviour. This merge request implements that fix.
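For reference, a minimal sketch of the fixed class; the structure is reconstructed from the traceback below, and the real communicator selection in ase/parallel.py is replaced by a stand-in:

import pickle

class MPI:
    def __init__(self):
        self.comm = None

    def __getattr__(self, name):
        # pickle re-creates instances without calling __init__, so
        # 'comm' is missing and 'self.comm' below would re-enter
        # __getattr__ forever.  Raising AttributeError makes pickle
        # fall back to its default __setstate__ handling instead.
        if name == '__setstate__':
            raise AttributeError(name)
        if self.comm is None:
            self.comm = object()  # stand-in for the real communicator
        return getattr(self.comm, name)

pickle.loads(pickle.dumps(MPI()))  # no longer recurses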

Thanks for considering this!

Minimal example - RecursionError with multiprocessing

#!/usr/bin/env python

from ase import Atoms
from ase.neb import NEB
from multiprocessing import Pool

def print_neb_images(neb):
    print(neb.images)

# Three trivial single-atom images; the NEB contents are irrelevant,
# only sending NEB objects to the worker processes matters.
images = [Atoms('H')] * 3
nebs = [NEB(images)] * 3

with Pool(2) as p:
    p.map(print_neb_images, nebs)  # raises RecursionError

Output

Process ForkPoolWorker-2:
Traceback (most recent call last):
  File "/home/benjamin/miniconda3/envs/test/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/benjamin/miniconda3/envs/test/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/benjamin/miniconda3/envs/test/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/home/benjamin/miniconda3/envs/test/lib/python3.7/multiprocessing/queues.py", line 354, in get
    return _ForkingPickler.loads(res)
  File "/home/benjamin/miniconda3/envs/test/lib/python3.7/site-packages/ase/parallel.py", line 92, in __getattr__
    if self.comm is None:
  File "/home/benjamin/miniconda3/envs/test/lib/python3.7/site-packages/ase/parallel.py", line 92, in __getattr__
    if self.comm is None:
  File "/home/benjamin/miniconda3/envs/test/lib/python3.7/site-packages/ase/parallel.py", line 92, in __getattr__
    if self.comm is None:
  [Previous line repeated 489 more times]
RecursionError: maximum recursion depth exceeded

Minimal example - RecursionError with pickle

#!/usr/bin/env python

import pickle
from ase import Atoms
from ase.neb import NEB

images = [Atoms('H')] * 3
nebs = [NEB(images)] * 3

pickled_nebs = pickle.dumps(nebs)  # dumping succeeds
pickle.loads(pickled_nebs)  # loading raises RecursionError

Activity

  • Benjamin changed the description

  • Thanks @chenb for carefully documenting and fixing this. While this fix is fine, I thought it would be even better to remove the code that causes the problem, since we don't need these warnings forever (the code has been there for some years) and it's always nice to reduce the amount of code we have. See !2698 (merged). I'll close this MR, since the problem was resolved in the other merge request. Thanks again.

  • Benjamin (author):

    Hi @askhl, thank you very much for going through this MR and for proposing an alternative fix (I agree that reducing code is always good!). I tried !2698 (merged), but it didn't seem to work for me: the examples above still produce the RecursionError. I believe the error comes from the `__getattr__` method of the `MPI` class rather than the ParallelModuleWrapper class that was removed. I might be missing something, so please feel free to correct me if I'm wrong. Thanks!
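    A quick way to confirm where the recursion originates; this assumes world is the module-level instance of the lazy MPI wrapper in ase/parallel.py, as the traceback above suggests:

    import pickle
    from ase.parallel import world  # assumed to be the lazy MPI wrapper

    # Dumping succeeds; the recursion is only triggered on load, when
    # pickle probes __setstate__ on an instance created without __init__.
    pickle.loads(pickle.dumps(world))  # RecursionError without this fix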

  • reopened

  • Oh, I see.

    Then let's merge this. However, we'll first need a comment next to the added line that mentions the problem and links to this MR. Without a comment, it would be impossible to understand why that line is there, and a rational human reader would remove it again.

  • Benjamin added 1 commit:

    • a94df367 - Add comment explaining changes to MPI.__getattr__()

  • Benjamin (author):

    That makes sense. I have now included some comments and linked them back to this MR; I hope the comments are clear enough.

  • mentioned in commit 0dc88e5b

  • By the way @chenb, I would not in general rely on transferring complex objects to subprocesses, since many things can go wrong. A NEB can contain one or more file handles, for example, which are fundamentally unpicklable, as is a real-life MPI communicator.

    I recommend passing only simple inputs to the subprocesses, and turning those inputs into the actual objects inside each subprocess:

    def myfunction(simple_args):
        # Construct the heavy objects inside the worker, so only the
        # simple, picklable inputs cross the process boundary.
        calc = ...
        neb = NEB(...)
    
    pool.map(myfunction, ...)

    That way, the code will work correctly regardless of whether NEB objects and calculators are picklable, now or in the future.
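    A runnable sketch of this pattern, reusing the toy setup from the examples above; build_and_print_neb and n_images are illustrative names (not ASE API), and the integer image count stands in for whatever simple inputs a real workflow needs:

    #!/usr/bin/env python

    from multiprocessing import Pool

    from ase import Atoms
    from ase.neb import NEB

    def build_and_print_neb(n_images):
        # The NEB is constructed inside the worker process, so no NEB
        # object ever crosses the process boundary.
        images = [Atoms('H')] * n_images
        neb = NEB(images)
        print(neb.images)

    if __name__ == '__main__':
        with Pool(2) as p:
            p.map(build_and_print_neb, [3, 3, 3])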

  • Benjamin (author):

    Thanks for sharing this useful design pattern, @askhl! I agree that it would be a more robust solution. Unfortunately, the NEBs are buried quite deep in my code (e.g. I want to use multiprocessing on a class A that contains a class B that contains NEBs), so it would require a long list of arguments to reconstruct everything. I will keep an eye out for chances to apply this pattern in the future.
