c2db: More intelligent parallelization
Currently we use the default parallelization of GPAW. That heuristic was written back when 16 cores was a lot.
Sometimes it chooses a relatively bad parallelization (e.g. very little kpoint parallelization).
What we should do is write a function that returns a reasonable parallelization for all materials. The function can do a "dry run" using gpaw.calcinfo to see what symmetry reduction/kpoints GPAW would choose, then decide the parallelization based on how many cores are available.
Alternatively, we can add a choose_parallelization() callback to GPAW, so that GPAW calls our implementation at the point where it would otherwise apply its own heuristic. (That is probably worth having in general, even though the calcinfo functionality makes callbacks less necessary.)
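A rough sketch of the decision step only, to make the discussion concrete. It assumes the dry run has already given us the number of irreducible kpoints (times spins). The signature, the `max_waste` tolerance and its default are made up for illustration and are not existing GPAW API. The returned dict uses the `kpt`/`domain`/`band` keys of GPAW's `parallel` keyword, but whether the values can be passed on unchanged should be checked.

```python
from math import ceil


def choose_parallelization(nkpts, ncores, max_waste=0.2):
    """Pick a kpoint/domain split for ncores cores (hypothetical helper).

    nkpts: number of irreducible kpoints times spins, from the dry run.
    max_waste: largest tolerated idle fraction caused by distributing
        kpoints unevenly over the kpoint groups (assumed tolerance).
    """
    best = 1
    for kpt in range(1, min(nkpts, ncores) + 1):
        if ncores % kpt:
            continue  # the number of kpoint groups must divide the cores
        # The busiest group handles ceil(nkpts / kpt) kpoints; groups
        # with fewer kpoints sit idle for the remainder of each step.
        waste = 1 - nkpts / (kpt * ceil(nkpts / kpt))
        if waste <= max_waste:
            best = kpt  # prefer the largest acceptable kpoint split
    return {'kpt': best, 'domain': ncores // best, 'band': 1}
```

With these assumptions, 10 kpoints on 16 cores gives `kpt=4, domain=4`: a split of 8 would leave 37.5% of the core time idle, which exceeds the tolerance. How large `max_waste` should be is exactly the kpoint vs domain priority question.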
Considerations:
- Priority of kpoint parallelization (including the waste when kpoints do not divide evenly among the groups) vs domain parallelization? This is the main question since currently we sometimes get very little kpoint parallelization.
- Enable some band parallelization if there are many cores compared to problem size? This is secondary.
- Enable ScaLAPACK (e.g. for Davidson)? Probably not important for C2DB.
- Enable augment_grids? (I keep forgetting: what does the new GPAW have in terms of augment_grids?)
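For the band parallelization point, one possible rule is to shift cores from the domain decomposition into band groups once the domain count gets large. The sketch below is only an illustration: the function name, `max_domain` and `min_bands_per_group` are hypothetical, and the thresholds are placeholders that would need benchmarking. It takes a dict with `kpt` and `domain` entries and keeps the total number of cores unchanged.

```python
def add_band_parallelization(parallel, nbands,
                             max_domain=16, min_bands_per_group=32):
    """Move excess domain cores into band groups (hypothetical heuristic).

    parallel: dict with integer 'kpt' and 'domain' entries.
    nbands: number of bands in the calculation.
    max_domain, min_bands_per_group: assumed thresholds, not tuned.
    """
    domain = parallel['domain']
    band = 1
    # Halve the domain count while it is above the threshold, as long as
    # the bands still divide evenly and each group keeps enough bands.
    while (domain > max_domain
           and domain % 2 == 0
           and nbands % (2 * band) == 0
           and nbands // (2 * band) >= min_bands_per_group):
        domain //= 2
        band *= 2
    return {**parallel, 'domain': domain, 'band': band}
```

For a small system the loop never runs and `band` stays 1, which matches the idea that band parallelization is secondary and only kicks in when there are many cores compared to the problem size.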