Inefficient Supercell Generation Functions
Hi! Thanks for all your work in building and maintaining such a useful package.
I wanted to raise an issue I noticed when trying to build improved supercell generation algorithms for our defects code doped. The main point is that the cell shape metric function (get_deviation_from_optimal_cell_shape) and thus the supercell-finding function (find_optimal_cell_shape) in ase.build.supercells have some issues, namely:
- Not accounting for rotational invariance (i.e. assuming cell vectors are aligned along the diagonal), as demonstrated in this example, where the cell [0, 1, 0], [0, 0, 1], [1, 0, 0] gives a bad score with the current get_deviation_from_optimal_cell_shape, but should actually score 0 (ideal), as it does with our updated metric. This is included in the notebook attached to this issue.
- Doesn't allow transformation matrices with negative determinants.
- While relatively efficient (which is important given the fairly intensive numerical optimisation required for searching over many potential supercell matrices), the code in find_optimal_cell_shape can be made much faster (by a few orders of magnitude), reducing loop times for searching over supercells for particularly tricky systems from ~hours to seconds or a couple of minutes.
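To make the first two points concrete, here is a minimal NumPy sketch. Both functions are simplified stand-ins written for this illustration (simple-cubic target only): frobenius_deviation is assumed to mirror the behaviour described above rather than being copied from ase.build.supercells, and length_deviation is a hypothetical length-based metric, not the actual cell_metric code in doped.

```python
import numpy as np


def frobenius_deviation(cell):
    """Frobenius distance from the identity after volume normalisation.

    Illustrative stand-in only; assumes det(cell) > 0.
    """
    cell = np.asarray(cell, dtype=float)
    scale = np.linalg.det(cell) ** (-1.0 / 3.0)
    return np.linalg.norm(scale * cell - np.eye(3))


def length_deviation(cell):
    """Root-sum-square relative deviation of the cell vector lengths from
    the effective cubic length a_0 = |det(cell)|^(1/3).

    Hypothetical illustration: it depends only on |det| and the vector
    lengths, so rotating, permuting or mirroring the cell leaves it unchanged.
    """
    cell = np.asarray(cell, dtype=float)
    a0 = abs(np.linalg.det(cell)) ** (1.0 / 3.0)
    lengths = np.linalg.norm(cell, axis=1)
    return np.sqrt(np.sum(((lengths - a0) / a0) ** 2))


# A perfectly cubic cell whose vectors are cyclically permuted (det = +1).
permuted = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
# A perfectly cubic cell with two vectors swapped (det = -1).
mirrored = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]

print(frobenius_deviation(permuted))  # sqrt(6): penalised despite being cubic
print(length_deviation(permuted))     # zero: recognised as ideal
print(length_deviation(mirrored))     # zero: handedness does not matter
```

Since |det| can only equal the product of the vector lengths when the vectors are orthogonal, this length-based score is zero only for cubic cells, whatever their orientation.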
Fixing these issues (the fixes are currently implemented in cell_metric in doped.utils.supercells) gives a major improvement in the optimality measures and supercell minimum image distances:
It's worth mentioning that, as detailed on the ASE defects docs page where these functions are described, the primary target for the output supercells is usually maximising the minimum image distance between cells, and these cell shape metrics are essentially an indirect way of searching for the supercell that maximises the image distance for a given number of atoms. Directly optimising the minimum image distance has been avoided before because it can be computationally complex and is usually extremely costly. However, we've built on these approaches in doped to implement this in an efficient way, allowing direct optimisation to identify the optimal supercell for an arbitrary input cell with a given minimum image distance constraint (a minimum number of atoms and other user constraints can also be included), which gives a significant improvement for this goal:
(Figure taken from the arXiv paper which shows the average minimum periodic image distance, normalised by the ideal image distance, for a range of crystal systems)
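For reference, the quantity being optimised can be sketched with a brute-force search over short lattice vectors. This is only an illustration of the definition, not the efficient approach used in doped; the function names and the search parameter are made up for this example, and the fixed search range is only adequate for reasonably compact cells (a strongly skewed cell would need a reduced basis or a wider range).

```python
import itertools

import numpy as np


def min_image_distance(cell, search=2):
    """Length of the shortest non-zero lattice vector n @ cell, with the
    integer coefficients n restricted to [-search, search] (brute force)."""
    cell = np.asarray(cell, dtype=float)
    shortest = np.inf
    for n in itertools.product(range(-search, search + 1), repeat=3):
        if n == (0, 0, 0):
            continue  # skip the zero vector (an atom and itself)
        shortest = min(shortest, np.linalg.norm(np.array(n) @ cell))
    return shortest


def normalised_min_image_distance(cell):
    """Minimum image distance divided by a_0 = |det(cell)|^(1/3)."""
    a0 = abs(np.linalg.det(np.asarray(cell, dtype=float))) ** (1.0 / 3.0)
    return min_image_distance(cell) / a0


simple_cubic = np.eye(3)
fcc_primitive = 0.5 * np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
elongated = np.diag([2.0, 1.0, 1.0])

for name, cell in [("SC", simple_cubic), ("FCC", fcc_primitive), ("2x1x1", elongated)]:
    print(name, round(normalised_min_image_distance(cell), 5))
```

The elongated cell scores below the cubic one and the FCC-shaped cell scores above it, which is the ordering the shape metrics are trying to capture indirectly.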
I wanted to submit an issue here first to check before submitting a merge request; I can update the ase code and docs to use this fixed cell metric if you like?
Also a minor point to note: on the ASE defects docs page (which gives a nice description of this minimum image distance issue!), it says that for a 2:1 rectangle the minimum distance is 0.5 a_0, but it's actually 1/\sqrt{2} a_0 = 0.707 a_0, where a_0 = \sqrt{A} (the effective cubic length) as stated. And for 3D FCC, r_1 should be \sqrt[6]{2} a_0 = 1.12246 a_0 (i.e. greater than a_0, as expected), rather than the quoted \sqrt{3}/2 a_0 = 0.866 a_0, which would imply it's worse than simple cubic.
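These two values can be checked with a few lines of arithmetic. The sketch below assumes a rectangle with sides 2b and b, and an FCC lattice with conventional cubic edge a (so the primitive cell volume is a^3/4 and the nearest-neighbour distance is a/\sqrt{2}); the variable names are just for this example.

```python
import math

# 2D: a 2:1 rectangle with sides 2b and b.
b = 1.0
area = 2 * b * b
a0_2d = math.sqrt(area)   # effective square length, a_0 = sqrt(A)
rect_ratio = b / a0_2d    # the shortest image distance is the short side b

# 3D: FCC with conventional cubic edge a.
a = 1.0
a0_3d = (a**3 / 4) ** (1 / 3)                # effective cubic length of the primitive cell
fcc_ratio = (a / math.sqrt(2)) / a0_3d       # nearest-neighbour distance over a_0

print(f"{rect_ratio:.5f}")  # 0.70711, i.e. 1/sqrt(2)
print(f"{fcc_ratio:.5f}")   # 1.12246, i.e. 2^(1/6)
```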
ASE_Supercell_Generation_Demo.ipynb ASE_Supercell_Generation_Demo.pdf