Update ase.geometry.rdf.get_rdf to accept chemical symbols as elements and deprecate ase.geometry.analysis.Analysis.get_rdf due to its bugs

Summary

While ase.geometry.analysis.Analysis.get_rdf is a wrapper of ase.geometry.rdf.get_rdf with additional handling of elements, this handling causes a bug for partial RDF calculations. Specifically, elements=['C', 'H'] in ase.geometry.Analysis.get_rdf computes not a partial RDF of C-H but the full RDF of the partial system with C and H, thus including also C-C and H-H contributions. To compute the partial RDF correctly, we need to give atomic numbers elements=[6, 1], which has however another issue described below.

ase.geometry.rdf.get_rdf calculates partial RDFs correctly (in ASE 3.27.0, after !3921 (merged)). This has however a limitation that elements can so far take only atomic numbers like [6, 1] and not ['C', 'H'], despite its name elements.

The present MR addresses these two issues by:

  1. Updating ase.geometry.rdf.get_rdf so that it can take a list of chemical symbols for elements
  2. Deprecating / removing ase.geometry.analysis.Analysis.get_rdf due to its bugs

Thank you so much @drFaustroll for reporting the issue!

@askhl May I shortly ask your opinion, particulalry about the deprecation and the removal of ase.geometry.Analysis.get_rdf?

Since the changes are rather straightforward, let me set it as a milestone for ASE 3.28.0.

Details

The bugs of ase.geometry.analysis.Analysis.get_rdf comes from here:

                    for element in elements:
                        for idx in self._get_symbol_idxs(image, element):
                            tmp_image.append(image[idx])

and then

            rdf = get_rdf(tmp_image, rmax, nbins, elements=el, no_dists=(not return_dists),
                          volume=volume)

where el still remains None. This means that, when we set e.g. elements=['C', 'H'], this method considers a partial system containing only C and H, but then it considers the full RDF for this partial system, which contains contributions also from C-C and H-H.

The partial RDF can be obtained with elements=[6, 1]. However, this also has a potential issue. Here

                if all(isinstance(x, int) for x in elements):
                    if len(elements) == 2:
                        # use builtin get_rdf mask
                        el = elements
                        tmp_image = image
                    else:
                        # create dummy image
                        tmp_image = Atoms(
                            cell=image.get_cell(), pbc=image.get_pbc())
                        for idx in elements:
                            tmp_image.append(image[idx])

This considers a partial RDF if the length of elements is equal to 2; in this case elements are interpreted as an atomic number pair. On the other hand, if the length of elements differs from 2, this is recognized as indices of atoms, and the full RDF for the partial system specified by the indices is computed. It is, however, in principle possible that we want to give two items for elements as indices of atoms. In the present design, this is misinterpeted as two atomic numbers.

So, there are multiple bugs or issues in ase.geometry.analysis.Analysis.get_rdf. Also, since the original implementation (!881 (merged)), this part has not been tested.

Also, we can replace most special usages for ase.geometry.analysis.Analysis.get_rdf with ase.geometry.rdf.get_rdf with ASE-standardized manners. For example, to select one configuration from a list of Atoms (images), instead of

analysis = Analysis(images)
rdf = analysis.get_rdf(rmax, nbins, imageIdx=2)

we can do

rdf = get_rdf(images[2], rmax, nbins)

Also, to compute a full RDF for a partial system, we can do

rdf = get_rdf(atoms[atoms.symbols.search({'C', 'H'})], rmax, nbins)

Having these in mind, I think we can reasonably deprecate ase.geometry.analysis.Analysis.get_rdf and soon remove it (rather quickly, maybe ASE 3.29.0).

Checklist: x (done) or ~ (irrelevant)

Merge request reports

Loading