Clarifying how ASE deals with disordered atoms when reading in .cif files

The current cif.py reads .cif files without problems, but .cif files can contain disorder. Disorder occurs in parts of the crystal where atoms can occupy several alternative positions, such as long-chain aliphatic -(CH2)x-CH3 groups. The crystallographer often records each observed position in the .cif file, so a disordered atom appears multiple times, once per position.

Currently, when ASE reads a .cif file with disordered atoms, it reads every atom, including each recorded position of every disordered atom. The resulting crystal contains more atoms than it should, and its molecules look very odd.

This update includes methods that use either:

  • the _atom_site_disorder_group and _atom_site_disorder_assembly blocks, if the .cif file provides them, or
  • the _atom_site_label block given in the .cif file

This allows cif.py to give a non-disordered ase.Atoms object.
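To make the situation concrete, here is a minimal sketch in plain Python (no ASE required) of how disorder typically appears in the atom_site loop of a .cif file. The CIF fragment is hand-written for illustration and is not taken from a real structure. parse_loop and is_disordered are hypothetical helpers, not part of cif.py. The sketch assumes whitespace-separated values and that '.' or '?' means an atom belongs to no disorder group.

```python
# Hand-written, hypothetical atom_site loop: C2 is recorded in two
# alternative positions (C2A and C2B), the other atoms are ordered.
CIF_LOOP = """\
loop_
_atom_site_label
_atom_site_type_symbol
_atom_site_disorder_assembly
_atom_site_disorder_group
C1 C . .
C2A C A 1
C2B C A 2
H1 H . .
"""


def parse_loop(text):
    """Parse one simple CIF loop into a list of {tag: value} dicts."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "loop_":
        raise ValueError("expected the text to start with 'loop_'")
    # Column tags come first; every remaining line is one row of values.
    tags = [ln for ln in lines[1:] if ln.startswith("_")]
    rows = [ln.split() for ln in lines[1 + len(tags):]]
    return [dict(zip(tags, row)) for row in rows]


def is_disordered(row):
    """An atom is treated as disordered if it has a disorder group value."""
    return row.get("_atom_site_disorder_group", ".") not in (".", "?")


rows = parse_loop(CIF_LOOP)
disordered = [r["_atom_site_label"] for r in rows if is_disordered(r)]
print(disordered)  # ['C2A', 'C2B']
```

Reading every row of this loop gives four atoms, although the molecule only has three: C2A and C2B are two alternative positions of the same carbon.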

The user tells the ase.io.read method how to read a .cif file to give a non-disordered crystal through the disorder_groups argument:

from ase.io import read
system = read('RAPHID.cif', disorder_groups=disorder_groups)

Possible options for disorder_groups:

  • disorder_groups=-1 (Default): Read in all atoms from the .cif file, including all disordered atoms. This is what the original cif.py already does.
  • disorder_groups='remove_disorder': Remove all disordered atoms in the crystal. Recommended if you do not want disorder in your crystal when read by ASE.
  • disorder_groups='tag_disorder': Tag all disordered atoms in the crystal. Recommended if you want the disordered atoms in your crystal to be tagged when read by ASE.
  • disorder_groups=-2: Read in all non-disordered atoms, as well as atoms that are associated with the lowest value of _atom_site_disorder_group in the crystal.
  • disorder_groups = an integer greater than or equal to 0: Read in all non-disordered atoms, as well as atoms whose _atom_site_disorder_group equals disorder_groups.
  • disorder_groups = a list of integers greater than or equal to 0: Read in all non-disordered atoms, as well as atoms whose _atom_site_disorder_group equals any of the values in the disorder_groups list.
  • disorder_groups = a list of str: Read in all non-disordered atoms, as well as atoms with the _atom_site_disorder_assembly and _atom_site_disorder_group values that you want to read in. For example: ['A1', 'B2', 'C1']
  • disorder_groups='tag_?': Tag all atoms as 0 except for those with a ? at the start or end of the atom label, which are tagged as 1.
  • disorder_groups='remove_?': Keep all atoms except for those with a ? at the start or end of the atom label, which are removed.
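The options above can be sketched as a single selection function. This is only an illustration of the behaviour described in the list, written in plain Python without ASE, and is not the code in this merge request. Site, has_question_mark and select_sites are hypothetical names. The sketch also assumes that 'tag_disorder' tags disordered atoms as 1, and that a string such as 'A1' is the assembly followed by the group number.

```python
from typing import NamedTuple, Optional


class Site(NamedTuple):
    label: str
    assembly: Optional[str]  # None for a non-disordered atom
    group: Optional[int]     # None for a non-disordered atom


def has_question_mark(label):
    return label.startswith("?") or label.endswith("?")


def select_sites(sites, disorder_groups=-1):
    """Return (label, tag) pairs for the sites that would be kept."""
    if disorder_groups == -1:  # default: keep everything
        return [(s.label, 0) for s in sites]
    if disorder_groups == "remove_disorder":
        return [(s.label, 0) for s in sites if s.group is None]
    if disorder_groups == "tag_disorder":  # assumption: tag value is 1
        return [(s.label, 0 if s.group is None else 1) for s in sites]
    if disorder_groups == "tag_?":
        return [(s.label, int(has_question_mark(s.label))) for s in sites]
    if disorder_groups == "remove_?":
        return [(s.label, 0) for s in sites
                if not has_question_mark(s.label)]

    if disorder_groups == -2:  # lowest disorder group present
        groups = [s.group for s in sites if s.group is not None]
        wanted = {min(groups)} if groups else set()
    elif isinstance(disorder_groups, int):
        if disorder_groups < 0:
            raise ValueError("integer disorder_groups must be -2, -1 or >= 0")
        wanted = {disorder_groups}
    elif isinstance(disorder_groups, str):
        raise ValueError(f"unknown option: {disorder_groups!r}")
    else:  # list of ints, or list of 'assembly + group' strings
        wanted = set(disorder_groups)

    return [(s.label, 0) for s in sites
            if s.group is None
            or s.group in wanted
            or f"{s.assembly}{s.group}" in wanted]


# Hypothetical sites: two disorder assemblies (A and B) with two groups each.
sites = [
    Site("C1", None, None),
    Site("C2A", "A", 1), Site("C2B", "A", 2),
    Site("C3A", "B", 1), Site("C3B", "B", 2),
    Site("H1?", None, None),
]

print([label for label, _ in select_sites(sites, -2)])
# ['C1', 'C2A', 'C3A', 'H1?']
print([label for label, _ in select_sites(sites, ["A1", "B2"])])
# ['C1', 'C2A', 'C3B', 'H1?']
```

In this sketch the group-based options and the '?'-based options are independent: 'H1?' carries no disorder group, so only 'tag_?' and 'remove_?' treat it specially.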

Tests are provided in a zip file in the comments below.

New Methods:

  • _get_atom_indices_in_disorder_group(self, disorder_groups) <-- (major new method)
  • _tag_or_remove_atoms_with_question_mark_in_their_labels(self, operator) <-- (major new method)
  • _does_cif_file_contain_atom_site_disorder_group_block(self)
  • _get_entries_in_atom_site_disorder_group_block(self)
  • _does_cif_file_contain_atom_site_disorder_assembly_block(self)
  • _get_entries_in_atom_site_disorder_assembly(self)
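As a rough illustration of what the presence checks make possible, the sketch below picks a detection strategy depending on which tags a parsed block contains. It uses a plain dict in place of the CIF block object, and the function names disorder_strategy and disordered_indices are hypothetical; the methods listed above may be organised differently. It assumes '.' or '?' marks an atom with no disorder group.

```python
def disorder_strategy(block):
    """Choose how disorder could be detected for a parsed CIF block."""
    if "_atom_site_disorder_group" in block:
        return "disorder_group"
    if "_atom_site_label" in block:
        return "label"
    return "none"


def disordered_indices(block):
    """Return the indices of atoms that look disordered."""
    strategy = disorder_strategy(block)
    if strategy == "disorder_group":
        # '.' and '?' are treated as "no disorder group assigned".
        return [i for i, g in enumerate(block["_atom_site_disorder_group"])
                if g not in (".", "?", None)]
    if strategy == "label":
        # Fall back to labels that start or end with a question mark.
        return [i for i, lab in enumerate(block["_atom_site_label"])
                if lab.startswith("?") or lab.endswith("?")]
    return []


# Hypothetical blocks, one with disorder groups and one with labels only.
with_groups = {
    "_atom_site_label": ["C1", "C2A", "C2B"],
    "_atom_site_disorder_group": [".", 1, 2],
}
labels_only = {"_atom_site_label": ["C1", "C2?", "?C3"]}

print(disordered_indices(with_groups))  # [1, 2]
print(disordered_indices(labels_only))  # [1, 2]
```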

Modified Methods:

  • read_cif
  • get_atoms
  • get_unsymmetrized_structure
