In MC simulations, mchammer relies on precalculated information to calculate a change in energy from a Monte Carlo step. This has been done with a "local cluster vector" calculation, which has calculated the terms in the cluster vector involving a specific site. However, for each change in occupation, this local cluster vector has been calculated twice, once for the occupations before the change and once after the change. With this MR, we instead calculate the change in cluster vector, which only needs to be done once for each flip.

Since I can't see that the local cluster vector (which in any case is quite hard to define) is useful outside the scope where it was used, and to keep the code simple, I deleted all functions related to the previous strategy.

Changes involving more than one site

When more than one site is changed (such as in a canonical swap), icet previously did two local cluster vector calculations per site, and there was a list keeping track of previously considered sites so as to avoid double counting. This strategy is changed, and now the change in energy is instead calculated in steps, i.e., the energy change is calculated step by step (one step per site where occupation has changed). IMO this simplifies the logic somewhat.

Benchmarking

I've used the benchmark/benchmark_ce_calculator.py script to benchmark the new implementation. Optimally, the change would cut the time spent in the energy change calculation by half. I don't quite get that, but it seems like I get closer when the size of the system increases. Here's a list of speedups for a bunch of different CEs.

# Col 1: Number of atoms
# Col 2: Speedup

binary, 26 clusters
   64   15.82%
  216   20.45%
  512   28.28%
 1000   31.41%
 2744   40.53%
 4096   42.88%
 8000   44.19%

longercutoffs, 139 clusters
   64    8.22%
  216   13.46%
  512   16.33%
 1000   26.32%

sublattices, 129 clusters
  128   40.73%
  432   43.33%
 1024   44.11%
 2000   48.69%
 5488   46.05%

ternary, 111 clusters
   64   -3.43%
  216    3.44%
  512   20.13%
 1000   30.09%
 2744   33.22%

Since this calculation is often the majority of an MC calculation, this MR can cut quite some time in many simulations.

Fixing a bug

During review, a bug was discovered which made the energy change calculation give erroneous results when triplets or higher order clusters "self-interacted" via periodic boundary conditions. The issue can be illustrated with an example.

Let's say you have a supercell with symmetrically equivalent sites 1, 2 and 3. I could then, for example, have an orbit with clusters like these:

1 2 3 (cluster 1)
2 3 1 (cluster 2)
3 1 2 (cluster 3)

If I want to calculate a local contribution when flipping the identity of one of these atoms, I need to take all three clusters into account (since each atom appears in all three of the clusters), so all three clusters will be part of all three local orbit lists. This works. So far so good.

Let's now consider the same orbit but in a "supercell" with only 1 atom. In that case there is only 1 cluster (1 1 1) that wraps onto itself, but the code will actually have three clusters in the local orbit list:

1 1 1 (cluster 1)
1 1 1 (cluster 2)
1 1 1 (cluster 3)

This cluster will thus get triple-counted when calculating the local cluster vector/change in cluster vector.

Solution

The best solution I could find to this problem without too significant rewrites, was to make _clusterCounts calculate clusters in doubles instead of integers. Specifically, I can then count the clusters in the above example as only 1/3 each (1 divided by the number of occurences of the site where the cluster vector change should be calculated). This is perhaps not a very elegant solution, and I would have preferred to somehow include this in the multiplicity calculation, but the problem is that the multiplicity is calculated for a full orbit, while this multiplicity is specific to clusters and differs between clusters in the same orbit. I also tried to remove clusters but it seems like a make an error then.

I was a bit concerned this change would cause degraded performance but it seems like the impact is negligible.

Significant changes

I have added a function get_cluster_vector_change to ClusterExpansionCalculator.cpp. This function is now used to calculate change during MC simulations.
I kept get_local_cluster_vector on polite request from @erikfransson, but removed the exclude_indices keyword which caused problems; I could not get the MC simulations work with the local cluster vector calculation because of this keyword (I think), but it seems unnecessary now anyway.
I refactored ClusterExpansionCalculator.cpp to get rid of repeated code.
I changed ClusterCounts.cpp to be able to count "partial clusters" (as described above). (This change also effectively does what was proposed in !511 (closed), which I will now close.)
I completely rewrote tests/unittest/test_mchammer/test_cluster_expansion_calculator.py with the pytest framework. The file now has considerably fewer lines but the tests are much more extensive (and would have caught the bug). A downside is that they take something like 90 seconds to run, which is maybe 50 more than before.
Some (automatic) styling of the C++ code.

Edited Sep 17, 2021 by Magnus Rahm

Fix wrapping-clusters bug and calculate change in cluster vector instead of two local cluster vectors

Changes involving more than one site

Benchmarking

Fixing a bug

Solution

Significant changes

Merge request reports