Fix wrapping-clusters bug and calculate change in cluster vector instead of two local cluster vectors
In MC simulations, mchammer relies on precalculated information to calculate a change in energy from a Monte Carlo step. This has been done with a "local cluster vector" calculation, which has calculated the terms in the cluster vector involving a specific site. However, for each change in occupation, this local cluster vector has been calculated twice, once for the occupations before the change and once after the change. With this MR, we instead calculate the change in cluster vector, which only needs to be done once for each flip.
Since I can't see that the local cluster vector (which in any case is quite hard to define) is useful outside the scope where it was used, and to keep the code simple, I deleted all functions related to the previous strategy.
Changes involving more than one site
When more than one site is changed (such as in a canonical swap), icet previously did two local cluster vector calculations per site, and there was a list keeping track of previously considered sites so as to avoid double counting. This strategy is changed, and now the change in energy is instead calculated in steps, i.e., the energy change is calculated step by step (one step per site where occupation has changed). IMO this simplifies the logic somewhat.
Benchmarking
I've used the benchmark/benchmark_ce_calculator.py script to benchmark the new implementation. Optimally, the change would cut the time spent in the energy change calculation by half. I don't quite get that, but it seems like I get closer when the size of the system increases. Here's a list of speedups for a bunch of different CEs.
# Col 1: Number of atoms
# Col 2: Speedup
binary, 26 clusters
64 15.82%
216 20.45%
512 28.28%
1000 31.41%
2744 40.53%
4096 42.88%
8000 44.19%
longercutoffs, 139 clusters
64 8.22%
216 13.46%
512 16.33%
1000 26.32%
sublattices, 129 clusters
128 40.73%
432 43.33%
1024 44.11%
2000 48.69%
5488 46.05%
ternary, 111 clusters
64 -3.43%
216 3.44%
512 20.13%
1000 30.09%
2744 33.22%
Since this calculation is often the majority of an MC calculation, this MR can cut quite some time in many simulations.
Fixing a bug
During review, a bug was discovered which made the energy change calculation give erroneous results when triplets or higher order clusters "self-interacted" via periodic boundary conditions. The issue can be illustrated with an example.
Let's say you have a supercell with symmetrically equivalent sites 1, 2 and 3. I could then, for example, have an orbit with clusters like these:
1 2 3 (cluster 1)
2 3 1 (cluster 2)
3 1 2 (cluster 3)
If I want to calculate a local contribution when flipping the identity of one of these atoms, I need to take all three clusters into account (since each atom appears in all three of the clusters), so all three clusters will be part of all three local orbit lists. This works. So far so good.
Let's now consider the same orbit but in a "supercell" with only 1 atom. In that case there is only 1 cluster (1 1 1) that wraps onto itself, but the code will actually have three clusters in the local orbit list:
1 1 1 (cluster 1)
1 1 1 (cluster 2)
1 1 1 (cluster 3)
This cluster will thus get triple-counted when calculating the local cluster vector/change in cluster vector.
Solution
The best solution I could find to this problem without too significant rewrites, was to make _clusterCounts calculate clusters in doubles instead of integers. Specifically, I can then count the clusters in the above example as only 1/3 each (1 divided by the number of occurences of the site where the cluster vector change should be calculated). This is perhaps not a very elegant solution, and I would have preferred to somehow include this in the multiplicity calculation, but the problem is that the multiplicity is calculated for a full orbit, while this multiplicity is specific to clusters and differs between clusters in the same orbit. I also tried to remove clusters but it seems like a make an error then.
I was a bit concerned this change would cause degraded performance but it seems like the impact is negligible.
Significant changes
- I have added a function
get_cluster_vector_changetoClusterExpansionCalculator.cpp. This function is now used to calculate change during MC simulations. - I kept
get_local_cluster_vectoron polite request from @erikfransson, but removed theexclude_indiceskeyword which caused problems; I could not get the MC simulations work with the local cluster vector calculation because of this keyword (I think), but it seems unnecessary now anyway. - I refactored
ClusterExpansionCalculator.cppto get rid of repeated code. - I changed
ClusterCounts.cppto be able to count "partial clusters" (as described above). (This change also effectively does what was proposed in !511 (closed), which I will now close.) - I completely rewrote
tests/unittest/test_mchammer/test_cluster_expansion_calculator.pywith thepytestframework. The file now has considerably fewer lines but the tests are much more extensive (and would have caught the bug). A downside is that they take something like 90 seconds to run, which is maybe 50 more than before. - Some (automatic) styling of the C++ code.