AWH bias and PMF output can get out of sync with multiple workers
When using AWH to steer free-energy perturbation (FEP), the PMF should follow the bias exactly. With multiple workers this is not the case, and the discrepancy grows with the number of workers. The error is probably caused by the communication rather than by the AWH FEP implementation itself. FEP only exposes it: with AWH pulling the error would be very hard to notice, because the PMF and the bias usually differ in that case anyway. Example AWH output (PMF in the second column, bias in the third) with a single worker:
0.0000 0 0 1.02802 1.02828 1 0.249836
1.0000 17.8363 17.8363 1.03562 1.03238 1 0.183354
2.0000 30.0102 30.0102 1.03875 1.03816 1 0.156207
3.0000 37.5822 37.5822 1.0348 1.03981 1 0.178732
4.0000 41.8373 41.8373 1.04195 1.04078 1 0.159754
5.0000 43.7604 43.7604 1.04466 1.04053 1 0.170852
6.0000 50.0838 50.0838 1.03752 1.04158 1 0.221154
7.0000 56.0813 56.0813 1.03861 1.04287 1 0.229004
8.0000 61.6317 61.6317 1.04538 1.04233 1 0.238258
9.0000 66.5083 66.5083 1.04053 1.04051 1 0.292742
10.0000 70.28 70.28 1.03949 1.03942 1 0.362411
11.0000 72.1232 72.1232 1.03925 1.03703 1 0.609387
12.0000 69.9989 69.9989 1.01165 1.01161 1 1.94851
13.0000 55.8429 55.8429 0.85352 0.855203 1 3.45115
14.0000 35.8816 35.8816 0.82848 0.829267 1 3.92361
15.0000 28.5319 28.5319 0.841792 0.840249 1 3.62504
With 16 workers the difference is still small:
0.0000 0 0 1.00105 1.00001 1 0.22741
1.0000 17.8333 17.8368 1.0018 1.00213 1 0.189157
2.0000 30.0013 30.0059 0.998416 1.00031 1 0.173557
3.0000 37.5643 37.5695 0.997988 0.998184 1 0.173343
4.0000 41.8114 41.8166 0.99676 0.997491 1 0.156575
5.0000 43.7248 43.7297 0.999956 0.997943 1 0.146248
6.0000 50.0476 50.0529 0.99702 0.997097 1 0.208966
7.0000 56.0496 56.0549 0.996112 0.994787 1 0.2221
8.0000 61.6079 61.6128 0.99378 0.993314 1 0.226777
9.0000 66.4985 66.5029 0.991396 0.993332 1 0.267916
10.0000 70.293 70.2966 0.992784 0.992875 1 0.392505
11.0000 72.1338 72.1358 0.996828 0.995295 1 0.500862
12.0000 69.9585 69.9552 0.99986 1.00107 1 1.44062
13.0000 55.8058 55.7975 1.01444 1.01611 1 3.96896
14.0000 35.8923 35.8799 1.00968 1.01045 1 3.90625
15.0000 28.5471 28.5297 1.01212 1.0096 1 3.79876
With 96 workers the difference becomes more apparent:
0.0000 0 0 1.01385 1.01472 1 0.21068
1.0000 17.8053 17.8338 1.0161 1.01475 1 0.160629
2.0000 29.9608 30.0007 1.01141 1.01131 1 0.158863
3.0000 37.5237 37.5685 1.00656 1.00616 1 0.166607
4.0000 41.7727 41.8177 1.00315 1.00436 1 0.137509
5.0000 43.6822 43.7241 1.00054 1.00327 1 0.120537
6.0000 49.9962 50.043 1.00397 1.00255 1 0.172506
7.0000 55.9958 56.0442 1.0002 1.00035 1 0.180326
8.0000 61.5557 61.6033 0.997729 0.998324 1 0.205489
9.0000 66.4521 66.4972 0.998525 0.99733 1 0.241285
10.0000 70.2568 70.2967 0.997775 0.995361 1 0.324848
11.0000 72.1146 72.1443 0.991771 0.993465 1 0.423423
12.0000 69.9596 69.962 0.996804 0.996622 1 1.22238
13.0000 55.7931 55.7463 0.982504 0.982494 1 3.28334
14.0000 35.9134 35.8342 0.987271 0.98767 1 3.97839
15.0000 28.6089 28.4962 0.991842 0.991252 1 5.01319
The problem is not solved by updating the bias every sampling step (awh-nsamples-update = 1).
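To put a number on "still small" versus "more apparent", the following Python sketch reports the largest |PMF - bias| in each of the outputs above. It assumes, as stated in the report, that column 1 is the lambda state, column 2 the PMF and column 3 the bias. The other columns are ignored. The data strings are copied from the 16- and 96-worker runs. The function name `pmf_bias_deviation` is hypothetical and is not part of GROMACS.

```python
"""Quantify how far the AWH PMF drifts from the bias in the outputs above.

Assumption: column 1 is the lambda state, column 2 the PMF and column 3 the
bias, as stated in the report; the remaining columns are ignored.
"""

# Rows copied from the 16-worker output (first three columns only).
WORKERS_16 = """\
0.0000 0 0
1.0000 17.8333 17.8368
2.0000 30.0013 30.0059
3.0000 37.5643 37.5695
4.0000 41.8114 41.8166
5.0000 43.7248 43.7297
6.0000 50.0476 50.0529
7.0000 56.0496 56.0549
8.0000 61.6079 61.6128
9.0000 66.4985 66.5029
10.0000 70.293 70.2966
11.0000 72.1338 72.1358
12.0000 69.9585 69.9552
13.0000 55.8058 55.7975
14.0000 35.8923 35.8799
15.0000 28.5471 28.5297
"""

# Rows copied from the 96-worker output (first three columns only).
WORKERS_96 = """\
0.0000 0 0
1.0000 17.8053 17.8338
2.0000 29.9608 30.0007
3.0000 37.5237 37.5685
4.0000 41.7727 41.8177
5.0000 43.6822 43.7241
6.0000 49.9962 50.043
7.0000 55.9958 56.0442
8.0000 61.5557 61.6033
9.0000 66.4521 66.4972
10.0000 70.2568 70.2967
11.0000 72.1146 72.1443
12.0000 69.9596 69.962
13.0000 55.7931 55.7463
14.0000 35.9134 35.8342
15.0000 28.6089 28.4962
"""


def pmf_bias_deviation(text: str) -> tuple[float, float]:
    """Return (lambda state, |PMF - bias|) for the row with the largest deviation."""
    worst = (0.0, 0.0)
    for line in text.splitlines():
        cols = line.split()
        if len(cols) < 3:
            continue  # skip blank or malformed lines
        lam, pmf, bias = float(cols[0]), float(cols[1]), float(cols[2])
        dev = abs(pmf - bias)
        if dev > worst[1]:
            worst = (lam, dev)
    return worst


if __name__ == "__main__":
    for label, data in (("16 workers", WORKERS_16), ("96 workers", WORKERS_96)):
        lam, dev = pmf_bias_deviation(data)
        print(f"{label}: max |PMF - bias| = {dev:.4f} at lambda state {lam:g}")
```

For the data above, the largest deviation falls in the last lambda state in both runs. It grows from about 0.0174 with 16 workers to about 0.1127 with 96 workers. With a single worker, the PMF and bias columns are identical.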
Edited by Magnus Lundborg