AWH bias and PMF output can get out of sync with multiple workers

When AWH is used to steer FEP, the PMF should follow the bias exactly. With multiple workers this is not the case, and the discrepancy becomes more noticeable as the number of workers increases. The error is probably related to the communication rather than to the AWH FEP implementation as such. It is only exposed here because with AWH and pulling the PMF and bias usually differ anyway, so the problem would be very difficult to notice in that case. Example AWH output (PMF in the second column, bias in the third) when running one worker:

    0.0000  0  0  1.02802  1.02828  1  0.249836
    1.0000  17.8363  17.8363  1.03562  1.03238  1  0.183354
    2.0000  30.0102  30.0102  1.03875  1.03816  1  0.156207
    3.0000  37.5822  37.5822  1.0348  1.03981  1  0.178732
    4.0000  41.8373  41.8373  1.04195  1.04078  1  0.159754
    5.0000  43.7604  43.7604  1.04466  1.04053  1  0.170852
    6.0000  50.0838  50.0838  1.03752  1.04158  1  0.221154
    7.0000  56.0813  56.0813  1.03861  1.04287  1  0.229004
    8.0000  61.6317  61.6317  1.04538  1.04233  1  0.238258
    9.0000  66.5083  66.5083  1.04053  1.04051  1  0.292742
   10.0000  70.28  70.28  1.03949  1.03942  1  0.362411
   11.0000  72.1232  72.1232  1.03925  1.03703  1  0.609387
   12.0000  69.9989  69.9989  1.01165  1.01161  1  1.94851
   13.0000  55.8429  55.8429  0.85352  0.855203  1  3.45115
   14.0000  35.8816  35.8816  0.82848  0.829267  1  3.92361
   15.0000  28.5319  28.5319  0.841792  0.840249  1  3.62504

With 16 workers the difference is still small:

    0.0000  0  0  1.00105  1.00001  1  0.22741
    1.0000  17.8333  17.8368  1.0018  1.00213  1  0.189157
    2.0000  30.0013  30.0059  0.998416  1.00031  1  0.173557
    3.0000  37.5643  37.5695  0.997988  0.998184  1  0.173343
    4.0000  41.8114  41.8166  0.99676  0.997491  1  0.156575
    5.0000  43.7248  43.7297  0.999956  0.997943  1  0.146248
    6.0000  50.0476  50.0529  0.99702  0.997097  1  0.208966
    7.0000  56.0496  56.0549  0.996112  0.994787  1  0.2221
    8.0000  61.6079  61.6128  0.99378  0.993314  1  0.226777
    9.0000  66.4985  66.5029  0.991396  0.993332  1  0.267916
   10.0000  70.293  70.2966  0.992784  0.992875  1  0.392505
   11.0000  72.1338  72.1358  0.996828  0.995295  1  0.500862
   12.0000  69.9585  69.9552  0.99986  1.00107  1  1.44062
   13.0000  55.8058  55.7975  1.01444  1.01611  1  3.96896
   14.0000  35.8923  35.8799  1.00968  1.01045  1  3.90625
   15.0000  28.5471  28.5297  1.01212  1.0096  1  3.79876

With 96 workers the difference becomes more apparent:

    0.0000  0  0  1.01385  1.01472  1  0.21068
    1.0000  17.8053  17.8338  1.0161  1.01475  1  0.160629
    2.0000  29.9608  30.0007  1.01141  1.01131  1  0.158863
    3.0000  37.5237  37.5685  1.00656  1.00616  1  0.166607
    4.0000  41.7727  41.8177  1.00315  1.00436  1  0.137509
    5.0000  43.6822  43.7241  1.00054  1.00327  1  0.120537
    6.0000  49.9962  50.043  1.00397  1.00255  1  0.172506
    7.0000  55.9958  56.0442  1.0002  1.00035  1  0.180326
    8.0000  61.5557  61.6033  0.997729  0.998324  1  0.205489
    9.0000  66.4521  66.4972  0.998525  0.99733  1  0.241285
   10.0000  70.2568  70.2967  0.997775  0.995361  1  0.324848
   11.0000  72.1146  72.1443  0.991771  0.993465  1  0.423423
   12.0000  69.9596  69.962  0.996804  0.996622  1  1.22238
   13.0000  55.7931  55.7463  0.982504  0.982494  1  3.28334
   14.0000  35.9134  35.8342  0.987271  0.98767  1  3.97839
   15.0000  28.6089  28.4962  0.991842  0.991252  1  5.01319

The problem is not solved by updating every sampling step (awh-nsamples-update = 1).
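
To put a number on the mismatch, here is a minimal sketch that reads AWH output text and reports the largest difference between the PMF and the bias. It assumes whitespace-separated columns with the PMF in the second column and the bias in the third, as in the output above. The function name `pmf_bias_deviation` is made up for this illustration and is not part of GROMACS. The data is the 96-worker output copied from above.

```python
def pmf_bias_deviation(awh_text):
    """Return a list of (point, pmf - bias) for each data row.

    Assumes column 1 is the coordinate point, column 2 the PMF and
    column 3 the bias; any further columns are ignored.
    """
    deviations = []
    for line in awh_text.splitlines():
        fields = line.split()
        # Skip blank lines and xvg-style comment/header lines.
        if not fields or fields[0][0] in "#@":
            continue
        point, pmf, bias = (float(x) for x in fields[:3])
        deviations.append((point, pmf - bias))
    return deviations


# 96-worker output, copied verbatim from the report above.
awh_96_workers = """\
    0.0000  0  0  1.01385  1.01472  1  0.21068
    1.0000  17.8053  17.8338  1.0161  1.01475  1  0.160629
    2.0000  29.9608  30.0007  1.01141  1.01131  1  0.158863
    3.0000  37.5237  37.5685  1.00656  1.00616  1  0.166607
    4.0000  41.7727  41.8177  1.00315  1.00436  1  0.137509
    5.0000  43.6822  43.7241  1.00054  1.00327  1  0.120537
    6.0000  49.9962  50.043  1.00397  1.00255  1  0.172506
    7.0000  55.9958  56.0442  1.0002  1.00035  1  0.180326
    8.0000  61.5557  61.6033  0.997729  0.998324  1  0.205489
    9.0000  66.4521  66.4972  0.998525  0.99733  1  0.241285
   10.0000  70.2568  70.2967  0.997775  0.995361  1  0.324848
   11.0000  72.1146  72.1443  0.991771  0.993465  1  0.423423
   12.0000  69.9596  69.962  0.996804  0.996622  1  1.22238
   13.0000  55.7931  55.7463  0.982504  0.982494  1  3.28334
   14.0000  35.9134  35.8342  0.987271  0.98767  1  3.97839
   15.0000  28.6089  28.4962  0.991842  0.991252  1  5.01319
"""

deviations = pmf_bias_deviation(awh_96_workers)
# Pick the row where PMF and bias disagree the most.
worst_point, worst_dev = max(deviations, key=lambda d: abs(d[1]))
print(f"largest |PMF - bias| = {abs(worst_dev):.4f} at point {worst_point:g}")
```

Running the same function on the one-worker output should give zero for every row, since the two columns are identical there.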

Edited Nov 27, 2020 by Magnus Lundborg