Replace `compute_globals` with ObservablesReducer

compute_globals serves many different purposes in mdrun, from communicating flags to reducing energy-like quantities. It carries a lot of dependencies and misplaces a lot of logic, so it should be replaced at some point. This issue records thoughts that Mark and Pascal developed some years ago, so that they don't get lost. The current thinking is that the name should reflect what it does for most of its client modules, hence ObservablesReducer.

The current implementation fills an array whose contents can differ at each step, depending on whether energies are computed, which algorithms are active, etc. That requires a lot of branches and index-recording logic in order to minimize the size of the communication and reduction. This does not optimize for what matters on modern processors and particularly networks, where the message headers are larger than the largest buffer we ever reduce with compute_globals. It also moves coordination logic out of the modules and into a central location. There needs to be a central manager because we wish to communicate at most once per MD step, but it is not desirable for that manager to understand how each potential participant is involved.

I suggest we have an ObservablesReducer object whose only responsibility is to communicate and coordinate a buffer. The maximum buffer size needed is known before we enter the MD loop, so we can allocate the buffer then and tell modules which locations they may use. So long as the buffer stays under about 1K, we should roughly break even: we remove a lot of branching logic, at the cost of communicating and reducing many buffer locations that are not needed on a given step. (If needed, we could also have one buffer for nstcalcenergy steps and another for other steps.)
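A minimal sketch of the fixed-buffer idea, under stated assumptions: the class name `FixedBufferReducer` is hypothetical, and the loop over simulated ranks stands in for a single all-reduce (e.g. `MPI_Allreduce` with `MPI_SUM`). What it illustrates is that every element is summed on every reduction, with no per-step branching on which slots are in use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: one fixed-size buffer per (simulated) rank. The loop over
// ranks in reduce() stands in for a single MPI_Allreduce with MPI_SUM.
class FixedBufferReducer
{
public:
    FixedBufferReducer(std::size_t numRanks, std::size_t bufferSize) :
        buffers_(numRanks, std::vector<double>(bufferSize, 0.0))
    {
        assert(numRanks > 0);
    }

    std::vector<double>& bufferOnRank(std::size_t rank) { return buffers_.at(rank); }

    // Element-wise sum over ranks, with the result given back to every rank as an
    // all-reduce would. Every element is reduced whether or not it is in use, so
    // there is no per-step branching on the buffer layout.
    void reduce()
    {
        std::vector<double> sum(buffers_.front().size(), 0.0);
        for (const std::vector<double>& buffer : buffers_)
        {
            for (std::size_t i = 0; i < sum.size(); ++i)
            {
                sum[i] += buffer[i];
            }
        }
        for (std::vector<double>& buffer : buffers_)
        {
            buffer = sum;
        }
    }

private:
    std::vector<std::vector<double>> buffers_;
};
```

The cost is reducing some zeros nobody reads; the gain is that the communicated layout never changes from step to step.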

Choosing the buffer size needs coordination before the MD loop starts. My suggestion is to create a (separate) builder object early in mdrunner, with which modules that need to reduce things register their general interest. They provide a maximum required size in doubles, and a callback. At an appropriate time, the builder computes the required size, creates the ObservablesReducer with it, and calls back the modules to tell them which view of memory they can rely on henceforth.
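The builder phase could look roughly like the following sketch. All names (`ObservablesReducerBuilderSketch`, `ObservablesReducerSketch`, `addSubscriber`, `BufferView`) are hypothetical, and communication is omitted so that only the sizing and the call-back-with-a-view logic remain. Views are assumed to be laid out contiguously in registration order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical non-owning view into the reducer's buffer.
struct BufferView
{
    double*     data = nullptr;
    std::size_t size = 0;
};

// Hypothetical reducer owning one fixed-size buffer (communication omitted).
class ObservablesReducerSketch
{
public:
    explicit ObservablesReducerSketch(std::size_t size) : buffer_(size, 0.0) {}

    BufferView view(std::size_t offset, std::size_t size)
    {
        assert(offset + size <= buffer_.size());
        return { buffer_.data() + offset, size };
    }
    std::size_t size() const { return buffer_.size(); }

private:
    std::vector<double> buffer_;
};

// Hypothetical builder: modules register a maximum size and a callback early in
// setup; build() sizes the buffer and hands each module its view exactly once.
class ObservablesReducerBuilderSketch
{
public:
    using CallbackFromBuilder = std::function<void(BufferView)>;

    void addSubscriber(std::size_t maxSizeInDoubles, CallbackFromBuilder callback)
    {
        subscribers_.push_back({ maxSizeInDoubles, std::move(callback) });
    }

    std::unique_ptr<ObservablesReducerSketch> build()
    {
        std::size_t totalSize = 0;
        for (const Subscriber& s : subscribers_)
        {
            totalSize += s.size;
        }
        auto        reducer = std::make_unique<ObservablesReducerSketch>(totalSize);
        std::size_t offset  = 0;
        for (Subscriber& s : subscribers_)
        {
            s.callback(reducer->view(offset, s.size));
            offset += s.size;
        }
        subscribers_.clear();
        return reducer;
    }

private:
    struct Subscriber
    {
        std::size_t         size;
        CallbackFromBuilder callback;
    };
    std::vector<Subscriber> subscribers_;
};
```

Because the reducer is heap-allocated and its buffer is never resized, the views handed out stay valid for the reducer's lifetime.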

Thereafter, at each MD step, each module can choose to fill its view with the data it cares about, notify ObservablesReducer that it wants a reduction on this step, and optionally provide a callback to be run afterwards. ObservablesReducer reduces on an MD step only when some module requests it. It does so using the fixed-size buffer, expecting that many elements will often be ignored (in practice we may have to zero them regularly, lest they accumulate without bound in between uses). Then it calls the appropriate callbacks. This means that the modules are in control of their own runtime logic.
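A sketch of the per-step protocol, assuming a single buffer and a caller-supplied `allReduce` function standing in for the real communication; `StepReducerSketch`, `requireReduction` and `reduceIfRequired` are hypothetical names. It communicates only when some module asked, runs the post-reduction callbacks, then zeros the whole buffer so unused slots cannot accumulate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical per-step sketch. allReduce stands in for the real communication
// (e.g. an MPI_Allreduce with MPI_SUM over the whole buffer).
class StepReducerSketch
{
public:
    using AllReduceFunction      = std::function<void(std::vector<double>&)>;
    using CallbackAfterReduction = std::function<void(long step)>;

    StepReducerSketch(std::size_t size, AllReduceFunction allReduce) :
        buffer_(size, 0.0), allReduce_(std::move(allReduce))
    {
    }

    // A module writes its data through this pointer into a slot it was given.
    double* slot(std::size_t index)
    {
        assert(index < buffer_.size());
        return &buffer_[index];
    }

    // A module calls this after filling its slots when it wants a reduction on
    // this step; the optional callback runs after the reduction.
    void requireReduction(CallbackAfterReduction callback = nullptr)
    {
        reductionRequired_ = true;
        if (callback)
        {
            callbacks_.push_back(std::move(callback));
        }
    }

    // Called once per MD step; returns whether communication took place.
    // Assumes callbacks only read their slots and do not request a new reduction.
    bool reduceIfRequired(long step)
    {
        if (!reductionRequired_)
        {
            return false; // nobody asked, so no communication on this step
        }
        allReduce_(buffer_); // the whole fixed-size buffer, used or not
        for (const CallbackAfterReduction& callback : callbacks_)
        {
            callback(step); // modules read their reduced values here
        }
        // Zero every slot so unused ones cannot accumulate between uses.
        std::fill(buffer_.begin(), buffer_.end(), 0.0);
        callbacks_.clear();
        reductionRequired_ = false;
        return true;
    }

private:
    std::vector<double>                 buffer_;
    AllReduceFunction                   allReduce_;
    std::vector<CallbackAfterReduction> callbacks_;
    bool                                reductionRequired_ = false;
};
```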

It also means that neither ObservablesReducer nor its builder knows anything about the modules they coordinate: the modules call them, and they need know nothing about the callbacks they invoke in return. The callbacks need a definite function signature, but do not have to be members of a class that implements an interface (which is a style that I think @cblau has been suggesting). That approach could also let the modules themselves avoid dependencies on ObservablesReducer or its builder, since they can be given a lambda to call that binds the relevant method call to the relevant object.
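The lambda-based decoupling might look like this sketch, in which all names (`ConstraintRmsdModule`, `TinyReducer`, `wire`) are hypothetical. The module stores only a `std::function<void()>` and a pointer to its slot; only the wiring function sees both types.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical module: it depends on no reducer type, only on being handed a
// function to call and a slot to write when it wants a reduction this step.
class ConstraintRmsdModule
{
public:
    using RequestReduction = std::function<void()>;

    void connect(RequestReduction request, double* slot)
    {
        request_ = std::move(request);
        slot_    = slot;
    }

    // The module alone decides whether this step needs a reduction.
    void doStep(double localSquaredDeviation, bool wantRmsdThisStep)
    {
        if (!wantRmsdThisStep)
        {
            return;
        }
        assert(slot_ != nullptr && request_);
        *slot_ = localSquaredDeviation;
        request_();
    }

private:
    RequestReduction request_;
    double*          slot_ = nullptr;
};

// Hypothetical reducer exposing a member function that the wiring code binds.
class TinyReducer
{
public:
    void    requireReduction() { required_ = true; }
    bool    required() const { return required_; }
    double* slot() { return &value_; }

private:
    double value_    = 0.0;
    bool   required_ = false;
};

// Only this wiring code sees both types: the lambda binds the method call to the
// object, so ConstraintRmsdModule has no dependency on TinyReducer.
void wire(ConstraintRmsdModule& rmsdModule, TinyReducer& reducer)
{
    rmsdModule.connect([&reducer]() { reducer.requireReduction(); }, reducer.slot());
}
```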

Naturally, suggestions to improve that or do something different are welcome.

@ptmerz and I discussed improving things here some years ago (and even presented the ideas in Gottingen once; see #3421), but we have not yet proposed acting on it. There is a rather hacky example based on LINCS RMSD in branch https://gitlab.com/gromacs/gromacs/-/commits/improve-compute-globals.

@artemzhmurov alleviated one of the dependency problems in !998. This proposal does so in a general way.

@ejjordan tried to simplify an aspect of global_stat (called by compute_globals), which reminded me to put this up for posterity.

Edited Apr 26, 2021 by Mark Abraham