Fix data races when using multi threading
This is a continuation of #2 (comment 699247459). Unfortunately there are many more static global objects that all need thread_local
to make them thread safe. As far as I understand all of these objects are caches, and therefore having them thread-local should be OK. The only static global object that I didn't prefix with thread_local
is static Config _cfg;
in src/Config.cc, which would make configuration thread local, which in my opinion is not desirable.
Merge request reports
Activity
I'm not sure about some of these. They are mostly singletons, primarily for caching purposes but the semantics aren't clearly defined. If they were returned as const then it would be purely a caching thing, but as it is they are editable references and hence any user changes would be expected to apply to all "client" threads.
That's explicitly the case for the LHAGlue PDFSetHandler, which is there to provide the "old" LHAGlue behaviour of a global set of loaded-PDF indices: any legacy multithreaded Fortran code would expect that to be thread-global, because the old interface didn't have any special thread-local treatment. If we were to change that, we'd need to signpost the change very clearly, or e.g. apply it to the new Fortran API but not the legacy one.
I'm not sure what the best thing is to do with the FileIO object, because that was explicitly written to allow memory-sharing on MPI clusters. I suspect that making it thread-local would completely break the MPI single-read behaviour, so that's maybe something to be enabled in the conditional case of being built without MPI support. Not sure if having differences in thread-locality based on that build flag is a good thing. The same might apply to all those cached return-by-reference functions, since if thread-local all the MPI instances would need to recompute... in-memory operations like that are much less of a problem than I/O ones, but these are functions that scan filesystems and read index files. Maybe @max.knobbe knows / has discussed this with Stefan Hoeche (who doesn't have an account here, otherwise I'd link him in)
mentioned in commit eb62d566
We decided to accept this -- thanks again for the MR, @cschwan1 ! It seems perfectly safe for the caches to be thread-specific, especially since their existence was always meant to be invisible to the user.
I am going to put in a little hack to stop it from imposing the thread-locality on the
FileIO
in MPI-enabled builds, in case the two modes collide badly: we can increment further based on feedback from different groups of users.@agbuckley Great!