Skip to content
Snippets Groups Projects

Fix data races when using multi threading

Merged Christopher Schwan requested to merge cschwan1/lhapdf:fix-data-races into main

This is a continuation of #2 (comment 699247459). Unfortunately there are many more static global objects that all need thread_local to make them thread safe. As far as I understand all of these objects are caches, and therefore having them thread-local should be OK. The only static global object that I didn't prefix with thread_local is static Config _cfg; in src/Config.cc, which would make configuration thread local, which in my opinion is not desirable.

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • I'm not sure about some of these. They are mostly singletons, primarily for caching purposes but the semantics aren't clearly defined. If they were returned as const then it would be purely a caching thing, but as it is they are editable references and hence any user changes would be expected to apply to all "client" threads.

    That's explicitly the case for the LHAGlue PDFSetHandler, which is there to provide the "old" LHAGlue behaviour of a global set of loaded-PDF indices: any legacy multithreaded Fortran code would expect that to be thread-global, because the old interface didn't have any special thread-local treatment. If we were to change that, we'd need to signpost the change very clearly, or e.g. apply it to the new Fortran API but not the legacy one.

    I'm not sure what the best thing is to do with the FileIO object, because that was explicitly written to allow memory-sharing on MPI clusters. I suspect that making it thread-local would completely break the MPI single-read behaviour, so that's maybe something to be enabled in the conditional case of being built without MPI support. Not sure if having differences in thread-locality based on that build flag is a good thing. The same might apply to all those cached return-by-reference functions, since if thread-local all the MPI instances would need to recompute... in-memory operations like that are much less of a problem than I/O ones, but these are functions that scan filesystems and read index files. Maybe @max.knobbe knows / has discussed this with Stefan Hoeche (who doesn't have an account here, otherwise I'd link him in)

  • I see, so the problem that I've seen with Config is also present with other singletons. In any case, read/write accesses to these objects have to be fixed and that'll probably require encapsulating each access with locks.

  • merged

  • Andy Buckley mentioned in commit eb62d566

    mentioned in commit eb62d566

  • We decided to accept this -- thanks again for the MR, @cschwan1 ! It seems perfectly safe for the caches to be thread-specific, especially since their existence was always meant to be invisible to the user.

    I am going to put in a little hack to stop it from imposing the thread-locality on the FileIO in MPI-enabled builds, in case the two modes collide badly: we can increment further based on feedback from different groups of users.

  • @agbuckley Great!

Please register or sign in to reply
Loading