hpcprof: Warn about disordered traces at most once
Sometimes the input traces are highly disordered. This indicates a bug somewhere on the measurement side, but it is often tricky to fix and is not something users can be expected to resolve. Printing a huge message every time this happens is therefore not appropriate.
Instead, print a slightly shorter message at most once (per rank) that
focuses on the user-visible effect instead of the internal state. The
affected threads are listed in the INFO channel (-vv or higher).
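
The intended behavior can be sketched roughly as follows. This is a minimal illustration, not the code in this change: `DisorderReporter`, `report`, and `run_demo` are hypothetical names, the mapping of `-vv` to a verbosity of 2 or more is an assumption, and "once per rank" is modeled by assuming one reporter instance per process.

```cpp
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not hpcprof's actual class): emits the WARNING once
// per instance and, at high verbosity, one INFO line per affected trace.
class DisorderReporter {
public:
  // Assumption: verbosity >= 2 corresponds to "-vv or higher".
  DisorderReporter(std::ostream& out, int verbosity)
    : out_(out), verbosity_(verbosity) {}

  // May be called from several threads; `trace` is a printable identifier.
  void report(const std::string& trace) {
    std::lock_guard<std::mutex> lock(mtx_);
    if (!warned_) {
      // Only the first disordered trace triggers the WARNING.
      warned_ = true;
      out_ << "WARNING: One or more traces are extremely disordered; "
              "sorting activities in affected traces to compensate.\n"
              "  See INFO messages (-vv or higher) to list affected traces.\n";
    }
    if (verbosity_ >= 2)
      out_ << "INFO: Trace containing significant disorder: " << trace << "\n";
  }

private:
  std::ostream& out_;
  int verbosity_;
  std::mutex mtx_;       // serializes output and protects warned_
  bool warned_ = false;  // true once the WARNING has been printed
};

// Reports two disordered traces and returns everything that was printed.
std::string run_demo(int verbosity) {
  std::ostringstream out;
  DisorderReporter reporter(out, verbosity);
  reporter.report("trace-0");
  reporter.report("trace-1");
  return out.str();
}
```

In this sketch `run_demo(0)` yields the WARNING once and no INFO lines, while `run_demo(2)` yields the same single WARNING followed by one INFO line per trace. A single mutex is used rather than an atomic flag so that the WARNING always precedes the first INFO line.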
To Demonstrate
(The output below is what should happen based on code inspection and intent, since I don't have a highly disordered trace on hand to test for real. Reviewers should test for themselves.)
$ hpcprof very-disordered-trace.m/
WARNING: One or more traces are extremely disordered; sorting activities in affected traces to compensate.
See INFO messages (-vv or higher) to list affected traces.
$ hpcprof very-disordered-trace.m/ -vv
WARNING: One or more traces are extremely disordered; sorting activities in affected traces to compensate.
See INFO messages (-vv or higher) to list affected traces.
INFO: Trace containing significant disorder: NODE(BOTH){716544, 0} RANK(SINGLE){0} GPUCONTEXT(SINGLE){0} GPUSTREAM(SINGLE){0}
INFO: Trace containing significant disorder: NODE(BOTH){716544, 0} RANK(SINGLE){3} GPUCONTEXT(SINGLE){0} GPUSTREAM(SINGLE){0}
INFO: Trace containing significant disorder: NODE(BOTH){716544, 0} RANK(SINGLE){1} GPUCONTEXT(SINGLE){0} GPUSTREAM(SINGLE){0}
INFO: Trace containing significant disorder: NODE(BOTH){716544, 0} RANK(SINGLE){2} GPUCONTEXT(SINGLE){0} GPUSTREAM(SINGLE){0}
$
Additional Information
The new WARNING does not mention potential OOM scenarios because there are ways we could improve this fallback to have less of a memory impact (hpcprof can better handle extremely disordered ... (#913)).
The new WARNING does not reference the manual (yet) as there is not yet an entry in the troubleshooting guide explaining this warning. The reference can be added later with minimal effort.