Add basic instrumentation support for GPU tracing libraries
Add a simple instrumentation API built around the wallcycle regions, with implementations using the NVIDIA NVTX, AMD ROCTX, and Intel ITT libraries.
Tracing support can be enabled at build time. When enabled, the wallcycle regions show up as named ranges in tracing tools, which greatly aids performance analysis.
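For reviewers unfamiliar with the pattern, here is a minimal sketch of how such a build-time-selected range API could look. It is not the code in this change: the names `traceRangeStart`, `traceRangeEnd`, `ScopedTraceRange`, `traceLog`, and the macros `SKETCH_USE_NVTX` / `SKETCH_USE_ROCTX` are hypothetical. Only `nvtxRangePushA`/`nvtxRangePop` and `roctxRangePushA`/`roctxRangePop` are real library calls. The ITT backend is omitted for brevity, and the fallback simply records events in memory so the sketch can be exercised without any tracing library.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical build-time switches; the real option names may differ.
#if defined(SKETCH_USE_NVTX)
#    include <nvtx3/nvToolsExt.h>
#elif defined(SKETCH_USE_ROCTX)
#    include <roctracer/roctx.h>
#endif

// Fallback backend only: an in-memory event log (illustrative, not part of the MR).
inline std::vector<std::string>& traceLog()
{
    static std::vector<std::string> log;
    return log;
}

// Fallback backend only: current nesting depth, used to catch unbalanced calls.
inline int& traceDepth()
{
    static int depth = 0;
    return depth;
}

// Open a named range; ranges nest like a stack.
inline void traceRangeStart(const char* name)
{
#if defined(SKETCH_USE_NVTX)
    nvtxRangePushA(name);
#elif defined(SKETCH_USE_ROCTX)
    roctxRangePushA(name);
#else
    ++traceDepth();
    traceLog().push_back(std::string("start ") + name);
#endif
}

// Close the most recently opened range.
inline void traceRangeEnd()
{
#if defined(SKETCH_USE_NVTX)
    nvtxRangePop();
#elif defined(SKETCH_USE_ROCTX)
    roctxRangePop();
#else
    assert(traceDepth() > 0 && "traceRangeEnd called without a matching start");
    --traceDepth();
    traceLog().push_back("end");
#endif
}

// RAII helper so a range is always closed, even on early return.
class ScopedTraceRange
{
public:
    explicit ScopedTraceRange(const char* name) { traceRangeStart(name); }
    ~ScopedTraceRange() { traceRangeEnd(); }
    ScopedTraceRange(const ScopedTraceRange&) = delete;
    ScopedTraceRange& operator=(const ScopedTraceRange&) = delete;
};
```

In this sketch, a region start would call `traceRangeStart("force")` and the matching region stop would call `traceRangeEnd()`, so nested regions appear as nested ranges on the profiler timeline. With neither macro defined, the calls only touch the in-memory log and no tracing library is needed.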
Implements #4446 (closed)