Indirect copy boundary check
I implemented a prototype of the boundary check for indirect copies, but I am not sure it is the correct approach. I have tested it on single-node and multi-node executions and verified that we receive exactly one profiling result per copy operation.