Different number of instructions at different clock rates
Hi. I ran the ResNet benchmark at two different GPU clock rates and noticed that some kernels execute a different number of instructions in the two runs. As shown in the picture, the text files on the left and right are for 384 and 768 Gbps, respectively. These excerpts show the information for the kernel _Z21executeFirstLayerCUDAPFS_S_S_iiiiiii at the two bandwidths.
I would expect the number of instructions to be the same at both bandwidths, but apparently it is not. Do you have any idea what might cause this?
Thanks Ben