How do we measure, and how do we even define, the runtime overhead of a primitive?
When a primitive is run, the runtime has to perform some additional steps, such as preparing hyper-parameters (creating instances of hyper-parameter primitives, or cloning them). This can take a significant amount of time. One could argue that knowing the difference in how much time this takes could be a factor when deciding which of two otherwise similar primitives to pick.
Currently we do not measure or capture this information in pipeline runs. We measure method calls, but this does not take the preparation time into account.
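To make the distinction concrete, here is a minimal, purely illustrative sketch of timing the preparation phase separately from the method call itself. The `Primitive` class, the cloning-via-deep-copy stand-in, and the timing keys are all hypothetical simplifications, not the actual runtime API:

```python
import copy
import time

class Primitive:
    """Hypothetical stand-in for a primitive; not the real interface."""

    def __init__(self, hyperparams):
        self.hyperparams = hyperparams

    def fit(self):
        pass  # actual work would happen here

def run_with_timings(hyperparams_spec):
    timings = {}

    # Overhead currently *not* captured in pipeline runs: preparing
    # hyper-parameters (approximated here by a deep copy standing in
    # for cloning hyper-parameter primitives).
    start = time.perf_counter()
    hyperparams = copy.deepcopy(hyperparams_spec)
    primitive = Primitive(hyperparams)
    timings['preparation'] = time.perf_counter() - start

    # What is currently measured: the method call itself.
    start = time.perf_counter()
    primitive.fit()
    timings['fit'] = time.perf_counter() - start

    return timings

timings = run_with_timings({'alpha': 0.1})
```

In a real runtime the preparation timing would have to be recorded by the runtime itself, since it happens outside any primitive method call.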
The question is also what exactly we should measure as this overhead. What exactly is it? What is a runtime implementation detail, and what is really part of inputs/hyper-parameter preparation? Is this even something which could be translated between runtimes, even if we did measure it? Maybe one runtime uses a distributed execution system and the overhead of cloning primitives is simply higher in that particular case, while on some other runtime it might be something else (and some other primitive) which is worse off.
See discussion in #165 (closed) for some background.