Allowing multiple output methods
One struggle with the current interface is that a primitive can effectively return only one type of output. This keeps things simple and easy to compose in a pipeline, but it has some drawbacks:
- some primitives do have multiple types of outputs, like `predict` and `predict_log_proba` in the sklearn world
- clustering primitives naturally have multiple types of outputs too, like a distance matrix or centers
- it is inefficient: we can have multiple primitives that each compute a different type of output and share the same code internally, but the computation itself is not shared between them, so we might have to run the same computation on the same data multiple times just to get different outputs
In the discussion about clustering primitives we decided to go with an intermediary result object which would effectively cache this computation. One primitive would compute the necessary information about the clusters and return it as an object, and then a family of other primitives would take this object as input and return various representations of that information.
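To make the intermediary-object approach concrete, here is a minimal sketch. All names (`ClusterResult`, `ComputeClusters`, `ClusterCenters`, `ClusterLabels`) are hypothetical and not part of the current interface, and the "clustering" is just a single assignment/update pass over 1-D points, standing in for the expensive computation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ClusterResult:
    """Hypothetical intermediary object caching the shared computation."""
    centers: List[float]
    distances: List[List[float]]  # distances[i][j]: point i to center j


class ComputeClusters:
    """Runs the (here: toy) expensive step once and returns the cached result."""

    def __init__(self, initial_centers: Sequence[float]) -> None:
        self._initial = list(initial_centers)

    def produce(self, points: Sequence[float]) -> ClusterResult:
        # Assign every point to its nearest initial center.
        members: List[List[float]] = [[] for _ in self._initial]
        for p in points:
            nearest = min(range(len(self._initial)),
                          key=lambda j: abs(p - self._initial[j]))
            members[nearest].append(p)
        # Move each center to the mean of its members (keep it if empty).
        centers = [sum(m) / len(m) if m else c
                   for m, c in zip(members, self._initial)]
        distances = [[abs(p - c) for c in centers] for p in points]
        return ClusterResult(centers, distances)


class ClusterCenters:
    """Follow-on primitive: extracts the centers from the cached result."""

    def produce(self, result: ClusterResult) -> List[float]:
        return result.centers


class ClusterLabels:
    """Follow-on primitive: derives labels from the cached distances."""

    def produce(self, result: ClusterResult) -> List[int]:
        return [row.index(min(row)) for row in result.distances]
```

Every distinct output needs its own small primitive plus the shared result type, which is the overhead the next paragraph objects to.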
But if we look at `predict` and `predict_log_proba` from the sklearn world as two types of output for the same primitive, it seems that most primitives would need such an intermediary representation under this approach, which is ridiculous.
So I am proposing an alternative. A primitive can have multiple produce methods, as long as they are clearly marked as such (we can reuse the @output decorator from #27 (closed)). The semantics stay the same: you still have to fit the primitive first, but then you can decide which of the produce methods to call. They should all behave the same and differ only in the variation of output they return.
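A rough sketch of what this could look like. The `output` decorator below is only a stand-in marker written for this example (the actual decorator from #27 may look different), and `MajorityClassPrimitive`, `produce_log_proba` and `output_methods` are hypothetical names. The toy primitive learns class frequencies in `fit` and both produce methods reuse that fitted state.

```python
import math
from typing import Callable, Dict, List, Sequence


def output(method: Callable) -> Callable:
    """Stand-in marker; the real @output decorator from #27 may differ."""
    method.is_output = True
    return method


class MajorityClassPrimitive:
    """Toy primitive: learns class frequencies and ignores input features."""

    def __init__(self) -> None:
        self._probs: Dict[str, float] = {}

    def fit(self, targets: Sequence[str]) -> None:
        counts: Dict[str, int] = {}
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
        self._probs = {label: n / len(targets) for label, n in counts.items()}

    @output
    def produce(self, inputs: Sequence) -> List[str]:
        # Same fitted state, returned as hard predictions.
        best = max(self._probs, key=self._probs.get)
        return [best for _ in inputs]

    @output
    def produce_log_proba(self, inputs: Sequence) -> List[Dict[str, float]]:
        # Same fitted state, returned as per-class log-probabilities.
        log_probs = {label: math.log(p) for label, p in self._probs.items()}
        return [dict(log_probs) for _ in inputs]


def output_methods(primitive: object) -> List[str]:
    """List the names of all methods marked with @output."""
    return sorted(name for name, member in vars(type(primitive)).items()
                  if getattr(member, "is_output", False))
```

A pipeline could then call `fit` once and pick whichever marked method it needs, and something like `output_methods` would let tooling discover the available outputs without running anything.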
Alternatively, we could do this as an argument to produce, but then typing becomes harder: it is harder to explain what the output type will be for a particular value of that argument. Multiple methods make this cleaner.
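The typing difference can be illustrated with two toy classes. Both are hypothetical (`ArgumentStyle`, `MethodStyle`, the `kind` parameter and `produce_scores` are made up for this sketch). With an argument, the simplest honest annotation is a `Union`, so the caller cannot tell from the signature which type comes back; with separate methods, each one states its own return type.

```python
from typing import List, Sequence, Union


class ArgumentStyle:
    """Output type selected by an argument: the return annotation is a Union."""

    def produce(self, inputs: Sequence[float],
                kind: str = "labels") -> Union[List[int], List[float]]:
        if kind == "labels":
            return [int(x > 0) for x in inputs]
        if kind == "scores":
            return [float(x) for x in inputs]
        raise ValueError(f"unknown output kind: {kind!r}")


class MethodStyle:
    """Output type selected by method: each signature is precise on its own."""

    def produce(self, inputs: Sequence[float]) -> List[int]:
        return [int(x > 0) for x in inputs]

    def produce_scores(self, inputs: Sequence[float]) -> List[float]:
        return [float(x) for x in inputs]
```

The argument style can be made precise with `typing.overload` and literal types, but that is extra machinery every primitive author would have to write and every reader would have to decode.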