Remove can_accept

The main idea behind primitives providing can_accept was that we could validate pipelines without having to actually run them. We would just pass metadata through and this way know whether a set of primitives can be combined into a valid pipeline. In a way, this would be a generalization of type checking, where we would not just check structural types but could use any values stored in metadata. Even more, instead of this type checking being done by an external tool, every primitive could implement its own logic for verifying whether some data with some metadata would really work.
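To make the idea concrete, here is a minimal sketch of how such metadata-only checking could look. Everything here is illustrative: the `Metadata` class, the `can_accept` signature, and the primitive name are hypothetical stand-ins, not the actual primitive interface.

```python
class Metadata:
    """Simplified stand-in for structural metadata about a value."""

    def __init__(self, structural_type, semantic_types=()):
        self.structural_type = structural_type
        self.semantic_types = frozenset(semantic_types)


class OneHotEncoderPrimitive:
    """Hypothetical primitive implementing metadata-level checking."""

    @classmethod
    def can_accept(cls, inputs_metadata):
        # Generalized type checking: inspect metadata values, not just types.
        if inputs_metadata.structural_type is not list:
            return None  # Cannot accept: wrong structural type.
        if 'categorical' not in inputs_metadata.semantic_types:
            return None  # Cannot accept: needs categorical input.
        # Describe the output via metadata alone, without touching real data.
        return Metadata(list, {'one-hot'})


# Chaining can_accept calls would check pipeline validity on metadata alone.
meta = Metadata(list, {'categorical'})
out = OneHotEncoderPrimitive.can_accept(meta)
```

A pipeline search tool would call `can_accept` down the chain and reject any combination where some primitive returns `None`.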

One of the motivations was to be able to figure out how to combine primitives to form a neural network. Those primitives generally have to match in output dimensions, which regular structural types in Python do not expose. We would need something like constraints in types to express, for example, that the input is an NxN array and the output is 2Nx2N. We have not implemented support for constraints in metadata (#71 (closed)), so this use case could not really be tested.
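As an illustration of the kind of constraint this would require, here is a sketch of a dimension check that such constraint support might have enabled. The function names and the "NxN in, 2Nx2N out" rule are hypothetical; no such constraint language exists in metadata (per #71).

```python
def upsample_output_shape(input_shape):
    """Hypothetical constraint: input is (N, N), output is (2N, 2N)."""
    n, m = input_shape
    if n != m:
        raise ValueError("expected a square (N, N) input")
    return (2 * n, 2 * m)


def layers_compose(output_shape, next_input_shape):
    # Two layers can be chained only if their dimensions match exactly,
    # which Python's structural types (e.g. "ndarray") cannot express.
    return output_shape == next_input_shape


shape = upsample_output_shape((16, 16))  # -> (32, 32)
```

Without a way to express relations like `2N` between input and output dimensions in metadata, `can_accept` cannot validate such chains any better than plain structural typing can.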

But over time the can_accept approach has shown some major issues:

  • Primitive authors effectively have to implement primitives twice: once operating on data, once on metadata. They can try to reuse code, but this makes primitives very hard to read.
  • Implementing can_accept is hard and is additional work, so almost no primitive actually implements it (well). Most authors do not understand the need for it or what it should do, or simply do not have time to implement it.
  • Moreover, can_accept should have the same behavior as calling the regular methods (it can fail in more cases, but not fewer). This is tricky to assure and implement correctly. A bad implementation of can_accept could lead pipeline search to not discover pipelines which would in fact be valid.
  • Implementing can_accept is often hard even when you really try. You do not have the data available, so while in theory it is a nice idea that you get metadata as input and return metadata, in practice it is often hard to know what to put into the resulting metadata without access to the data. Data often controls much of a primitive's behavior. So metadata passing through a chain of can_accept calls generally becomes more and more general and quickly degrades into just a structural type, even when primitives try to implement can_accept fully.
  • Primitives close to the beginning of the pipeline have the hardest time implementing can_accept because they generally perform complicated data transformations, and without data that is hard to reason about. For example, you might know that there is a foreign key to another table, but without data you do not really know how things get joined, i.e., which rows map to which other rows.
  • The whole thing works only if all primitives in a chain implement can_accept, which makes it a chicken-and-egg problem: because it is not useful until everyone does it, (almost) nobody does it.

So I would suggest we simply remove it. I think there are easier alternatives. For example, you could subsample a small portion of the input dataset, say 100 rows, instead of using the whole dataset, and send that through the chain of primitives. You would then be able to see if the pipeline works, using maybe a bit more resources than metadata alone, but primitives could still operate on data and no dual logic would have to be implemented. In a way, can_accept is the extreme of this approach, where you send a subsample with zero rows through the primitive.
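The proposed alternative can be sketched as follows. This assumes a simple callable-style primitive interface for brevity; the function names and the choice of treating any exception as pipeline failure are illustrative assumptions, not a concrete design.

```python
def validate_pipeline(primitives, dataset, sample_size=100):
    """Run a small sample through the chain instead of metadata-only checks."""
    sample = dataset[:sample_size]  # e.g. just the first 100 rows
    try:
        for primitive in primitives:
            # The primitive's real logic runs on real (but small) data,
            # so no separate metadata-only code path is needed.
            sample = primitive(sample)
    except Exception:
        return False  # Chain failed on the sample; likely an invalid pipeline.
    return True


# Example with trivial stand-in "primitives":
double = lambda rows: [2 * r for r in rows]
increment = lambda rows: [r + 1 for r in rows]
broken = lambda rows: rows / 0  # fails on any input

ok = validate_pipeline([double, increment], list(range(1000)))
bad = validate_pipeline([double, broken], list(range(1000)))
```

The trade-off is spending a little compute on the sample in exchange for deleting the entire parallel metadata code path from every primitive.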