Add clustering transformer base class

Currently ClusteringPrimitiveBase assumes that clustering information is computed during fit, and that produce just maps new samples into the computed clusters.

But this is not always the case, or even possible: sometimes new samples cannot be mapped onto existing clusters. For such cases, a better approach seems to be to also have a clustering transformer, which simply computes a clustering from the whole set of given input samples.
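A minimal sketch of what such a base class could look like (the class and method names here are illustrative assumptions, not the real API): there is no fit, and each produce call clusters the given inputs as a whole.

```python
# Hypothetical sketch of a clustering transformer base class.
# All names here are illustrative, not the real interface.
import abc
import typing


class ClusteringTransformerPrimitiveBase(abc.ABC):
    """A primitive without fit: each produce call clusters the
    given inputs as a whole and returns a cluster label per sample."""

    @abc.abstractmethod
    def produce(self, inputs: typing.Sequence[typing.Sequence[float]]) -> typing.List[int]:
        ...


class ThresholdClustering(ClusteringTransformerPrimitiveBase):
    """Toy example: a 1-D point within "threshold" of an existing
    cluster's first member joins that cluster, otherwise starts a new one."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def produce(self, inputs: typing.Sequence[typing.Sequence[float]]) -> typing.List[int]:
        seeds: typing.List[float] = []   # first member of each cluster
        labels: typing.List[int] = []
        for (x,) in inputs:
            for label, seed in enumerate(seeds):
                if abs(x - seed) <= self.threshold:
                    labels.append(label)
                    break
            else:
                seeds.append(x)
                labels.append(len(seeds) - 1)
        return labels
```

The point is that the whole clustering happens inside one produce call over all samples at once, so no state from a previous fit is needed.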

A tricky thing here, from the implementation perspective, is how to make sure that calling both produce and produce_distance_matrix does not compute things twice. This is left to the implementation to resolve, and from the API point of view it is just an optimization: the semantics and results of calling both should be the same with or without caching.

Having caching also introduces state to transformer primitives, which goes against how they are otherwise understood, but, hah, that's reality. It does mean that the implementation has to be extra careful about this state, even if it exists only for caching: when the cached data is based on some random source, the implementation should probably add both the cached data and the random state (not the random seed) to the pickling information (but not to params), so that a restored primitive operates exactly the same as if it had never been pickled/unpickled. If the implementation is deterministic, then the cache probably does not have to be pickled, because it can simply be recomputed, and the only cost is a performance hit.
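The caching-plus-pickling concern above can be sketched like this (again, the class and method names are illustrative assumptions, and the "clustering" is a random stand-in): both produce methods share one cached clustering pass, and because that pass uses a random source, pickling stores the cache together with the RNG state (not just the seed).

```python
# Hypothetical sketch of caching shared between produce methods,
# with the cache and the random *state* included in pickling.
import random


class CachedClusteringTransformer:
    def __init__(self, random_seed: int = 0) -> None:
        self._random = random.Random(random_seed)
        self._cache = None  # (inputs, labels) of the last clustering

    def _cluster(self, inputs):
        # Recompute only when inputs changed; otherwise reuse the cache
        # so that calling both produce methods does not cluster twice.
        if self._cache is None or self._cache[0] != inputs:
            # Stand-in for a randomized clustering algorithm.
            labels = [self._random.randrange(2) for _ in inputs]
            self._cache = (list(inputs), labels)
        return self._cache[1]

    def produce(self, inputs):
        return list(self._cluster(inputs))

    def produce_distance_matrix(self, inputs):
        labels = self._cluster(inputs)
        # 0 when two samples share a cluster, 1 otherwise.
        return [[0 if a == b else 1 for b in labels] for a in labels]

    def __getstate__(self):
        # Cache and RNG state travel together, so the unpickled primitive
        # behaves exactly as if it had never been pickled.
        return {'cache': self._cache, 'random_state': self._random.getstate()}

    def __setstate__(self, state):
        self._random = random.Random()
        self._random.setstate(state['random_state'])
        self._cache = state['cache']
```

With this, calling produce and then produce_distance_matrix on the same inputs clusters only once, and a pickle/unpickle round trip preserves both the cached result and the position of the random stream.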

Edited by Mitar