Rework the GradientCompositionalityMixin
It is not yet clear which methods GradientCompositionalityMixin needs. For gradient-based end-to-end training to work, we apparently need `forward` and `backward` methods. It is unclear why we would need any other methods, and it also seems that they can be expressed in terms of `forward` and `backward`.
backward is being added as part of !23 (merged).
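As a minimal sketch of the claim that other methods can be expressed in terms of `forward` and `backward` (all names besides GradientCompositionalityMixin are hypothetical, not the actual interface):

```python
from abc import ABC, abstractmethod


class GradientCompositionalityMixin(ABC):
    """Sketch: forward and backward are the only primitives."""

    @abstractmethod
    def forward(self, x):
        """Compute the output for input x."""

    @abstractmethod
    def backward(self, x, grad_output):
        """Propagate grad_output back through the model at x."""

    def grad(self, x):
        """Hypothetical derived method, expressed purely in terms
        of forward/backward: gradient of a scalar-valued model."""
        self.forward(x)  # a concrete class could cache intermediates here
        return self.backward(x, 1.0)


class Square(GradientCompositionalityMixin):
    """Toy concrete model f(x) = x**2, for illustration only."""

    def forward(self, x):
        return x * x

    def backward(self, x, grad_output):
        return 2 * x * grad_output


print(Square().grad(3.0))  # 6.0
```

If that holds in general, the mixin could ship `forward`/`backward` as the abstract core and provide everything else as default implementations on top of them.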