Add a runtime command-line call for scoring predictions

The new call should take predictions as input and score them. We already have scoring, but it operates directly on the pipeline. We should also allow scoring the predictions themselves.
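To make the request concrete, here is a minimal sketch of what such a call could look like. Everything in it is an assumption for illustration and not the existing runtime API: the flag names (`--predictions`, `--truth`, `--index-column`, `--target-column`), the CSV layout, and the use of plain accuracy as the metric are all hypothetical.

```python
import argparse
import csv


def read_column(lines, index_column, value_column):
    """Map each row's index value to its target value.

    `lines` is any iterable of CSV text lines (an open file works).
    """
    return {row[index_column]: row[value_column] for row in csv.DictReader(lines)}


def accuracy(truth, predictions):
    """Fraction of ground-truth rows whose prediction matches exactly.

    Rows missing from `predictions` count as wrong.
    """
    if not truth:
        raise ValueError("no ground-truth rows to score against")
    correct = sum(1 for key, value in truth.items() if predictions.get(key) == value)
    return correct / len(truth)


def main(argv=None):
    # Hypothetical interface: none of these flags exist in the current runtime.
    parser = argparse.ArgumentParser(description="Score a predictions file.")
    parser.add_argument("--predictions", required=True, help="CSV file with predictions")
    parser.add_argument("--truth", required=True, help="CSV file with ground-truth values")
    parser.add_argument("--index-column", default="index", help="column that identifies a row")
    parser.add_argument("--target-column", required=True, help="column holding the value to score")
    args = parser.parse_args(argv)

    with open(args.predictions, newline="") as handle:
        predictions = read_column(handle, args.index_column, args.target_column)
    with open(args.truth, newline="") as handle:
        truth = read_column(handle, args.index_column, args.target_column)

    # Print the result as a "metric,value" line.
    print(f"accuracy,{accuracy(truth, predictions)}")
    return 0
```

The point of the sketch is that scoring needs only two inputs, predictions and ground truth, and never touches a pipeline. In the real runtime the metric would presumably come from the existing scoring code instead of the stand-in `accuracy` function. To run it as a script, add the usual `if __name__ == "__main__":` guard that calls `main()`.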