Expose or provide scoring splits
cc @aeciosan @castelo @raonilourenco @roquelopez
Our TA3 needs both the score and the predictions/trained model. Right now, it has to call `ScoreSolution` to get the score via holdout (on a TA2 train/test split), then make its own split to call `FitSolution` and `ProduceSolution`. This is inefficient because:
- TA3 has to do a split which really should be TA2's responsibility
- TA3 has to write the split datasets to disk (plasma could help here, but it is poorly supported so far)
- TA2 has to train and test multiple times for what should be a single train/test run
- The analysis conducted by TA3 (for example, a confusion matrix) is not based on the same trained model or predictions that the score was computed from (and the discrepancy might be very apparent, depending on luck)
This came about after our TA2 and TA3 teams met and considered the full data pipeline our combined system is using, which is getting out of hand.
I am not sure how to address this. Some ideas follow:
- Expose the splits that were used in holdout/cross-validation from `ScoreSolution`
- Accept splits provided by TA3 in `ScoreSolution`
- Expose the predictions and the trained model from `ScoreSolution` (we at NYU are interested in producing predictions on made-up data in addition to the test split)
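To make these ideas concrete, here is a minimal sketch in plain Python of a scoring call that fits once and scores once, then hands back the split, the predictions, and the fitted model together. Passing `splits` covers the "accept splits from TA3" idea. Every name here (`score_solution`, `ScoreResult`, `holdout_split`, `confusion_counts`, `majority_fit`) is hypothetical and is not part of the TA2-TA3 API. A real change would live in the `ScoreSolution` request/response messages, and the toy "model" below just predicts the majority class.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Model = Callable[[object], object]


@dataclass
class ScoreResult:
    """Hypothetical scoring response: the score plus everything it was computed from."""
    score: float
    train_indices: List[int]
    test_indices: List[int]
    predictions: Dict[int, object]  # test row index -> predicted label
    model: Model  # the fitted model that produced `predictions`


def holdout_split(n_rows: int, test_fraction: float, seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle row indices deterministically and cut off a test fraction."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = max(1, round(n_rows * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def score_solution(
    X: Sequence[object],
    y: Sequence[object],
    fit: Callable[[List[object], List[object]], Model],
    test_fraction: float = 0.25,
    seed: int = 0,
    splits: Optional[Tuple[List[int], List[int]]] = None,
) -> ScoreResult:
    """Fit once and score once, using caller-provided (TA3) splits if given."""
    if splits is None:
        train_idx, test_idx = holdout_split(len(X), test_fraction, seed)
    else:
        train_idx, test_idx = list(splits[0]), list(splits[1])
    model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
    predictions = {i: model(X[i]) for i in test_idx}
    accuracy = sum(predictions[i] == y[i] for i in test_idx) / len(test_idx)
    return ScoreResult(accuracy, train_idx, test_idx, predictions, model)


def confusion_counts(y: Sequence[object], result: ScoreResult) -> Dict[Tuple[object, object], int]:
    """(true, predicted) pair counts over exactly the rows the score was computed on."""
    return dict(Counter((y[i], result.predictions[i]) for i in result.test_indices))


def majority_fit(X_train: List[object], y_train: List[object]) -> Model:
    """Toy stand-in for a pipeline: always predicts the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda _x: label


# Usage: TA3 supplies the split; one fit serves the score, the analysis,
# and predictions on made-up input.
X = list(range(8))
y = ["a", "a", "a", "b", "a", "b", "a", "a"]
result = score_solution(X, y, majority_fit, splits=([0, 1, 2, 3, 4, 6], [5, 7]))
print(result.score)                 # 0.5
print(confusion_counts(y, result))  # {('b', 'a'): 1, ('a', 'a'): 1}
print(result.model(99))             # a
```

Because the confusion counts and the prediction on made-up input come from the same fitted model that produced the score, the mismatch described in the last inefficiency bullet cannot occur, and no second train/test run or TA3-side split is needed.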