All metrics that should support multi-label problems should be updated so that they work with our multi-label predictions.

See this page for which metrics are generally considered to support multi-label: https://datadrivendiscovery.org/wiki/display/work/Matrix+of+metrics

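However, our multi-label format uses PrimaryMultiKey and has multiple rows per d3mIndex value, one for each label. Before scoring, these rows should be converted into the one-hot (binary indicator matrix) encoding that the sklearn metric implementations expect: one row per d3mIndex and one column per label.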
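A minimal sketch of that conversion, assuming the ground truth and the predictions have each already been read into `(d3mIndex, label)` pairs, one pair per row. The helper names `rows_to_label_sets` and `to_indicator_matrices` are hypothetical, and using sklearn's `MultiLabelBinarizer` is just one possible approach, not necessarily how the metrics code should do it. A d3mIndex with no predicted rows is treated as "no labels predicted".

```python
from collections import defaultdict

from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer


def rows_to_label_sets(rows):
    """Group long-format (d3mIndex, label) rows into one label set per d3mIndex.

    Hypothetical helper: a sample with several labels appears in several rows
    that share the same d3mIndex value.
    """
    grouped = defaultdict(set)
    for d3m_index, label in rows:
        grouped[d3m_index].add(label)
    return grouped


def to_indicator_matrices(truth_rows, predicted_rows):
    """Convert truth and predictions into aligned binary indicator matrices.

    Rows follow the sorted d3mIndex values of the ground truth; columns follow
    the sorted union of all labels seen in truth and predictions.
    """
    truth = rows_to_label_sets(truth_rows)
    predicted = rows_to_label_sets(predicted_rows)
    unknown = set(predicted) - set(truth)
    if unknown:
        raise ValueError(f"predictions for unknown d3mIndex values: {sorted(unknown)}")

    indices = sorted(truth)
    truth_sets = [truth[i] for i in indices]
    # Missing predicted rows for a d3mIndex mean an empty label set.
    pred_sets = [predicted.get(i, set()) for i in indices]

    # Fit on the union of labels so both matrices share the same column order.
    binarizer = MultiLabelBinarizer()
    binarizer.fit(truth_sets + pred_sets)
    return binarizer.transform(truth_sets), binarizer.transform(pred_sets), binarizer.classes_


if __name__ == "__main__":
    truth = [(0, "a"), (0, "b"), (1, "c"), (2, "a")]
    pred = [(0, "a"), (1, "c"), (1, "b"), (2, "a")]
    y_true, y_pred, classes = to_indicator_matrices(truth, pred)
    print(list(classes))                               # → ['a', 'b', 'c']
    print(f1_score(y_true, y_pred, average="micro"))   # → 0.75
```

Once both sides are indicator matrices, the sklearn metrics that accept multi-label input (for example `f1_score` with `average="micro"` or `"macro"`, or `hamming_loss`) can be called on them directly.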