Split into train/dev/test

Currently, the splitting script divides the input dataset into two parts: train and test.

It should divide the input dataset to three parts: train, dev, and test.

Additionally, we should make sure that the dev set has similar properties to the test set in the sense that it also contains a non-negligible number of unseen MWEs w.r.t. train.

NOTE: we stick to the definition that a MWE is unseen if the same bag of lemmas does not occur in train.