Read data

Simon Will requested to merge Simon-Will/dias-e2e:read_data into master

This branch adds the file kvret.py, which contains the KVRET class for reading a set of KVRET data JSON files. It can be used like this:

import kvret

# Read only the dialogues with the navigate intent
data = kvret.KVRET.from_files(path_to_train_f,
    path_to_dev_f, path_to_test_f,
    filter_fn=kvret.dialogue_intent_is_navigate)

# Generate batches of ten examples which contain all the
# previous utterances up to some point in the dialogue.
partial_dialogue_batches = data.train.partial_dialogue_batches(size=10)
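
To illustrate what "partial dialogues" means here, the following is a minimal standalone sketch, not the code in kvret.py. It assumes a dialogue is just a list of utterance strings; the helper names partial_dialogues and batches are made up for this example.

```python
from itertools import islice


def partial_dialogues(dialogue):
    """Yield every non-empty prefix of a dialogue (a list of utterances)."""
    for end in range(1, len(dialogue) + 1):
        yield dialogue[:end]


def batches(examples, size):
    """Group an iterable of examples into lists of at most `size` items."""
    iterator = iter(examples)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical three-utterance dialogue, used only for illustration.
dialogue = [
    "where is the nearest gas station",
    "there is one 2 miles away",
    "thanks",
]
prefixes = list(partial_dialogues(dialogue))
first_batch = next(batches(prefixes, size=2))
```

Each prefix contains all utterances up to some point in the dialogue, so a dialogue with n utterances yields n examples before batching.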

The code isn’t perfect, but it should do the job. I wouldn’t be too surprised if there are still a couple of bugs hiding in it.

As of now, the batching only works if a filter_fn is provided at loading time that restricts the dialogues to a single intent.
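
A filter of this kind could look roughly like the sketch below. It assumes that filter_fn receives one raw dialogue as a dict parsed from the JSON, and that the intent is stored under scenario → task → intent, as in the published KVRET files. The actual kvret.dialogue_intent_is_navigate may be implemented differently; the names has_intent and is_navigate are hypothetical.

```python
def has_intent(intent):
    """Build a predicate that accepts dialogues with the given intent."""
    def predicate(dialogue):
        # Assumed layout: {"scenario": {"task": {"intent": ...}}, ...}
        task = dialogue.get("scenario", {}).get("task", {})
        return task.get("intent") == intent
    return predicate


is_navigate = has_intent("navigate")

# Two made-up dialogue records, reduced to the fields the filter reads.
sample = [
    {"scenario": {"task": {"intent": "navigate"}}, "dialogue": []},
    {"scenario": {"task": {"intent": "weather"}}, "dialogue": []},
]
navigate_only = [d for d in sample if is_navigate(d)]
```

Building the predicate from the intent name would make it easy to load the other intents with the same mechanism.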