... | @@ -70,18 +70,16 @@ Any pre-existing information, other than tokenisation, will be overwritten. Ther |
### Running UDPipe on partly annotated files
Suppose your corpus is already tokenized, annotated for MWEs, and partly annotated for morphosyntax (e.g. for UPOS tags) and in the .cupt format
Suppose your corpus is already tokenized, annotated for MWEs, and annotated for morphology (LEMMA, UPOS and FEATS columns) but not for syntax (HEAD and DEPREL columns). In that case, you can use UDPipe in a custom way:
1. Convert your .cupt files into .conllu:
1. Convert your .cupt files into .conllu (deleting the last column):
- for every file, say text1.cupt, run the following command from the PARSEME utilities repo, indicating your language code by the `--lang` option:
- for every file, say input-001.cupt, run the following command from the PARSEME utilities repo, indicating your language code by the `--lang` option:
`utilities/st-organizers/to_cupt.py --lang HR --keepranges --colnames ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC --input text1.cupt > text1.conllu`
`utilities/st-organizers/to_cupt.py --lang HR --keepranges --colnames ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC --input input-001.cupt > input-001.conllu`
2. Parse the input-001.conllu file [online](https://lindat.mff.cuni.cz/services/udpipe/), selecting UDPipe 2, CoNLL-U input, and the "Parse" option but not the "Tag and Lemmatize" option. Save the output locally, e.g. as input-001.udpipe.conllu. Alternatively, you can run your local version of UDPipe with the `--parse` option (see the [UDPipe documentation](https://ufal.mff.cuni.cz/udpipe/2)).
3. Merge both files:
`utilities/st-organizers/to_cupt.py --lang HR --discard-non-mwes --input input-001.cupt --conllu input-001.udpipe.conllu > input-001.new.cupt`
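
To make the data flow of steps 1 and 3 concrete, here is a minimal Python sketch of what happens to the columns. It is only an illustration under stated assumptions, not the implementation of `to_cupt.py`: the function names `strip_mwe_column` and `merge_syntax` are hypothetical, the files are assumed to be tab-separated with the MWE annotation in the 11th column, and the parsed file is assumed to keep exactly the same tokens in the same order.

```python
from typing import Iterable, List

CONLLU_COLS = 10      # ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
HEAD, DEPREL = 6, 7   # zero-based positions of the syntax columns


def _is_token(line: str) -> bool:
    """A token line is any non-empty line that is not a '#' comment."""
    return bool(line) and not line.startswith("#")


def strip_mwe_column(cupt_lines: Iterable[str]) -> List[str]:
    """Step 1 (illustration): drop the 11th column from every token line.

    Comment lines and blank sentence separators are kept unchanged.
    """
    out = []
    for raw in cupt_lines:
        line = raw.rstrip("\n")
        if _is_token(line):
            line = "\t".join(line.split("\t")[:CONLLU_COLS])
        out.append(line)
    return out


def merge_syntax(cupt_lines: Iterable[str], parsed_lines: Iterable[str]) -> List[str]:
    """Step 3 (illustration): copy HEAD and DEPREL from the parsed file
    into the original .cupt lines, leaving all other columns (including
    the MWE column) untouched.
    """
    parsed_tokens = (
        p.rstrip("\n").split("\t")
        for p in parsed_lines
        if _is_token(p.rstrip("\n"))
    )
    out = []
    for raw in cupt_lines:
        line = raw.rstrip("\n")
        if not _is_token(line):
            out.append(line)
            continue
        cols = line.split("\t")
        parsed = next(parsed_tokens, None)
        # Assumption: both files contain the same tokens in the same order.
        if parsed is None or parsed[:2] != cols[:2]:
            raise ValueError(f"token mismatch at ID {cols[0]!r}")
        cols[HEAD], cols[DEPREL] = parsed[HEAD], parsed[DEPREL]
        out.append("\t".join(cols))
    return out
```

Matching on ID and FORM is a deliberately strict choice: if the parser output drifts from the original tokenisation, the sketch fails loudly instead of silently misaligning MWE annotations. For real data, prefer the `to_cupt.py` commands given in the steps above.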
To
## Consistency checks scripts