... | ... | @@ -64,16 +64,19 @@ Since the corpus is large, you may want to parallelise this process by running s |
### Running UDPipe on MWE-annotated files
Suppose your corpus is already tokenized and annotated for MWEs, and is in the .cupt (or .folia) format but lacks morphosyntactic annotation. To enhance it with UDPipe, proceed as above, this time passing your .cupt files to the parser:

`utilities/lang-leaders/pre-annot/run_udpipe.sh MODELPATH input-001.cupt input-002.cupt ...`

Any pre-existing information, other than tokenisation, will be overwritten. Therefore, if you already have part of the morphosyntactic annotation (e.g. UPOS tags) that you want to keep, you should run a customized version of UDPipe locally (see below).
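Before running the parser, it can help to check which morphosyntactic columns of a file already carry values, so that you know whether anything worth keeping would be overwritten. The following is a minimal sketch, not part of the PARSEME utilities: the function name `column_fill` is hypothetical, and it assumes tab-separated columns in the standard CoNLL-U order, with `_` marking an empty field.

```shell
# Report, for the LEMMA..DEPREL columns of a .cupt or .conllu file,
# how many token lines carry a value other than "_".
# Assumption: tab-separated columns in the order
# ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC (plus PARSEME:MWE in .cupt).
column_fill() {
  awk -F '\t' '
    /^#/ || NF < 10 { next }   # skip comment lines and blank sentence separators
    {
      total++
      for (i = 3; i <= 8; i++)
        if ($i != "_" && $i != "") filled[i]++
    }
    END {
      split("ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC", name, " ")
      for (i = 3; i <= 8; i++)
        printf "%s\t%d/%d\n", name[i], filled[i] + 0, total
    }
  ' "$1"
}
```

Calling `column_fill input-001.cupt` prints one line per column with the number of filled tokens over the total. If, say, UPOS is mostly filled and you want to keep it, the plain `run_udpipe.sh` call above is not the right choice.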
### Running UDPipe on partly annotated files
Suppose your corpus is already tokenized, annotated for MWEs, and partly annotated for morphosyntax (e.g. for UPOS tags), and is in the .cupt format. In this case, proceed as follows:
1. Convert your .cupt files into .conllu:
- for every file, say text1.cupt, run the following command from the PARSEME utilities repo, indicating your language code with the `--lang` option:

`utilities/st-organizers/to_cupt.py --lang HR --keepranges --colnames ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC --input text1.cupt > text1.conllu`
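If you have many files, the conversion command above can be wrapped in a loop. This is only a sketch: the function name `cupt_to_conllu_all` and the `TO_CUPT` override variable are hypothetical conveniences, not part of the PARSEME utilities, and the converter is invoked with exactly the options shown above.

```shell
# Convert every given .cupt file into a .conllu file next to it.
# TO_CUPT is a hypothetical override for the converter path; by default
# the script path from the command above is used (run from the repo root).
cupt_to_conllu_all() {
  lang="$1"; shift
  converter="${TO_CUPT:-utilities/st-organizers/to_cupt.py}"
  for f in "$@"; do
    out="${f%.cupt}.conllu"   # text1.cupt -> text1.conllu
    "$converter" --lang "$lang" --keepranges \
      --colnames ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC \
      --input "$f" > "$out" || { echo "conversion failed: $f" >&2; return 1; }
  done
}
```

For example, `cupt_to_conllu_all HR text1.cupt text2.cupt` would write text1.conllu and text2.conllu, stopping at the first file that fails to convert.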
2. Annotate the .conllu file with UDPipe
- If the file is not large, you can [parse it online](https://lindat.mff.cuni.cz/services/udpipe/), selecting your language model, UDPipe 2 and CoNLL-U input.
- Otherwise, you can run UDPipe locally as above. However,