@@ -34,7 +34,7 @@ PARSEME provides scripts to convert between CUPT, CoNLL-U and FoLiA (and also th
If your corpus does not have manual morphological and morphosyntactic annotations, you can (and should) generate them using an automatic UD-compatible parser such as UDPipe. We provide a script and some instructions below to make this process easier. The input files can be FoLiA, CUPT, CoNLL-U, parseme-tsv, or raw text (UTF-8, LF line endings, one sentence per line).
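
For raw-text input, the three stated requirements (UTF-8, LF line endings, one sentence per line) can be checked before running the parser. The following is only a minimal sketch and not part of the PARSEME utilities: the function name `check_raw_text` is hypothetical, and treating blank lines as a problem is an assumption about what "one sentence per line" implies.

```python
from typing import List


def check_raw_text(data: bytes) -> List[str]:
    """Return a list of problems found in raw-text input (empty if none).

    Hypothetical helper, not part of the PARSEME utilities.
    """
    # Requirement 1: the file must be valid UTF-8.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        return ["not valid UTF-8: {}".format(err)]

    problems = []

    # Requirement 2: LF line endings only (no CR from CRLF or old Mac files).
    if "\r" in text:
        problems.append("contains CR characters; convert line endings to LF")

    # Requirement 3: one sentence per line. Sentence boundaries cannot be
    # verified here; we only flag blank lines (assumption: a blank line is
    # not a sentence).
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty string after a final newline
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append("line {} is empty".format(number))

    return problems
```

Calling `check_raw_text(open("corpus.txt", "rb").read())` (where `corpus.txt` stands for your own file) returns an empty list when none of these problems is detected.
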
1. Download the UDPipe **model** for your language:
- Models are described on [UDPipe pretrained model](https://ufal.mff.cuni.cz/udpipe/models). One model is available by corpus so you may have more than one models for your language. You can compare the scores of these models on the UDPipe page.
- Pretrained models are described on the [UDPipe models page](https://ufal.mff.cuni.cz/udpipe/models). One model is available per corpus, so you may have more than one model to choose from for your language. You can compare the scores of these models on the UDPipe models page.
- On the [CLARIN page](https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2998), you can download models trained on version 2.4 of UD (the latest at the time of writing; use a more recent version if available).
2. Download the PARSEME utilities repository:
- `git clone git@gitlab.com:parseme/utilities.git` (if not already done) or `git pull` (to get the latest files)