... | ... | @@ -86,6 +86,10 @@ Suppose your corpus is already tokenized, annotated for MWEs, and annotated for |
|
|
3. Merge both files:
|
|
|
`utilities/st-organizers/to_cupt.py --lang HR --discard-non-mwes --input input-001.cupt --conllu input-001.udpipe.conllu > input-001.new.cupt`
|
|
|
|
|
|
### Updating morphosyntactic annotations
|
|
|
|
|
|
Suppose your corpus has both morphosyntactic annotations and MWE annotations, and you want to update the former. For instance, your morphosyntactic annotations may stem from a UD treebank that has recently been updated, or from UDPipe, for which a newer, better model for your language is now available. In this case, our scripts help you update the morphosyntactic annotations automatically, based on the new UD treebanks or UDPipe models. For details, see [this page](https://gitlab.com/parseme/corpora/-/wikis/Updating-morphosyntactic-annotations) (to appear in **[July 2023]**).
|
|
|
|
|
|
## Consistency check scripts
|
|
|
|
|
|
PARSEME provides scripts to increase the consistency of annotations. Their use is described in the LL's guide to [enhance existing corpora](Enhancing-existing-corpora). They can be found in the [PARSEME utilities](https://gitlab.com/parseme/utilities/) repository. The consistency check is based on lemmas: it verifies whether annotations involving the same set of lemmas use the same label across the whole corpus. It can also help spot expressions that were skipped during annotation. However, it only flags potential problems: each flagged case still needs to be examined, and any correction made, manually.
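
To illustrate the lemma-based idea described above, here is a minimal Python sketch. It is not the PARSEME script itself: the function name `find_inconsistent_labels` and the input shape (pairs of lemmas and a category label) are assumptions made for this example, and the sample annotations are invented.

```python
from collections import defaultdict


def find_inconsistent_labels(annotations):
    """Group annotations by their set of lemmas and report label conflicts.

    `annotations` is an iterable of (lemmas, label) pairs, where `lemmas`
    is an iterable of lemma strings. This input shape is hypothetical and
    chosen only for illustration.
    """
    labels_by_lemmas = defaultdict(set)
    for lemmas, label in annotations:
        # A frozenset ignores word order, so "take decision" and
        # "decision take" are treated as the same expression.
        labels_by_lemmas[frozenset(lemmas)].add(label)
    # Keep only the lemma sets that received more than one label.
    return {
        key: labels
        for key, labels in labels_by_lemmas.items()
        if len(labels) > 1
    }


# Invented sample data: the same lemma set is labelled in two ways.
annotations = [
    (("take", "decision"), "LVC.full"),
    (("decision", "take"), "LVC.full"),
    (("take", "decision"), "VID"),
    (("give", "up"), "VPC.full"),
]

for lemmas, labels in find_inconsistent_labels(annotations).items():
    print(sorted(lemmas), sorted(labels))
```

As with the real scripts, a sketch like this only points at candidates: a lemma set with two labels may be a genuine ambiguity rather than an annotation error, so each case still has to be reviewed by hand.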
|
... | ... | |