... | ... | @@ -23,8 +23,9 @@ The annotation of corpora, in most languages, uses the central PARSEME annotatio |
|
|
|
|
|
## File format conversion
|
|
|
|
|
|
* [PARSEME utilities](https://gitlab.com/parseme/utilities/): a repository containing useful scripts for corpus management, including parsemetsv<->CUPT conversion, adjudication, consistency checks, and corpus statistics. LLs may need to run some of these scripts with the help of core organizers.
|
|
|
|
|
|
PARSEME provides scripts to convert between CUPT, CoNLL-U and FoLiA. They can be found in the `st-organizers` folder of the [PARSEME utilities](https://gitlab.com/parseme/utilities/) repository. The most important scripts are:
|
|
|
* `to_cupt.py`: this script converts its input file into CUPT. The input can be in FoLiA, CUPT, CoNLL-UP (of which CoNLL-U is an instance) or PARSEME-TSV (deprecated) format; the script detects the input format automatically. The script can also align the input with a corresponding CoNLL-U file (e.g. a newer UD treebank version). If a CoNLL-U file is provided with the `--conllu` option, the information in the CoNLL-U file takes priority, except for the MWE annotations present only in the `--input` file. The script can correct incompatible tokenization, assuming that the CoNLL-U file is correct. Other options control the presence of `NonVMWE` annotations, MWE annotations on multiword tokens (forbidden in CUPT) and the names of columns in CoNLL-UP (if not present in the header, as is the case for standard CoNLL-U).
|
|
|
* `to_folia.py`: like `to_cupt.py`, this script converts its input into FoLiA. It is already integrated into FLAT, so running it manually should not be necessary.
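
The column-name handling mentioned for `to_cupt.py` can be illustrated with a minimal sketch. This is not code from the PARSEME utilities: the function names `read_columns` and `read_tokens` are hypothetical, and the sketch assumes the usual CoNLL-U Plus convention of a `# global.columns = ...` header line, falling back to the ten standard CoNLL-U columns when the header is absent. Sentence boundaries and multiword-token ranges are deliberately ignored.

```python
from typing import Dict, Iterable, List

# Column names of standard CoNLL-U, used when the file carries no header.
CONLLU_COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                  "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]

HEADER_PREFIX = "# global.columns ="


def read_columns(lines: Iterable[str]) -> List[str]:
    """Return the column names declared in the header, or the CoNLL-U default."""
    for line in lines:
        line = line.strip()
        if line.startswith(HEADER_PREFIX):
            return line[len(HEADER_PREFIX):].split()
        if line and not line.startswith("#"):
            break  # first token line reached: no header present
    return list(CONLLU_COLUMNS)


def read_tokens(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Map every token line to a {column name: value} dictionary."""
    lines = list(lines)
    columns = read_columns(lines)
    tokens = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip comments and blank sentence separators
        tokens.append(dict(zip(columns, line.split("\t"))))
    return tokens
```

With a CUPT file, each token dictionary would contain a `PARSEME:MWE` key; with a plain CoNLL-U file, only the ten standard keys are present.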
|
|
|
|
|
|
## Morphosyntactic annotations: UDPipe
|
|
|
|
... | ... | @@ -34,13 +35,12 @@ PARSEME provides scripts to increase the consistency of annotations. Their use i |
|
|
|
|
|
## Error mining: Grew-match
|
|
|
|
|
|
* [Grew-match](http://match.grew.fr/?corpus=PARSEME-EN): an online query tool on annotated data. The guide for [enhancing-existing-corpora] describes how to use Grew-match to mine errors in the annotations.
|
|
|
* [Grew-match](http://match.grew.fr/?corpus=PARSEME-EN): an online query tool on annotated data. The guide for [enhancing existing corpora](Enhancing-existing-corpora) describes how to use Grew-match to mine errors in the annotations.
|
|
|
|
|
|
## Gitlab data repositories
|
|
|
|
|
|
* [PARSEME utilities](https://gitlab.com/parseme/utilities/): a repository containing useful scripts for corpus management, including parsemetsv<->CUPT conversion, adjudication, consistency checks, and corpus statistics. LLs may need to run some of these scripts with the help of core organizers.
|
|
|
* [Development Gitlab space](https://gitlab.com/parseme/sharedtask-data-dev) (for authorised users): contains development versions of the corpora, double-aligned corpora for IAA calculation, system results from previous editions, and various scripts for ST organizers (automating system evaluation, publishing the results, running IAA). In 2020, we are gradually moving the development versions of the language corpora to dedicated GitLab repositories, keeping in this repository only organisation data (preliminary results, IAA data, internal scripts).
|
|
|
* [PARSEME guidelines](https://gitlab.com/parseme/sharedtask-guidelines): a repository hosting the HTML guidelines and issues page (LLs generally do not need to edit the guidelines directly but they do participate in raising and solving issues).
|
|
|
* [Description of PARSEME repositories](https://docs.google.com/document/d/1Wkx7bWTR04TXFVypPKy-qYi4ugc_034BtfskDeLDoGU/). This document may require updates, and its content should gradually be moved here. Please send us a message if you find any inconsistency.
|
|
|
|
|
|
## Guidelines editions and example editing
|
|
|
|
|
|
* [PARSEME guidelines](https://gitlab.com/parseme/sharedtask-guidelines): a repository hosting the HTML guidelines and issues page (LLs generally do not need to edit the guidelines directly but they do participate in raising and solving issues). |