@@ -18,7 +18,7 @@ The annotation of corpora, in most languages, uses the central PARSEME annotatio
## File format documentation
* **CUPT**: Most files in PARSEME use the [CUPT format](http://multiword.sourceforge.net/cupt-format/) (short for **C**oNLL-**U** **P**arseme-**T**SV). CUPT is the PARSEME instance of the extended [CoNLL-U format](https://universaldependencies.org/format.html), which was defined jointly with [Universal Dependencies](http://universaldependencies.org/). The generic meta-format extending CoNLL-U is called [CoNLL-U Plus](https://universaldependencies.org/ext-format.html).
* **CoNLL-U**: the [CoNLL-U format](https://universaldependencies.org/format.html) is used in the [Universal Dependencies](http://universaldependencies.org/) project to represent and release morphological and syntactic annotations (i.e. treebanks) for many languages. PARSEME often relies on UD annotations, both manual (in treebanks) and automatic (output of tools like [UDPipe](#Morphosyntactic-annotations:-UDPipe)). Our [conversion scripts](#File-format-conversion) can deal with CoNLL-U and perform integration of MWE annotations with UD annotations.
* **CoNLL-U**: the [CoNLL-U format](https://universaldependencies.org/format.html) is used in the [Universal Dependencies](http://universaldependencies.org/) project to represent and release morphological and syntactic annotations (i.e. treebanks) for many languages. PARSEME often relies on UD annotations, both manual (in treebanks) and automatic (output of tools like [UDPipe](#morphosyntactic-annotations-udpipe)). Our [conversion scripts](#file-format-conversion) can deal with CoNLL-U and perform integration of MWE annotations with UD annotations.
* **FoLiA**: files in FLAT are manipulated using a generic XML format called [FoLiA](https://proycon.github.io/folia/). Below, we provide tools to convert from FoLiA to CUPT and vice versa, as well as integration with UD's CoNLL-U format.
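
To make the relation between CUPT and CoNLL-U concrete, here is a minimal Python sketch of reading a CUPT file. It assumes the usual CoNLL-U Plus layout: a `# global.columns = ...` header naming the tab-separated columns, with MWE annotations in a `PARSEME:MWE` column where `*` means no MWE, `_` means unspecified, and codes such as `1:VPC.full` (first token) or `1` (later tokens) are separated by `;`. The helper names `parse_cupt` and `mwes` and the sample sentence are invented for illustration and are not part of the PARSEME scripts.

```python
def parse_cupt(text):
    """Parse CUPT text into sentences; each sentence is a list of token dicts."""
    columns = None
    sentences, current = [], []
    for line in text.splitlines():
        if line.startswith("# global.columns ="):
            # Column names come from the CoNLL-U Plus header line.
            columns = line.split("=", 1)[1].split()
        elif line.startswith("#"):
            continue  # other comment/metadata lines are ignored in this sketch
        elif not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        else:
            if columns is None:
                raise ValueError("missing '# global.columns' header")
            current.append(dict(zip(columns, line.split("\t"))))
    if current:
        sentences.append(current)
    return sentences


def mwes(sentence):
    """Group token forms by MWE identifier, using the PARSEME:MWE column."""
    found = {}
    for tok in sentence:
        code = tok.get("PARSEME:MWE", "*")
        if code in ("*", "_"):
            continue  # no MWE, or annotation left unspecified
        for part in code.split(";"):
            mwe_id, _, category = part.partition(":")
            entry = found.setdefault(mwe_id, {"category": None, "forms": []})
            if category:
                entry["category"] = category  # only the first token carries it
            entry["forms"].append(tok["FORM"])
    return found


# Hypothetical sample sentence, built with explicit tabs for readability.
SAMPLE = "\n".join([
    "# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC PARSEME:MWE",
    "# text = She gave up .",
    "\t".join(["1", "She", "she", "PRON", "_", "_", "2", "nsubj", "_", "_", "*"]),
    "\t".join(["2", "gave", "give", "VERB", "_", "_", "0", "root", "_", "_", "1:VPC.full"]),
    "\t".join(["3", "up", "up", "ADP", "_", "_", "2", "compound:prt", "_", "_", "1"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_", "*"]),
    "",
])

if __name__ == "__main__":
    for sent in parse_cupt(SAMPLE):
        print(mwes(sent))
```

Reading the column names from the header rather than hard-coding positions keeps the sketch usable for other CoNLL-U Plus instances; for real work, the project's own conversion scripts remain the reference.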
## File format conversion
@@ -34,7 +34,7 @@ PARSEME provides scripts to increase the consistency of annotations. Their use i
## Error mining: Grew-match
* [Grew-match](http://match.grew.fr/?corpus=PARSEME-EN): an online query tool on annotated data.
* [Grew-match](http://match.grew.fr/?corpus=PARSEME-EN): an online query tool on annotated data. The guide for [enhancing-existing-corpora] describes how to use Grew-match to mine errors in the annotations.
## Gitlab data repositories
@@ -43,4 +43,4 @@ PARSEME provides scripts to increase the consistency of annotations. Their use i
## Guidelines editions and example editing
* [PARSEME guidelines](https://gitlab.com/parseme/sharedtask-guidelines): a repository hosting the HTML guidelines and issues page (LLs generally do not need to edit the guidelines directly but they do participate in raising and solving issues)
* [PARSEME guidelines](https://gitlab.com/parseme/sharedtask-guidelines): a repository hosting the HTML guidelines and issues page (LLs generally do not need to edit the guidelines directly but they do participate in raising and solving issues).