... | ... | @@ -20,6 +20,7 @@ The annotation of corpora, in most languages, uses the central PARSEME annotatio |
* **CUPT**: most files in PARSEME use the [CUPT format](http://multiword.sourceforge.net/cupt-format/) (short for **C**oNLL-**U** **P**arseme-**T**SV). CUPT is the PARSEME instance of the extended [CoNLL-U format](https://universaldependencies.org/format.html), which was defined jointly with [Universal Dependencies](http://universaldependencies.org/). The generic meta-format that extends CoNLL-U is called [CoNLL-U Plus](https://universaldependencies.org/ext-format.html).
* **CoNLL-U**: the [CoNLL-U format](https://universaldependencies.org/format.html) is used in the [Universal Dependencies](http://universaldependencies.org/) (UD) project to represent and release morphological and syntactic annotations (i.e. treebanks) for many languages. PARSEME often relies on UD annotations, both manual (in treebanks) and automatic (output of tools like [UDPipe](#morphosyntactic-annotations-udpipe)). Our [conversion scripts](#file-format-conversion) can read CoNLL-U and integrate MWE annotations with UD-style morphosyntactic annotations.
* **FoLiA**: files in FLAT are manipulated using a generic XML format called [FoLiA](https://proycon.github.io/folia/). We provide tools to [convert](#file-format-conversion) from FoLiA to CUPT and vice versa, and to integrate FoLiA files with UD's CoNLL-U format.
* **parseme-tsv**: the [parseme-tsv format](https://typo.uni-konstanz.de/parseme/index.php/2-general/184-parseme-shared-task-format-of-the-final-annotation) is a _deprecated_ specification that may still appear in some older documents. It was used for the first shared task and corresponds roughly to four columns extracted from CUPT: the 1st, 2nd, 10th (SpaceAfter=No) and 11th.
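
The relation between CUPT and the deprecated parseme-tsv format can be illustrated with a minimal Python sketch. It is not one of the PARSEME conversion scripts: the function name `cupt_to_four_columns` and the sample sentence are hypothetical, and the sketch assumes a CUPT file with the standard 11 tab-separated columns (the 10 CoNLL-U columns followed by `PARSEME:MWE`). It copies columns 1, 2, 10 and 11 verbatim, so the exact encoding that parseme-tsv used for the no-space information may differ from what is printed here.

```python
def cupt_to_four_columns(cupt_text):
    """Keep only columns 1, 2, 10 and 11 of each CUPT token line.

    Illustrative sketch only: comment lines are dropped, blank lines
    (sentence boundaries) are kept, and the columns are copied verbatim.
    """
    rows = []
    for line in cupt_text.splitlines():
        if not line.strip():
            rows.append("")  # blank line = sentence boundary
            continue
        if line.startswith("#"):
            continue  # metadata such as "# global.columns = ..." or "# text = ..."
        cols = line.split("\t")
        if len(cols) != 11:
            raise ValueError(f"expected 11 columns, got {len(cols)}: {line!r}")
        # 0-based indices 0, 1, 9, 10 = ID, FORM, MISC, PARSEME:MWE
        rows.append("\t".join((cols[0], cols[1], cols[9], cols[10])))
    return "\n".join(rows)


# Hypothetical sample sentence, built with explicit tabs for readability.
SAMPLE = "\n".join([
    "# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC PARSEME:MWE",
    "# text = He gave up.",
    "\t".join(["1", "He", "he", "PRON", "_", "_", "2", "nsubj", "_", "_", "*"]),
    "\t".join(["2", "gave", "give", "VERB", "_", "_", "0", "root", "_", "_", "1:VPC.full"]),
    "\t".join(["3", "up", "up", "ADP", "_", "_", "2", "compound:prt", "_", "SpaceAfter=No", "1"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_", "*"]),
])

if __name__ == "__main__":
    print(cupt_to_four_columns(SAMPLE))
```

For real conversions, the scripts described under [File format conversion](#file-format-conversion) remain the reference; the sketch only shows which columns the older format retained.
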
## File format conversion
... | ... | |