* [FLAT annotation platform](http://mwe.phil.hhu.de/): the PARSEME instance of [FLAT](https://github.com/proycon/flat), developed by Maarten van Gompel and hosted at the University of Düsseldorf.
* [FLAT user guide](https://docs.google.com/document/d/1zd_VhXQTel_IRVQ_u6s2wvJttwBHdDIk5YtWDMa3QW4/edit#) for PARSEME annotation
## File format documentation
* **CUPT**: Most files in PARSEME use the [CUPT format](http://multiword.sourceforge.net/cupt-format/) (short for **C**oNLL-**U** **P**arseme-**T**SV). CUPT is the PARSEME instance of the extended [CoNLL-U format](https://universaldependencies.org/format.html) defined jointly with [Universal Dependencies](http://universaldependencies.org/). This generic meta-format extending CoNLL-U is called [CoNLL-U Plus](https://universaldependencies.org/ext-format.html).
* **CoNLL-U**: the [CoNLL-U format](https://universaldependencies.org/format.html) is used in the [Universal Dependencies](http://universaldependencies.org/) project to represent and release morphological and syntactic annotations (i.e., treebanks) for many languages. PARSEME often relies on UD annotations, both manual (in treebanks) and automatic (output of tools like [UDPipe](#Morphosyntactic-annotations:-UDPipe)). Our [conversion scripts](#File-format-conversion) can read CoNLL-U files and merge MWE annotations into the UD annotations they contain.
* **FoLiA**: files in FLAT are handled in a generic XML format called [FoLiA](https://proycon.github.io/folia/). Our [conversion scripts](#File-format-conversion) convert FoLiA to CUPT and vice versa, and can also integrate the result with UD's CoNLL-U annotations.
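To make the CUPT layout described above more concrete, here is a minimal Python sketch that reads the 11th column (`PARSEME:MWE`) of one CUPT sentence and groups tokens into MWEs. It assumes the usual CUPT conventions: `*` marks a token outside any MWE, `_` marks an unannotated token, the first token of an MWE carries `id:CATEGORY` (e.g. `1:LVC.full`), later tokens carry only the id, and a token belonging to several MWEs lists their codes separated by `;`. The function name `read_mwes` and the sample sentence are hypothetical; this is not code from the PARSEME utilities.

```python
from collections import defaultdict

# Hypothetical CUPT sentence: 10 CoNLL-U columns plus PARSEME:MWE.
CUPT_SAMPLE = """\
# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC PARSEME:MWE
# text = He took a walk
1\tHe\the\tPRON\t_\t_\t2\tnsubj\t_\t_\t*
2\ttook\ttake\tVERB\t_\t_\t0\troot\t_\t_\t1:LVC.full
3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_\t*
4\twalk\twalk\tNOUN\t_\t_\t2\tobj\t_\t_\t1
"""


def read_mwes(lines):
    """Return {mwe_id: (category, [token ids])} for one CUPT sentence."""
    cats = {}
    tokens = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        tok_id, mwe = cols[0], cols[10]
        if mwe in ("*", "_"):  # '*' = no MWE, '_' = not annotated
            continue
        for code in mwe.split(";"):  # a token may belong to several MWEs
            mwe_id, _, cat = code.partition(":")
            if cat:  # only the first token of an MWE carries its category
                cats[mwe_id] = cat
            tokens[mwe_id].append(tok_id)
    return {i: (cats.get(i), toks) for i, toks in tokens.items()}


print(read_mwes(CUPT_SAMPLE.splitlines()))
# {'1': ('LVC.full', ['2', '4'])}
```

Keeping the category separate from the token list lets the sketch handle discontinuous MWEs (tokens 2 and 4 here) and overlapping ones without extra bookkeeping.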
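Integrating MWE annotations with UD's CoNLL-U annotations essentially amounts to appending a `PARSEME:MWE` column to each token line. The sketch below is a hypothetical helper (`add_mwe_column` is not part of the PARSEME utilities) that takes one sentence of CoNLL-U lines, for instance UDPipe output, plus a list of MWEs given as `(category, token ids)` pairs. It assumes that multiword-token ranges (`1-2`) and empty nodes (`1.1`) receive `_` in the MWE column; a complete CUPT file would also start with a `# global.columns = ...` header line, which this per-sentence sketch does not write.

```python
def add_mwe_column(conllu_lines, mwes):
    """Append a PARSEME:MWE column to the token lines of one CoNLL-U sentence.

    mwes: list of (category, [token ids]) pairs, token ids given as strings.
    """
    codes = {}  # token id -> list of MWE codes for that token
    for n, (cat, tok_ids) in enumerate(mwes, start=1):
        # sort numerically so the category lands on the MWE's first token
        for k, tok in enumerate(sorted(tok_ids, key=int)):
            codes.setdefault(tok, []).append(f"{n}:{cat}" if k == 0 else str(n))
    out = []
    for line in conllu_lines:
        if not line or line.startswith("#"):
            out.append(line)  # comments and blank lines are copied unchanged
            continue
        tok_id = line.split("\t", 1)[0]
        if "-" in tok_id or "." in tok_id:
            mwe = "_"  # assumption: ranges and empty nodes get no MWE code
        else:
            mwe = ";".join(codes.get(tok_id, ["*"]))  # '*' = not in any MWE
        out.append(f"{line}\t{mwe}")
    return out


# Hypothetical 10-column CoNLL-U sentence, e.g. as produced by a parser.
sent = [
    "# text = He took a walk",
    "1\tHe\the\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\ttook\ttake\tVERB\t_\t_\t0\troot\t_\t_",
    "3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_",
    "4\twalk\twalk\tNOUN\t_\t_\t2\tobj\t_\t_",
]
for line in add_mwe_column(sent, [("LVC.full", ["4", "2"])]):
    print(line)  # last column becomes *, 1:LVC.full, *, 1
```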
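For a rough idea of what a FoLiA-to-CUPT conversion involves, the following sketch uses Python's standard `xml.etree.ElementTree` to turn the words and entity annotations of one FoLiA sentence into CUPT-style MWE codes. The FoLiA fragment is simplified and hypothetical: it assumes MWEs are stored as `<entity>` elements whose `class` is the PARSEME category and whose `<wref>` children point to word ids. Real FLAT exports carry more structure and metadata, which a real converter would need to handle; `folia_sentence_to_mwe_codes` is an illustrative name only.

```python
import xml.etree.ElementTree as ET

NS = {"f": "http://ilk.uvt.nl/folia"}  # FoLiA XML namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # ElementTree's name for xml:id

# Simplified, hypothetical FoLiA fragment: one sentence, one LVC.full entity.
FOLIA_SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="doc">
  <text xml:id="doc.text">
    <s xml:id="s1">
      <w xml:id="s1.w1"><t>He</t></w>
      <w xml:id="s1.w2"><t>took</t></w>
      <w xml:id="s1.w3"><t>a</t></w>
      <w xml:id="s1.w4"><t>walk</t></w>
      <entities>
        <entity class="LVC.full">
          <wref id="s1.w2" t="took"/>
          <wref id="s1.w4" t="walk"/>
        </entity>
      </entities>
    </s>
  </text>
</FoLiA>"""


def folia_sentence_to_mwe_codes(s_elem):
    """Return [(form, mwe_code), ...] for one FoLiA <s> element."""
    words = s_elem.findall("f:w", NS)
    position = {w.get(XML_ID): i for i, w in enumerate(words)}
    codes = [[] for _ in words]
    for n, entity in enumerate(s_elem.iterfind(".//f:entity", NS), start=1):
        # sort by word position so the category goes on the first token
        idx = sorted(position[ref.get("id")] for ref in entity.findall("f:wref", NS))
        for k, i in enumerate(idx):
            codes[i].append(f"{n}:{entity.get('class')}" if k == 0 else str(n))
    return [(w.findtext("f:t", namespaces=NS), ";".join(c) or "*")
            for w, c in zip(words, codes)]


root = ET.fromstring(FOLIA_SAMPLE)
for s in root.iterfind(".//f:s", NS):
    print(folia_sentence_to_mwe_codes(s))
# [('He', '*'), ('took', '1:LVC.full'), ('a', '*'), ('walk', '1')]
```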
## File format conversion
* [PARSEME utilities](https://gitlab.com/parseme/utilities/): a repository of scripts for corpus management, including conversion between parsemetsv and CUPT, adjudication, consistency checks, and corpus statistics. Language leaders (LLs) may need the help of the core organizers to run some of these scripts.
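As an illustration of the kind of consistency check and statistics such scripts compute, here is a small self-contained sketch; it is not the repository's actual code, and `check_and_count` is a hypothetical name. It scans CUPT lines, reports MWE ids that are declared twice or continued without having been opened in the same sentence, and counts MWEs per category. It assumes sentences are separated by blank lines, as in CoNLL-U, and that MWE ids restart in each sentence.

```python
from collections import Counter


def check_and_count(cupt_lines):
    """Validate PARSEME:MWE codes and count MWEs per category.

    Returns (Counter of categories, list of error messages).
    """
    counts, errors = Counter(), []
    declared = set()  # MWE ids opened so far in the current sentence
    for lineno, line in enumerate(cupt_lines, start=1):
        line = line.rstrip("\n")
        if not line:
            declared = set()  # blank line: new sentence, ids restart
            continue
        if line.startswith("#"):
            continue
        mwe = line.split("\t")[10]
        if mwe in ("*", "_"):
            continue
        for code in mwe.split(";"):
            mwe_id, _, cat = code.partition(":")
            if cat:  # "id:CATEGORY" opens a new MWE
                if mwe_id in declared:
                    errors.append(f"line {lineno}: MWE {mwe_id} declared twice")
                declared.add(mwe_id)
                counts[cat] += 1
            elif mwe_id not in declared:  # bare "id" must continue an open MWE
                errors.append(f"line {lineno}: MWE {mwe_id} continued before being opened")
    return counts, errors


# Hypothetical input: the second sentence is deliberately malformed.
SAMPLE_LINES = [
    "1\tHe\the\tPRON\t_\t_\t2\tnsubj\t_\t_\t*",
    "2\ttook\ttake\tVERB\t_\t_\t0\troot\t_\t_\t1:LVC.full",
    "3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_\t*",
    "4\twalk\twalk\tNOUN\t_\t_\t2\tobj\t_\t_\t1",
    "",
    "1\tIt\tit\tPRON\t_\t_\t2\tnsubj\t_\t_\t*",
    "2\trains\train\tVERB\t_\t_\t0\troot\t_\t_\t2",
]
counts, errors = check_and_count(SAMPLE_LINES)
print(counts)  # Counter({'LVC.full': 1})
print(errors)  # ['line 7: MWE 2 continued before being opened']
```

Reporting line numbers rather than stopping at the first problem makes it easier to fix all issues in a file in one pass.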
## Morphosyntactic annotations: UDPipe
## Consistency checks scripts