Parsing Free-Form Language Learner Data: Current State and Error Analysis
This repository contains material that was used or produced for the following paper:
Christine Köhn, Tobias Staron and Arne Köhn. 2016. Parsing Free-Form Language Learner Data: Current State and Error Analysis. In Proceedings of the 13th Conference on Natural Language Processing (KONVENS). pages 135-145, Bochum, Germany, September. http://nbn-resolving.de/urn:nbn:de:gbv:18-228-7-2269
Links to the previously published software that we used can be found below.
For material which is too big to upload or for any other references or material that is missing here, please contact us directly (see below).
Currently only the gold standard annotations for Falko-100dep are contained in this repository. More is coming soon!
The sentences were randomly sampled from the FalkoEssayL2 corpus v2.4 downloaded from http://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/forschung/falko/zugang.
- 100dep_gold: Gold standard annotations (gold labeled dependencies and gold PoS tags) for 100 sentences from the FalkoEssayL2 corpus
License for text: Creative Commons Attribution 3.0 Unported (CC BY 3.0)
License for annotation: Creative Commons Attribution 4.0 International (CC BY 4.0)
File names for sentences
ID_ESSAYID_SENTENCE.cda or sentence_ID.conll:
ID encodes the proficiency level of the writer in terms of the Common European Framework of Reference for Languages (CEFR):
ID 0001 → level B2,
ID 0002 → level C1,
ID 0003 → level C2,
ID 0004 → level B2,...
ESSAYID corresponds to the original Excel file name (e.g. kne03_2006_06_L2v2.4 -> kne03_2006_06_L2v2.4.xls), which itself encodes meta information about the essay (v2.4 for the version; for the other parts see Reznicek et al. 2012).
SENTENCE is the sentence number according to the manually annotated sentence boundaries on the ZH0 level (see Reznicek et al. 2012).
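The ID_ESSAYID_SENTENCE.cda naming scheme can be split into its three parts with a small script. The following is a minimal sketch, not part of the repository: the function name `parse_cda_name`, the `SentenceFile` type, and the example file name are hypothetical. It assumes that ID and SENTENCE contain no underscores, while ESSAYID may (as in kne03_2006_06_L2v2.4).

```python
import re
from typing import NamedTuple


class SentenceFile(NamedTuple):
    """Components of an ID_ESSAYID_SENTENCE.cda file name (hypothetical helper type)."""
    sentence_id: str  # ID part, e.g. "0001"
    essay_id: str     # ESSAYID part, may itself contain underscores
    sentence: str     # SENTENCE part, kept as a string


# Assumption: ID and SENTENCE contain no underscores, so the first and the
# last underscore in the name delimit ESSAYID.
_CDA_NAME = re.compile(r"^(?P<id>[^_]+)_(?P<essay>.+)_(?P<sentence>[^_]+)\.cda$")


def parse_cda_name(name: str) -> SentenceFile:
    """Split a .cda file name into ID, ESSAYID and SENTENCE."""
    match = _CDA_NAME.match(name)
    if match is None:
        raise ValueError(f"not an ID_ESSAYID_SENTENCE.cda name: {name!r}")
    return SentenceFile(match["id"], match["essay"], match["sentence"])


if __name__ == "__main__":
    # Hypothetical file name, built from the essay ID used as an example above.
    print(parse_cda_name("0001_kne03_2006_06_L2v2.4_12.cda"))
```

The essay ID returned this way can be matched against the original Excel file name by appending .xls.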
https://gitlab.com/nats/jwcdg - jwcdg parser (includes links to other required components)
https://github.com/taolei87/RBGParser - RBGParser
https://www.cs.cmu.edu/~ark/TurboParser/ - TurboParser (including TurboTagger)
Data on inquiry
The trained models are too large to upload. If you are interested in them, contact us (see below).
If you are interested in more information about the hybrid approach, contact Tobias Staron directly.
Christine Köhn - ckoehn at informatik.uni-hamburg.de
Tobias Staron - staron at informatik.uni-hamburg.de
Arne Köhn - koehn at informatik.uni-hamburg.de