
jwcdg

jwcdg is a dependency parser for natural language sentences, developed at the University of Hamburg. It is a reimplementation of cdg, a constraint-based dependency parser written in C, and currently comes with a grammar for German.


Welcome to CDG.

This is the Java Constraint Dependency Grammar Parser, available at http://nats-www.informatik.uni-hamburg.de/view/CDG/.

Please be aware that this port/reimplementation of cdg in Java may be a bit rough around the edges and does not yet have all of the original functions, such as an interactive command line. However, it comes with two GUIs: DepTreeViewer and AnnoViewer.

In DepTreeViewer, you can type a sentence into the input field. It is parsed incrementally (one increment for each new word). The dependency tree, its score and the constraint violations are displayed for every increment. Furthermore, parses can be saved as cda files and cda files can be loaded. The current dependency tree can be exported as SVG.

AnnoViewer can open folders with cda files for viewing and editing. It facilitates annotating sentences, e.g. sentences can be marked as "done". Additionally, several folders with different annotations of the same sentences can be opened in parallel, so that all the dependency trees for one sentence are displayed at the same time.

Please do send us feedback and suggestions or ask for help if you encounter any problems.

Have fun,
Your CDG Team.

Contact

Email:

cdg@informatik.uni-hamburg.de (You will reach the active project members with this e-mail address)

Please consider writing an e-mail to cdg@ before contacting an individual below!

Wolfgang Menzel menzel@informatik.uni-hamburg.de (project leader)
Niels Beuck beuck@informatik.uni-hamburg.de
Christopher Baumgärtner baumgaer@informatik.uni-hamburg.de
Arne Köhn koehn@informatik.uni-hamburg.de
Christine Köhn ckoehn@informatik.uni-hamburg.de

See the AUTHORS file for more information on contributors to CDG.

Copyright

Copyright (C) 1997-2015 The CDG Team cdg@informatik.uni-hamburg.de

jwcdg is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version.

Please see the file COPYING for details.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY, to the extent permitted by law; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Installation Requirements

You need Maven to compile jwcdg. It fetches all dependencies automatically.

If you want to use the German grammar bundled with jwcdg, you need to download the lexicon from

https://nats-www.informatik.uni-hamburg.de/view/CDG/DownloadPage

and unpack it into the resources/ directory.

The following programs are recommended for running CDG:

Compiling From Source

# change to jwcdg directory
cd /path/to/jwcdg/

# clean old files
mvn clean

# run unit tests
mvn test

# compile (instead of test if you don't want to run the tests)
mvn compile

# create an executable jar file containing all dependencies
mvn package

# create your own configuration file
# We recommend configuring one of the taggers above with
# "taggerCommand" (see startup.properties)

cp startup.properties my-startup.properties
emacs my-startup.properties

Running jwcdg

To run jwcdg non-incrementally, use

java -jar target/jwcdg-1.0.jar my-startup.properties

You can now write sentences and get parses.

The tokens have to be separated by spaces. If a token contains non-alphanumeric characters, enclose it in single quotes.

Example: '"' Viele 'Michael Jackson-Fans' waren traurig '"' , sagte Petra Musterfrau .
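When the tokens come from another program, the quoting rule can be applied mechanically. The following is a minimal sketch in Java and not part of jwcdg: the class `JwcdgInput` and its methods are hypothetical names. It assumes, based on the example line, that purely alphanumeric tokens and the bare punctuation marks `,` and `.` stay unquoted, while every other token (one containing spaces, hyphens, or quotation marks) is wrapped in single quotes. How a token that itself contains a single quote has to be escaped is not stated here, so the sketch rejects such tokens instead of guessing.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of jwcdg) that builds one input line from tokens. */
public class JwcdgInput {

    // Assumption taken from the example line: these marks are written without quotes.
    private static final Set<String> BARE_PUNCTUATION = Set.of(",", ".");

    /** Returns true if the token consists only of letters and digits. */
    static boolean isAlphanumeric(String token) {
        return !token.isEmpty() && token.codePoints().allMatch(Character::isLetterOrDigit);
    }

    /** Wraps a single token in single quotes if the assumed rule requires it. */
    static String quote(String token) {
        if (token.isEmpty()) {
            throw new IllegalArgumentException("empty token");
        }
        if (token.contains("'")) {
            // Escaping of single quotes is not documented here, so refuse instead of guessing.
            throw new IllegalArgumentException("token contains a single quote: " + token);
        }
        if (isAlphanumeric(token) || BARE_PUNCTUATION.contains(token)) {
            return token;
        }
        return "'" + token + "'";
    }

    /** Joins the (possibly quoted) tokens with single spaces. */
    public static String format(List<String> tokens) {
        return tokens.stream().map(JwcdgInput::quote).collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("\"", "Viele", "Michael Jackson-Fans", "waren",
                "traurig", "\"", ",", "sagte", "Petra", "Musterfrau", ".");
        // Prints the example line shown above.
        System.out.println(format(tokens));
    }
}
```

The resulting line can be typed or piped into a running jwcdg instance. If your grammar expects other punctuation marks to be unquoted as well, extend `BARE_PUNCTUATION` accordingly.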

To use an input/output encoding that differs from your system default (example for Latin-1):

java -Dfile.encoding=ISO-8859-1 -jar target/jwcdg-1.0.jar my-startup.properties

If you want to use incremental parsing, run:

java -jar target/jwcdg-1.0.jar --incremental /path/to/output-%1.cda my-startup.properties

jwcdg will now read a sentence from stdin and write the results to /path/to/output-[Number of Increment].cda
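For scripts that post-process the increments, it can help to compute the expected file names up front. The sketch below is an illustration and not jwcdg code: the class `IncrementalOutput` and its methods are hypothetical, and it assumes that `%1` is simply replaced by the increment number. Whether numbering starts at 0 or 1 is not stated here, so the first index is a parameter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of jwcdg) for working with incremental output files. */
public class IncrementalOutput {

    /** Builds the argument list of the incremental command shown above. */
    public static List<String> command(String jar, String outputPattern, String properties) {
        return List.of("java", "-jar", jar, "--incremental", outputPattern, properties);
    }

    /**
     * Lists the file names expected for the given number of increments.
     * Assumption: "%1" in the pattern is replaced by the increment number.
     */
    public static List<String> expectedFiles(String outputPattern, int firstIndex, int count) {
        List<String> files = new ArrayList<>();
        for (int i = firstIndex; i < firstIndex + count; i++) {
            files.add(outputPattern.replace("%1", Integer.toString(i)));
        }
        return files;
    }

    public static void main(String[] args) {
        String pattern = "/path/to/output-%1.cda";
        System.out.println(String.join(" ",
                command("target/jwcdg-1.0.jar", pattern, "my-startup.properties")));
        // Three increments, assuming (not confirmed) that numbering starts at 1.
        expectedFiles(pattern, 1, 3).forEach(System.out::println);
    }
}
```

Check the first file jwcdg actually writes to find out which start index applies to your setup.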

Running DepTreeViewer

To run DepTreeViewer, use

java -cp target/jwcdg-1.0.jar de.unihamburg.informatik.nats.jwcdg.gui.DepTreeViewer -c my-startup.properties

By default, sentences in the input field are parsed incrementally, which can be turned off in the preferences (Edit → Preferences). Tokens have to be separated by spaces. When parsing incrementally, a space character triggers the parsing of the current increment. So, make sure to end your sentence with a space (even after punctuation marks).

Running AnnoViewer

To run AnnoViewer, use

java -cp target/jwcdg-1.0.jar de.unihamburg.informatik.nats.jwcdg.gui.AnnoViewer -c my-startup.properties /path/to/folder/with/cda/files

You can specify several folders as arguments if you want to view/edit the annotations for the same sentences simultaneously. If you do so, make sure that the cda files have the same names in each folder.
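Before opening several folders, the file-name requirement can be checked with a few lines of Java. This is a sketch under assumptions and not part of jwcdg: the class `CdaFolderCheck` is hypothetical, and it assumes that the annotation files carry the `.cda` extension and sit directly inside each folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of jwcdg) that compares cda file names across folders. */
public class CdaFolderCheck {

    /** Collects the names of the .cda files directly inside the folder. */
    static Set<String> cdaNames(Path folder) throws IOException {
        Set<String> names = new TreeSet<>();
        try (Stream<Path> entries = Files.list(folder)) {
            entries.filter(Files::isRegularFile)
                   .map(p -> p.getFileName().toString())
                   .filter(n -> n.endsWith(".cda"))
                   .forEach(names::add);
        }
        return names;
    }

    /** Maps each folder to the file names it lacks compared to the union of all folders. */
    public static Map<Path, Set<String>> missing(Iterable<Path> folders) throws IOException {
        Map<Path, Set<String>> present = new TreeMap<>();
        Set<String> all = new TreeSet<>();
        for (Path folder : folders) {
            Set<String> names = cdaNames(folder);
            present.put(folder, names);
            all.addAll(names);
        }
        Map<Path, Set<String>> result = new TreeMap<>();
        for (Map.Entry<Path, Set<String>> entry : present.entrySet()) {
            Set<String> lacking = new TreeSet<>(all);
            lacking.removeAll(entry.getValue());
            if (!lacking.isEmpty()) {
                result.put(entry.getKey(), lacking);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        List<Path> folders = Stream.of(args).map(Paths::get).collect(Collectors.toList());
        // Only folders that lack at least one file are reported.
        missing(folders).forEach((folder, names) ->
                System.out.println(folder + " is missing " + names));
    }
}
```

Run it with the same folder arguments you intend to pass to AnnoViewer. If it prints nothing, every folder contains the same set of cda file names.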

Documentation

jwcdg doesn't have API documentation yet. If you want to use jwcdg in your own program, have a look at JWCDG.java, which shows how to interact with the different parts of jwcdg (it's really easy!).

Online documentation of CDG is available at http://nats-www.informatik.uni-hamburg.de/view/CDG/CdgManuals.

Please visit our website to have a look at the publications related to CDG at http://nats-www.informatik.uni-hamburg.de/view/CDG/ProjectPublications.