Welcome to CDG.
===============

This is the Java Constraint Dependency Grammar parser (jwcdg), available at
<http://nats-www.informatik.uni-hamburg.de/view/CDG/>.

Please be aware that this port/reimplementation of CDG in Java may be
a bit rough around the edges and does not yet have all of the original
functions, such as an interactive command line. However, it comes
with two GUIs in addition to the parser itself: DepTreeViewer and
AnnoViewer.

In [DepTreeViewer](#deptreeviewer), you can type a sentence into the
input field. It is parsed incrementally (one increment for each new
word). If you wish, you can configure the parser to predict upcoming
words. The dependency tree, its score and the constraint violations
are displayed for every increment. Furthermore, parses can be saved as
cda files and cda files can be loaded. The current dependency tree can
be exported as SVG.

![DepTreeViewer incrementally parses sentences typed by the user](img/deptreeviewer.png)

[AnnoViewer](#annoviewer) can open folders with cda files for viewing
and editing. It facilitates annotating sentences, e.g. sentences can
be marked as "done". Additionally, several folders with different
annotations of the same sentences can be opened in parallel, so that
all the dependency trees for one sentence are displayed at the same
time.

![AnnoViewer displays cda files for viewing and editing](img/annoviewer.png)

Please do send us feedback and suggestions or ask for help if you
encounter any problems.

Have fun,  
    Your CDG Team.


Citing
======

If you use jwcdg in your research, please cite Beuck et al. (2013),
which introduces this implementation:

    @InCollection{beuck13:PredIncr,
      author    = {Niels Beuck and Arne Köhn and Wolfgang Menzel},
      editor    = {Kim Gerdes and Eva Hajičová and Leo Wanner},
      booktitle = {Computational Dependency Theory},
      title     = {Predictive Incremental Parsing and its Evaluation},
      publisher = {IOS Press},
      year      = 2013,
      url       = {http://dx.doi.org/10.3233/978-1-61499-352-0-186},
      volume    = 258,
      series    = {Frontiers in Artificial Intelligence and Applications},
      pages     = {186--206}}


Contact
=======

Email:

cdg@informatik.uni-hamburg.de (this address reaches the active project
members)

Please consider writing to cdg@informatik.uni-hamburg.de before
contacting one of the individuals below.

Wolfgang Menzel <menzel@informatik.uni-hamburg.de> (project leader)  
Niels Beuck <beuck@informatik.uni-hamburg.de>  
Arne Köhn <koehn@informatik.uni-hamburg.de>  
Christine Köhn <ckoehn@informatik.uni-hamburg.de>

See the AUTHORS file for more information on contributors to CDG.


Copyright
=========

Copyright (C) 1997-2015 The CDG Team <cdg@informatik.uni-hamburg.de>

jwcdg is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the
Free Software Foundation, either version 2 of the License, or (at
your option) any later version.

Please see the file COPYING for details.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY, to the extent permitted by law; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Installation Requirements
=========================

You need Maven to compile jwcdg. Maven fetches all other build dependencies automatically.
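
To check up front that the required tools are on your `PATH`, something like the following can be used. This is only a minimal sketch: `check_tools` is a hypothetical helper, not part of jwcdg, and the tool list assumes that Maven is invoked as `mvn` and that Perl is only needed if you generate the lexicon yourself (described further down).

```shell
# check_tools NAME...: report which of the given commands are missing
# from PATH. Returns 0 if all are present, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# mvn builds jwcdg, java runs it, perl is only used by scripts/make-lexicon.pl
check_tools mvn java perl || echo "install the missing tools before building" >&2
```

The function only checks that the commands exist; it does not check their versions.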

If you want to use the German grammar bundled with jwcdg, you need to
download or create the lexicon and the corresponding hierarchies. You
can download them from

<https://nats-www.informatik.uni-hamburg.de/view/CDG/DownloadPage>

and unpack them into the resources/ directory. If you want to create the
files yourself, you need Perl:

    cd /path/to/jwcdg/
    scripts/make-lexicon.pl
    mv scripts/deutsch-lexikon.cdg resources/
    mv scripts/deutsch-hierarchien.cdg resources/
	
The lexicon and hierarchies files depend on each other. Make sure that
you don't mix downloaded and generated files.

The following programs are recommended for running CDG:

 - A POS tagger with weighted multitagging capabilities:

     hunpos (free software)  
       <https://gitlab.com/akoehn/hunpos>

     or

     TnT (non-free software, but more accurate for multitagging)  
       <http://heartofgold.dfki.de/pkg/components-tnt.tar.gz>


Compiling From Source
=====================
    # change to jwcdg directory
    cd /path/to/jwcdg/

    # clean old files
    mvn clean

    # run unit tests
    mvn test

    # compile (instead of test if you don't want to run the tests)
    mvn compile

    # create an executable jar file containing all dependencies
    mvn package

    # create your own configuration file
    # We recommend configuring one of the taggers above with
    # "taggerCommand" (see startup.properties)

    cp startup.properties my-startup.properties
    emacs my-startup.properties

Configuring the Parser
======================

If you have met the installation requirements and set up your
`my-startup.properties` as described above, you are ready to run the
parser (jwcdg itself) or one of the GUIs. Have a look at
[default.properties](default.properties) to see all of the options and
their default values.


<a name="prediction"></a>Predict Upcoming Words during Parsing
--------------------------------------------------------------

Set this in your properties file (e.g. `my-startup.properties`):
    
	useVirtualNodes   = true
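
If you prefer to change the setting from a script rather than in an editor, the following sketch shows one way to do it. `set_prop` is a hypothetical helper, not part of jwcdg. It assumes that the properties file already exists, that it uses plain `key = value` lines as shown above, and that the key contains no regular-expression metacharacters.

```shell
# set_prop FILE KEY VALUE: set KEY to VALUE in a properties file.
# An existing "KEY = ..." line is replaced; otherwise a line is appended.
set_prop() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        # a line that starts with the key, optional blanks, then "="
        $0 ~ ("^[ \t]*" k "[ \t]*=") { print k " = " v; found = 1; next }
        { print }
        END { if (!found) print k " = " v }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, `set_prop my-startup.properties useVirtualNodes true` turns prediction on, and the same call with `false` turns it off again.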


Running jwcdg
=============

To run jwcdg non-incrementally, use

    java -jar target/jwcdg-1.0.jar my-startup.properties

You can now write sentences and get parses.

Tokens have to be separated by spaces. If a token contains
non-alphanumeric characters, you should enclose it in single
quotes.

Example:

    '"' Viele 'Michael Jackson-Fans' waren traurig '"' , sagte Petra Musterfrau .

To work with an input/output encoding that differs from your default system encoding, set `file.encoding` (example for Latin-1):

    java -Dfile.encoding=ISO-8859-1 -jar target/jwcdg-1.0.jar my-startup.properties

If you want to use incremental parsing, run:

    java -jar target/jwcdg-1.0.jar --incremental /path/to/output-%1.cda my-startup.properties

jwcdg will now read a sentence from stdin and write the results to `/path/to/output-[Number of Increment].cda`, i.e. `%1` in the output path is replaced by the number of the increment.
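
To illustrate the naming scheme, the sketch below expands such a template for a given increment number, for example to check in a script whether a particular increment has been written yet. `increment_file` is a hypothetical helper, not part of jwcdg, and it assumes the increment number appears as a plain decimal number. Whether numbering starts at 0 or 1 is not stated here, so check the files jwcdg actually writes.

```shell
# increment_file TEMPLATE N: print TEMPLATE with the first literal "%1"
# replaced by the increment number N (N must be a plain number).
increment_file() {
    printf '%s\n' "$1" | sed "s/%1/$2/"
}
```

For example, `increment_file /path/to/output-%1.cda 3` prints `/path/to/output-3.cda`.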

<a name="batches"></a> Processing Batches of Sentences
======================================================

Use the BatchProcessor for this. It supports several input formats; see its help output for details:

    java -cp target/jwcdg-1.0.jar de.unihamburg.informatik.nats.jwcdg.BatchProcessor -help


<a name="deptreeviewer"></a>Running DepTreeViewer
=================================================

To run DepTreeViewer, use

    java -cp target/jwcdg-1.0.jar de.unihamburg.informatik.nats.jwcdg.gui.DepTreeViewer -c my-startup.properties

By default, sentences in the input field are parsed incrementally,
which can be turned off in the preferences (Edit →
Preferences). Tokens have to be separated by spaces. When parsing
incrementally, a space character triggers the parsing of the current
increment, so make sure to end your sentence with a space (even after
punctuation marks).

To open a cda file in DepTreeViewer, use the File → Open menu or pass the file as an argument:

    java -cp target/jwcdg-1.0.jar de.unihamburg.informatik.nats.jwcdg.gui.DepTreeViewer -c my-startup.properties /path/to/file.cda


<a name="annoviewer"></a>Running AnnoViewer
===========================================

You can view and edit dependency annotations for multiple
sentences. The sentences need to be in cda format, which is the native
output format of jwcdg. One way to obtain cda files for your sentences
is to parse them with BatchProcessor (see [Processing batches of
sentences](#batches)). If you need CoNLL, you can [convert cda files
to CoNLL files](#convertToConll) when you have finished annotating.

To run AnnoViewer, use

    java -cp target/jwcdg-1.0.jar de.unihamburg.informatik.nats.jwcdg.gui.AnnoViewer -c my-startup.properties /path/to/folder/with/cda/files

You can specify several folders as arguments if you want to view/edit
the annotations for the same sentences simultaneously. If you do so,
make sure that the cda files have the same names in each folder.
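
Before opening two annotation folders side by side, you can check that their file names match. The following is a minimal sketch: `same_cda_names` is a hypothetical helper, not part of jwcdg, and it assumes that the files end in `.cda` and that the names contain no newlines.

```shell
# same_cda_names DIR_A DIR_B: succeed if both folders contain exactly the
# same set of *.cda file names; otherwise list the names found in only one.
same_cda_names() {
    list_a=$(mktemp) && list_b=$(mktemp) || return 2
    (cd "$1" && ls) | grep '\.cda$' | sort > "$list_a"
    (cd "$2" && ls) | grep '\.cda$' | sort > "$list_b"
    # comm -3 keeps only the lines that occur in one of the two sorted lists
    diff_names=$(comm -3 "$list_a" "$list_b")
    rm -f "$list_a" "$list_b"
    if [ -n "$diff_names" ]; then
        echo "file names present in only one folder:" >&2
        echo "$diff_names" >&2
        return 1
    fi
}
```

For example, `same_cda_names annotator1/ annotator2/ && echo ok` prints `ok` only if both folders hold the same cda file names (the folder names here are placeholders).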

AnnoViewer's performance decreases as the number of sentences
grows. We recommend loading at most about 100 sentences per
folder. If you want to annotate more sentences, split them into
several folders and load them either one after another or in different
instances of AnnoViewer.
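
Splitting a large folder into batches can be scripted. The sketch below moves the cda files of one folder into numbered subfolders of at most 100 files each. `split_cda_folder` and the `part-NNN` folder names are hypothetical choices for illustration, not something jwcdg provides or requires.

```shell
# split_cda_folder SRC [SIZE]: move the *.cda files in SRC into numbered
# subfolders SRC/part-001, SRC/part-002, ... with at most SIZE files each
# (SIZE defaults to 100).
split_cda_folder() {
    src=$1
    size=${2:-100}
    count=0
    part=0
    for f in "$src"/*.cda; do
        [ -e "$f" ] || continue      # no .cda files: the glob stayed literal
        if [ $((count % size)) -eq 0 ]; then
            # start a new subfolder every SIZE files
            part=$((part + 1))
            dir=$(printf '%s/part-%03d' "$src" "$part")
            mkdir -p "$dir"
        fi
        mv "$f" "$dir/"
        count=$((count + 1))
    done
}
```

Each resulting `part-NNN` folder can then be passed to AnnoViewer on its own. If you work with several parallel annotation folders, apply the same split to each of them so that matching sentences end up in matching subfolders.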


<a name="convertToConll"></a>Need Your Annotations in CoNLL? Convert cda to CoNLL
=================================================================================

This script converts cda files to CoNLL:
<https://gitlab.com/nats/toolbox/blob/master/convert-cda2conll.py>


Documentation
=============

jwcdg does not have API documentation right now. If you want to
use jwcdg in your own program, have a look at JWCDG.java, which
shows how to interact with the different parts of jwcdg (it's
really easy!).

Online documentation of CDG is available at
<http://nats-www.informatik.uni-hamburg.de/view/CDG/CdgManuals>.

Please visit our website to have a look at the publications related to CDG at
<http://nats-www.informatik.uni-hamburg.de/view/CDG/ProjectPublications>.