poseval

Scripts to run part-of-speech tagging evaluations.

Requirements
============

For the evaluation of tagging results you need SVMTool. You can get it
from [1].

Setup and general information
=============================

If you want to use the train-evaluate UI, you have to copy
train-evaluate.conf.sample to train-evaluate.conf and adjust
everything to your needs.

The subdirectory "taggers" contains the taggers known to
train-evaluate. Each of them is an interface to an external tagger
such as TnT and contains the configuration of that tagger. This keeps
the interface simple, and you don't have to fiddle around with
configuration files each time you want to test something. If you want
to test a tagger with a different configuration, just create a new
tagger. Have a look at the existing interfaces and taggers/README for
more information.

You should add every tagger you want to evaluate to the list of
taggers in the configuration file.

All information is stored in a sqlite3 database: which tagger has
been trained on which testset, the results of the tests, and
information about the testsets.

Usage
=====

To start the program, type `perl train-evaluate.pl`

Creating new testsets
---------------------

Use "tsets->create tset" in the menu to create a testset (tset).

A testset consists of three files: train, test and gold.
 - train contains the sentences used to train a tagger.
 - test is the file used to test the tagger. It contains all the
   sentences that are not in train.
 - gold contains the same sentences as test and the correct tag for
   each word.

You will be asked how many sentences you want for training. A new
testset is then created, using randomly chosen sentences for
training. All sentences that are not chosen for training go to test
and gold.
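
The split described above can be sketched in a few lines of Python.
This is only an illustration, not the code train-evaluate.pl runs: the
function name split_testset, the in-memory corpus format (a list of
sentences, each a list of (word, tag) pairs), the seed parameter and
the assumption that train keeps its tags are all made up for this
example.

```python
import random


def split_testset(tagged_sentences, n_train, seed=None):
    """Split tagged sentences into train, test and gold parts.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    n_train: number of sentences to pick at random for training.
    """
    if not 0 <= n_train <= len(tagged_sentences):
        raise ValueError("n_train must be between 0 and the corpus size")
    rng = random.Random(seed)
    # Pick the indices of the training sentences at random.
    train_ids = set(rng.sample(range(len(tagged_sentences)), n_train))
    train = [s for i, s in enumerate(tagged_sentences) if i in train_ids]
    # Every sentence that was not picked keeps its tags in gold ...
    gold = [s for i, s in enumerate(tagged_sentences) if i not in train_ids]
    # ... and appears without tags in test (assumed file content).
    test = [[word for word, _tag in s] for s in gold]
    return train, test, gold
```

Passing a fixed seed makes the random choice repeatable, which can help
when a testset has to be recreated; whether train-evaluate offers
something similar is not stated here.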

Training and testing taggers
----------------------------

Mark every tagger you want to train and every tset you want those
taggers to be trained on and press "train". train-evaluate will
automatically sort out those combinations that have already been
trained and distribute the remaining jobs to the hosts specified in
the configuration file.
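
The bookkeeping behind this step can be pictured roughly as follows. It
is a sketch under assumptions, not the actual implementation: the
function names, the representation of finished jobs as a set of
(tagger, tset) pairs, the host names and the round-robin policy are
hypothetical. How train-evaluate really assigns jobs to hosts is not
described in this README.

```python
from itertools import product


def pending_jobs(taggers, tsets, already_trained):
    """Return the (tagger, tset) combinations that still need training."""
    return [job for job in product(taggers, tsets)
            if job not in already_trained]


def distribute(jobs, hosts):
    """Assign jobs to hosts in round-robin order (an assumed policy)."""
    if not hosts:
        raise ValueError("at least one host is required")
    plan = {host: [] for host in hosts}
    for i, job in enumerate(jobs):
        plan[hosts[i % len(hosts)]].append(job)
    return plan


# Hypothetical state: "tnt" has already been trained on "tset1".
done = {("tnt", "tset1")}
jobs = pending_jobs(["tnt", "svmt-standard"], ["tset1", "tset2"], done)
plan = distribute(jobs, ["hostA", "hostB"])
```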

You can then use the same process to test the taggers on tsets.

IMPORTANT: svmt-standard has to be trained on each tset you want to
use for testing. The training data created by svmt-standard is used
for the evaluation.
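
This requirement amounts to a simple check before testing. A minimal
sketch, assuming the finished training runs are available as a set of
(tagger, tset) pairs; the function name and the data layout are
hypothetical, and train-evaluate may handle this differently.

```python
def tsets_missing_baseline(test_tsets, trained, baseline="svmt-standard"):
    """Return the tsets on which the baseline tagger is not yet trained.

    test_tsets: tsets selected for testing.
    trained: set of (tagger, tset) pairs that have been trained.
    """
    return [tset for tset in test_tsets if (baseline, tset) not in trained]
```

An empty result means every selected tset can be evaluated; otherwise
svmt-standard has to be trained on the listed tsets first.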

Getting nice graphs
-------------------

Since nobody really wants to poke around in the sqlite database to get
the results, train-evaluate can show you some nice graphs. Just mark
every tagger you want to be graphed and go to "taggers->make graph".

You will be asked for an output directory. This is the directory in
which you can find all the data to plot the resulting graph by
hand. If you have an X server running, train-evaluate will start
gnuplot for you.

Questions, Comments etc.
========================

If you have questions, send me an e-mail: arne at arne-koehn.de

This software is licensed under the GPLv3 or later.

You should be able to get it from http://gitorious.org/poseval


Footnotes: 
[1]  http://www.lsi.upc.edu/~nlp/SVMTool/