mwetoolkit

The complete framework for multiword expressions processing. http://mwetoolkit.sourceforge.net/

Multiword Expressions toolkit

The mwetoolkit aids in the automatic identification and extraction of multiword expressions (MWEs) from running text. These include idioms (kick the bucket), noun compounds (cable car), phrasal verbs (take off, give up), etc.

Even though it focuses on multiword expressions, the framework is quite complete and can be useful in any corpus-based study in computational linguistics. The mwetoolkit can be applied to virtually any text collection, language, and MWE type. It is a command-line tool written mostly in Python. Its development started in 2010 as part of a PhD thesis, and the project remains active (see commit logs).

Up-to-date documentation and details about the tool can be found at the mwetoolkit website: http://mwetoolkit.sourceforge.net/

1) INSTALLING

Please refer to the website for up-to-date installation instructions.

2) QUICK START

To install the mwetoolkit, download it from the Git repository using the following command:

git clone --depth=1 "https://gitlab.com/mwetoolkit/mwetoolkit.git"

As the code evolves fast, we recommend using the Git version instead of old releases. Run git pull periodically to get the latest improvements.

Once you have downloaded the toolkit, navigate to the main folder and run the command below to compile the C libraries used by the toolkit (see note 1).

make

3) EXAMPLES

The toy folder contains a set of files for performing a toy experiment. You can try to run the whole pipeline by calling

./run-tutorial.sh

Specific documentation about the examples is in the script itself, as comments.

4) REGRESSION TESTS

The test folder contains regression tests for most scripts. To test your installation of the mwetoolkit, navigate to this folder and run the script testAll.sh:

cd test
./testAll.sh

Should one of the tests fail (see note 2), please send a copy of the output and a brief description of your configuration (operating system, version, machine) to our email.
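A failure report of this kind can be assembled automatically. The sketch below is only an illustration: `collect_report` is a hypothetical helper, not part of the mwetoolkit, and the file name `test-report.txt` is an assumption. It runs a command, captures everything it prints, and prefixes the output with the operating system, version, and machine details.

```python
import platform
import subprocess


def collect_report(command, cwd=None):
    """Run a command and return its output prefixed with system details.

    Hypothetical helper for illustration; not part of the mwetoolkit.
    """
    # Merge stderr into stdout so error messages appear in the report too.
    result = subprocess.run(
        command,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    header = [
        "OS: %s %s" % (platform.system(), platform.release()),
        "Machine: %s" % platform.machine(),
        "Python: %s" % platform.python_version(),
        "Exit code: %d" % result.returncode,
    ]
    return "\n".join(header) + "\n\n" + result.stdout


# Possible usage, assuming the current directory is the mwetoolkit main folder:
#
#     report = collect_report(["./testAll.sh"], cwd="test")
#     with open("test-report.txt", "w") as out:
#         out.write(report)
```

The resulting text file could then be attached to the message describing the failure.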


  1. If you do not run this command, the toolkit will still work but it will use a Python version (much slower and possibly obsolete!) of the indexing and counting scripts. This may be OK for small corpora. 

  2. Beware that on Mac OS some tests will appear to fail when they actually succeed; the only differences are in the rounding of the least significant digits of floating-point numbers.
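To judge whether a reported difference is only a rounding effect, an expected and an actual output line can be compared with a numeric tolerance. This is a minimal sketch under stated assumptions: `lines_match` is a hypothetical helper, not a script shipped with the mwetoolkit, and the tolerance of 1e-6 is an arbitrary choice.

```python
import math
import re

# Matches integers and decimals, with optional sign and exponent.
NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def lines_match(expected, actual, rel_tol=1e-6):
    """Return True if two lines are identical up to float rounding.

    Hypothetical helper for illustration; not part of the mwetoolkit.
    """
    # The non-numeric text must be identical once numbers are masked out.
    if NUMBER.sub("#", expected) != NUMBER.sub("#", actual):
        return False
    expected_numbers = [float(x) for x in NUMBER.findall(expected)]
    actual_numbers = [float(x) for x in NUMBER.findall(actual)]
    # Each pair of numbers must agree within the tolerance.
    return all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=1e-12)
        for e, a in zip(expected_numbers, actual_numbers)
    )
```

With this helper, two lines that differ only in the last printed digit of a score compare as equal, while lines whose text or leading digits differ do not.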