Data mining tools
This package includes the common tools that I use for data mining tasks. It mainly revolves around clustering and clustering ensemble methods. Since it grew over the years, it is not very cohesive.
Known issue: most of the tests do not follow proper testing principles. I may fix them if I have time in the future; in the meantime, pull requests are welcome.
The project uses Maven for build configuration. Eclipse project files can be created by running the following command:
For installation of the METIS package, please see below.
Some of the classes are adapted from different sources:
tk.memin.dm.cluster.evaluator.ECSEvaluator is from M. Yagci's original paper.
tk.memin.dm.text.PorterStemmer is from Martin Porter.
The cluster ensemble algorithms of Strehl and Ghosh require the METIS graph partitioning software from Karypis Lab. It can be downloaded from this link. Since METIS is written in C, you have to build it separately.
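To illustrate why a graph partitioner is involved: the CSPA algorithm of Strehl and Ghosh turns an ensemble of clusterings into a similarity graph, where the weight between two objects is the fraction of clusterings that put them in the same cluster, and then partitions that graph with METIS. The sketch below builds this co-association matrix and writes it in the METIS 4.0 graph file format (header `n m 1`, then one line per vertex listing `neighbor weight` pairs with 1-based vertex ids). The class `CoAssociation`, its methods, and the integer scaling factor are hypothetical and not part of this package; it is a minimal sketch under those assumptions, not the implementation used here.

```java
/**
 * Hypothetical helper (not part of tk.memin.dm): builds the CSPA-style
 * co-association similarity and serializes it as a METIS 4.0 graph file.
 */
final class CoAssociation {

    /**
     * labels[r][i] is the cluster id of object i in clustering r.
     * Returns s[i][j] = fraction of clusterings in which i and j share a cluster.
     */
    static double[][] matrix(int[][] labels) {
        int n = labels[0].length;
        double[][] s = new double[n][n];
        for (int[] run : labels) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    if (run[i] == run[j]) {
                        s[i][j] += 1.0; // count co-membership in this clustering
                    }
                }
            }
        }
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                s[i][j] /= labels.length; // normalize to [0, 1]
            }
        }
        return s;
    }

    /**
     * Writes the graph with integer edge weights (fmt = 1), since METIS only
     * accepts integer weights. Weight = round(scale * s[i][j]); self-loops and
     * zero-weight edges are omitted. Vertex ids are 1-based as METIS expects.
     */
    static String toMetisGraph(double[][] s, int scale) {
        int n = s.length;
        StringBuilder body = new StringBuilder();
        int edges = 0;
        for (int i = 0; i < n; i++) {
            StringBuilder line = new StringBuilder();
            for (int j = 0; j < n; j++) {
                int w = (int) Math.round(scale * s[i][j]);
                if (i != j && w > 0) {
                    line.append(j + 1).append(' ').append(w).append(' ');
                    if (i < j) {
                        edges++; // each undirected edge is counted once in the header
                    }
                }
            }
            body.append(line.toString().trim()).append('\n');
        }
        return n + " " + edges + " 1\n" + body;
    }

    public static void main(String[] args) {
        // Three clusterings of four objects.
        int[][] labels = { {0, 0, 1, 1}, {0, 0, 0, 1}, {1, 1, 0, 0} };
        System.out.print(toMetisGraph(matrix(labels), 100));
    }
}
```

In the example, objects 1 and 2 (1-based) are grouped together by all three clusterings, so their edge receives the maximum weight of 100, while objects that are never co-clustered get no edge at all.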
All METIS operations are encapsulated in tk.memin.dm.cluster.ensemble.strehl.MetisOperations. It looks for the executables under a folder named metis-4.0. If you install METIS elsewhere, change the corresponding constant in that class.
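A minimal sketch of how such a wrapper might locate and call METIS from Java, assuming the METIS 4.0 `pmetis` executable sits directly inside the `metis-4.0` folder and writes its result to `<graphfile>.part.<nparts>` with one part number per vertex per line. `MetisRunner`, its `METIS_DIR` constant, and its methods are hypothetical; check `MetisOperations` for the actual constant name and the commands it runs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper (not the package's MetisOperations) around METIS 4.0 pmetis. */
final class MetisRunner {

    // Assumed install location; change this if METIS is installed elsewhere.
    static final String METIS_DIR = "metis-4.0";

    /** Builds the command line: pmetis <graphFile> <nParts>. */
    static List<String> command(String metisDir, Path graphFile, int nParts) {
        Path exe = Paths.get(metisDir, "pmetis");
        return Arrays.asList(exe.toString(), graphFile.toString(), Integer.toString(nParts));
    }

    /** Output file written next to the input graph: <graphFile>.part.<nParts>. */
    static Path partitionFile(Path graphFile, int nParts) {
        return Paths.get(graphFile.toString() + ".part." + nParts);
    }

    /** Parses one part id per line, skipping blank lines. */
    static int[] parsePartition(List<String> lines) {
        return lines.stream()
                .map(String::trim)
                .filter(l -> !l.isEmpty())
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    /** Runs pmetis, waits for it, and reads back the vertex-to-part assignment. */
    static int[] partition(String metisDir, Path graphFile, int nParts)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(metisDir, graphFile, nParts))
                .inheritIO() // show METIS output on this process's console
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("pmetis exited with status " + exit);
        }
        List<String> lines = Files.readAllLines(partitionFile(graphFile, nParts), StandardCharsets.UTF_8);
        return parsePartition(lines);
    }

    public static void main(String[] args) throws Exception {
        // Usage: java MetisRunner.java <graph file> <number of parts>
        Path graph = Paths.get(args[0]);
        int k = Integer.parseInt(args[1]);
        System.out.println(Arrays.toString(partition(METIS_DIR, graph, k)));
    }
}
```

Keeping the install folder in a single constant, as the paragraph above describes for `MetisOperations`, means relocating METIS only requires changing one value rather than every command invocation.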