Some of the classes that I use for common data mining tasks.

Data mining tools

This package includes the common tools that I use for data mining tasks. It mainly revolves around clustering and clustering ensemble methods. Because it grew over the years, it is not very cohesive.

Known issue: most of the tests do not follow proper testing principles. If I have time in the future, I may fix them. In the meantime, pull requests are welcome.


The project uses Maven for configuration. Eclipse project files can be created by running the following command:

mvn eclipse:eclipse

For installation of the metis package, see below.

External code

Some of the classes are adapted from different sources:


The ensemble algorithms from Strehl and Ghosh require the metis graph partitioning package from Karypis Lab. It can be downloaded from this link. It is written in C, so you have to build it separately.

All of the metis operations are encapsulated in tk.memin.dm.cluster.ensemble.strehl.MetisOperations. This class looks for the metis executables in a folder named metis-4.0. If you install metis elsewhere, change the corresponding constant in that class.
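
To check that the executables are where they are expected before running an ensemble, something like the following can help. This is only a minimal sketch and not the actual code of MetisOperations: the class name MetisLocator, the constant METIS_DIR, and both method names are made up for illustration, and the executable name pmetis is an assumption about what your metis build produces.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of this package.
class MetisLocator {
    // Assumed to mirror the folder name that MetisOperations looks for.
    static final String METIS_DIR = "metis-4.0";

    // Builds the expected path of an executable inside the metis folder.
    static Path resolve(String baseDir, String executable) {
        return Paths.get(baseDir, METIS_DIR, executable);
    }

    // Returns true only if the path is a regular file that can be executed.
    static boolean isAvailable(Path executable) {
        File file = executable.toFile();
        return file.isFile() && file.canExecute();
    }
}
```

For example, MetisLocator.isAvailable(MetisLocator.resolve(".", "pmetis")) returns false when nothing has been built under ./metis-4.0, which is a cheaper failure than a process error in the middle of a clustering run.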