Commit 98c3dbcc authored by Charles Vernerey

Merge branch 'dev' into 'master'

Dev

See merge request chaver/data-mining!1
parents 1db9a0e1 a564b17b
+3 −0
@@ -170,3 +170,6 @@ $RECYCLE.BIN/
*.lnk

.archive
paper/media/
paper/paper.jats
paper/paper.pdf
 No newline at end of file
+0 −0

File moved.

Makefile

0 → 100644
+2 −0
install:
	mvn clean install -Dgpg.skip=true
 No newline at end of file
+10 −7
# Data-mining
# Choco-mining

This repository contains the source code that was used in the experiments of the following paper : *Vernerey et al. - Threshold-free Pattern Mining Meets Multi-Objective Optimization: Application to Association Rules* ([IJCAI 2022](https://www.ijcai.org/proceedings/2022/0261)). Supplementary material is available in the `paper` folder.
Choco-mining is a Java library for solving itemset mining problems that is based on [Choco-solver](https://github.com/chocoteam/choco-solver). This repository contains the source code that was used in the experiments of the following paper : *Vernerey et al. - Threshold-free Pattern Mining Meets Multi-Objective Optimization: Application to Association Rules* ([IJCAI 2022](https://www.ijcai.org/proceedings/2022/0261)). Supplementary material is available in the `paper` folder.

## Requirements

@@ -12,16 +12,16 @@ This repository contains the source code that was used in the experiments of the
If you have Maven installed on your computer, you can simply build the project with the following command :

```bash
mvn clean package
make install
```

If you are interested in using some constraints in your own project, you can add a new Maven dependency :
If you are interested in using some constraints in your own project, you can add a new Maven dependency in the file `pom.xml` of your project :

```xml
<dependency>
    <groupId>io.gitlab.chaver</groupId>
    <artifactId>data-mining</artifactId>
    <version>1.0.1</version>
    <version>1.0.2</version>
</dependency>
```

@@ -31,10 +31,13 @@ The following constraints are available :
- **CoverClosure** : ensures that a pattern `x` is closed w.r.t. `{freq}` (see *Schaus et al. - CoverSize : A Global Constraint for Frequency-Based Itemset Mining*)
- **CoverSize** : given an integer variable `f` and pattern `x`, ensures that `f = freq(x)` (see *Schaus et al. - CoverSize : A Global Constraint for Frequency-Based Itemset Mining*)
- **Generator** : ensures that a pattern `x` is a generator (see *Belaid et al. - Constraint Programming for Association Rules*)
- **FrequentSubs**: ensures that a pattern `x` has all its subsets frequent (see *Belaid et al. - Constraint Programming for Mining Borders of Frequent Itemsets*)
- **InfrequentSupers**: ensures that a pattern `x` has all its supersets infrequent (see *Belaid et al. - Constraint Programming for Mining Borders of Frequent Itemsets*)
- **Overlap**: a constraint inspired by ClosedDiversity (see *Hien et al. - A Relaxation-based Approach for Mining Diverse Closed Patterns*) that ensures that a pattern `x` is diverse w.r.t. a history of patterns (i.e. there exists no pattern `y` in the history such that `jaccard(x,y) > j`, where `j` is a diversity threshold specified by the user)
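
The diversity condition behind **Overlap** can be illustrated without a solver. The following is a minimal, self-contained sketch and not the library's implementation: the class `OverlapSketch` and all of its methods are hypothetical names, and it assumes that `jaccard(x,y)` is computed on the covers of the two patterns (the sets of transactions that contain them), which the description above does not state explicitly.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: none of these names belong to Choco-mining.
public class OverlapSketch {

    /** Returns true if the transaction contains every item of the pattern. */
    public static boolean containsAll(int[] transaction, int[] pattern) {
        for (int item : pattern) {
            boolean found = false;
            for (int other : transaction) {
                if (other == item) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    /** Cover of a pattern: indices of the transactions that contain it. */
    public static BitSet cover(int[][] database, int[] pattern) {
        BitSet cover = new BitSet(database.length);
        for (int t = 0; t < database.length; t++) {
            if (containsAll(database[t], pattern)) {
                cover.set(t);
            }
        }
        return cover;
    }

    /** freq(x): size of the cover of x. */
    public static int freq(int[][] database, int[] pattern) {
        return cover(database, pattern).cardinality();
    }

    /** Jaccard index of two covers (assumption: similarity is measured on covers). */
    public static double jaccard(BitSet a, BitSet b) {
        BitSet intersection = (BitSet) a.clone();
        intersection.and(b);
        BitSet union = (BitSet) a.clone();
        union.or(b);
        return union.isEmpty() ? 0.0 : (double) intersection.cardinality() / union.cardinality();
    }

    /** x is diverse if no pattern y of the history satisfies jaccard(x, y) > j. */
    public static boolean isDiverse(int[][] database, int[] x, List<int[]> history, double j) {
        BitSet coverX = cover(database, x);
        for (int[] y : history) {
            if (jaccard(coverX, cover(database, y)) > j) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Toy database of four transactions over the items 1, 2 and 3.
        int[][] database = { {1, 2, 3}, {1, 2}, {2, 3}, {1, 3} };
        int[] x = {1, 2};
        List<int[]> history = Collections.singletonList(new int[] {3});
        System.out.println("freq(x) = " + freq(database, x));
        System.out.println("diverse = " + isDiverse(database, x, history, 0.5));
    }
}
```

In this toy database, the covers of `{1,2}` and `{3}` share one transaction out of four, so `{1,2}` is accepted for a threshold `j = 0.5`, whereas a history containing `{1}` (two shared transactions out of three) would reject it.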

Note that a `jar` file with all the required dependencies is available [here](https://s01.oss.sonatype.org/service/local/artifact/maven/redirect?r=releases&g=io.gitlab.chaver&a=data-mining&v=1.0.1&e=jar&c=jar-with-dependencies) if you really don't want to use Maven.
Detailed examples on how to use each constraint for solving different mining tasks are available [here](https://gitlab.com/chaver/data-mining/-/wikis/home).

## Usage
## Command-Line Usage

You can run the jar file using the script `run` at the root of the project. 

paper/app.dot

0 → 100644
+49 −0
digraph G {
    rankdir="LR";
  choco[label="Choco solver"];
  mining[label="Choco mining"];
  mining -> choco;
  csize[label="CoverSize",fillcolor="#F8CECC",style="filled"];
  cclosure[label="CoverClosure",fillcolor="#F8CECC",style="filled"];
  aclosure[label="AdequateClosure",fillcolor="#F8CECC",style="filled"];
  fsubs[label="FrequentSubs",fillcolor="#F8CECC",style="filled"];
  isupers[label="InfrequentSupers",fillcolor="#F8CECC",style="filled"];
  generator[label="Generator",fillcolor="#F8CECC",style="filled"];
  cdiv[label="ClosedDiversity",fillcolor="#F8CECC",style="filled"];
  pareto[label="Pareto",fillcolor="#F8CECC",style="filled"];
  pareto -> choco;
  cclosure -> mining;
  aclosure -> mining;
  csize -> mining;
  fsubs -> mining;
  isupers -> mining;
  generator -> mining;
  cdiv -> mining;
  fim[label="Frequent Itemset\n Mining",fillcolor="#DAE8FC",style="filled"];
  fim -> csize;
  closedp[label="Closed Itemset\n Mining",fillcolor="#DAE8FC",style="filled"];
  closedp -> csize;
  closedp -> cclosure;
  skyp[label="Skypattern\n Mining",fillcolor="#DAE8FC",style="filled"];
  skyp -> csize;
  skyp -> aclosure;
  skyp -> pareto;
  maxfim[label="Maximal Frequent\n Itemset Mining",fillcolor="#DAE8FC",style="filled"];
  maxfim -> csize;
  maxfim -> isupers;
  maxfim -> fsubs;
  minfim[label="Minimal Infrequent\n Itemset Mining",fillcolor="#DAE8FC",style="filled"];
  minfim -> csize;
  minfim -> isupers;
  minfim -> fsubs;
  genm[label="Generator\n Mining",fillcolor="#DAE8FC",style="filled"];
  genm -> csize;
  genm -> generator;
  assm[label="Association Rule\n Mining",fillcolor="#DAE8FC",style="filled"];
  assm -> csize;
  assm -> cclosure;
  assm -> generator;
  divm[label="Diverse Itemset\n Mining",fillcolor="#DAE8FC",style="filled"]
  divm -> cdiv;
  divm -> csize;
}
 No newline at end of file