Commit 0979807d authored by Charles Vernerey

Refactor code + update README + add factory for creating Constraints

parent 407397e2
+15 −15
@@ -49,24 +49,24 @@ for (Pattern closed : closedPatterns) {

The following constraints are available in Choco-Mining:

-- $CoverSize_{D}(x,f)$ [SchausAG17]: Given an integer variable $f$ that represents the frequency (noted $freq$) of an itemset $x$, the constraint ensures that $f = freq(x)$.
-- $CoverClosure_{D}(x)$ [SchausAG17]: The constraint ensures that $x$ is closed w.r.t. the frequency, i.e. $\nexists y \supset x: freq(x) = freq(y)$.
-- $AdequateClosure_{D,M}(x)$ [VernereyLAL22]: Given a set of measures $M$, the constraint ensures that $x$ is closed w.r.t. $M$, i.e. $\nexists y \supset x$ such that for all $m \in M : m(x) = m(y)$.
-- $FrequentSubs_{D,s}(x)$ [Belaid2BL19]: Given a frequency threshold $s$, the constraint ensures that all the subsets of $x$ are frequent, i.e. $\forall y \subset x : freq(y) \ge s$.
-- $InfrequentSupers_{D,s}(x)$ [Belaid2BL19]: Given a frequency threshold $s$, the constraint ensures that all the supersets of $x$ are infrequent, i.e. $\forall y \supset x : freq(y) \lt s$.
-- $Generator_{D}(x)$ [BelaidBL19]: The constraint ensures that $x$ is a generator, i.e. $\nexists y \subset x : freq(y) = freq(x)$.
-- $ClosedDiversity_{D,\mathcal{H},j,s}(x)$ [HienLALLOZ20]: Given a history of itemsets $\mathcal{H}$, a diversity threshold $j$ and a minimum frequency threshold $s$, the constraint ensures that $x$ is a diverse itemset (i.e. $\nexists y \in \mathcal{H} : jaccard(x,y) \ge j$), $x$ is closed w.r.t. the frequency and $freq(x) \ge s$.
+- `CoverSize(x,f)` [SchausAG17]: Given an integer variable `f` that represents the frequency (noted `freq`) of an itemset `x`, the constraint ensures that `f = freq(x)`.
+- `CoverClosure(x)` [SchausAG17]: The constraint ensures that `x` is closed w.r.t. the frequency, i.e. there exists no superset `y` of `x` such that `freq(x) = freq(y)`.
+- `AdequateClosure(M,x)` [VernereyLAL22]: Given a set of measures `M`, the constraint ensures that `x` is closed w.r.t. `M`, i.e. there exists no superset `y` of `x` such that for each measure `m` in `M`, we have `m(x) = m(y)`.
+- `FrequentSubs(s,x)` [Belaid2BL19]: Given a frequency threshold `s`, the constraint ensures that each subset `y` of `x` is frequent, i.e. `freq(y) >= s`.
+- `InfrequentSupers(s,x)` [Belaid2BL19]: Given a frequency threshold `s`, the constraint ensures that each superset `y` of `x` is infrequent, i.e. `freq(y) < s`.
+- `Generator(x)` [BelaidBL19]: The constraint ensures that `x` is a generator, i.e. there exists no subset `y` of `x` such that `freq(y) = freq(x)`.
+- `ClosedDiversity(H,j,s,x)` [HienLALLOZ20]: Given a history of itemsets `H`, a diversity threshold `j` and a minimum frequency threshold `s`, the constraint ensures that `x` is a diverse itemset (i.e. there exists no itemset `y` in `H` such that `jaccard(x,y) >= j`), `x` is closed w.r.t. the frequency and `freq(x) >= s`.

We can model different problems using these constraints. The above figure shows examples of mining tasks (in blue) with the constraints (in red) involved in their modelling:

-- Frequent Itemset Mining: Given a threshold $s$, find all the itemsets $x$ such that $freq(x) \ge s$.
-- Closed Itemset Mining: Given a threshold $s$, find all the itemsets $x$ such that $freq(x) \ge s$ and $\nexists y \supset x : freq(x) = freq(y)$.
-- Skypattern Mining: Given a set of measures $M$, find all the itemsets $x$ such that there exists no other itemset $y$ that dominates $x$. We say that $y$ dominates $x$ iff $\forall m \in M : m(y) \ge m(x)$ and $\exists m \in M : m(y) \gt m(x)$.
-- Maximal Frequent Itemset Mining: Given a threshold $s$, find all the itemsets $x$ such that $freq(x) \ge s$ and $\forall y \supset x : freq(y) \lt s$.
-- Minimal Infrequent Itemset Mining: Given a threshold $s$, find all the itemsets $x$ such that $freq(x) \lt s$ and $\forall y \subset x : freq(y) \ge s$.
-- Generator Mining: Find all the itemsets $x$ such that $\nexists y \subset x : freq(y) = freq(x)$.
-- Association Rule Mining: Find all the association rules $x \Rightarrow y$ that respect the constraints specified by the user.
-- Diverse Itemset Mining: Given a diversity threshold $j$ and a minimum frequency threshold $s$, find all the diverse itemsets that are closed w.r.t. the frequency and such that $freq(x) \ge s$.
+- Frequent Itemset Mining: Given a threshold `s`, find all the itemsets `x` such that `freq(x) >= s`.
+- Closed Itemset Mining: Given a threshold `s`, find all the itemsets `x` such that `freq(x) >= s` and that are closed w.r.t. the frequency.
+- Skypattern Mining: Given a set of measures `M`, find all the itemsets `x` such that there exists no other itemset `y` that dominates `x`. We say that `y` dominates `x` iff for each measure `m` in `M` we have `m(y) >= m(x)` and there exists at least one measure `m` in `M` such that `m(y) > m(x)`.
+- Maximal Frequent Itemset Mining: Given a threshold `s`, find all the itemsets `x` such that `freq(x) >= s` and for each superset `y` of `x` we have `freq(y) < s`.
+- Minimal Infrequent Itemset Mining: Given a threshold `s`, find all the itemsets `x` such that `freq(x) < s` and for each subset `y` of `x` we have `freq(y) >= s`.
+- Generator Mining: Find all the itemsets `x` that are generators.
+- Association Rule Mining: Find all the association rules `x => y` that respect the constraints specified by the user.
+- Diverse Itemset Mining: Given a diversity threshold `j` and a minimum frequency threshold `s`, find all the diverse itemsets `x` that are closed w.r.t. the frequency and such that `freq(x) >= s`.

## Installation

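The README definitions above can be checked by brute force on a tiny database. The following Java sketch is not part of Choco-Mining: the class `ItemsetDefinitions`, its methods and the `List<Set<Integer>>` database representation are hypothetical, chosen only to restate the definitions executably. Because `freq` is anti-monotone (adding items never increases it), quantifying over all subsets or supersets reduces to checking those that differ from `x` by exactly one item.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical brute-force helper (not part of Choco-Mining) that restates the
 * README definitions on a small in-memory database. Each transaction is a set of
 * item ids, and an itemset is a set of item ids.
 */
final class ItemsetDefinitions {

    private ItemsetDefinitions() {}

    /** freq(x): number of transactions that contain every item of x. */
    static int freq(List<Set<Integer>> db, Set<Integer> x) {
        int count = 0;
        for (Set<Integer> t : db) {
            if (t.containsAll(x)) {
                count++;
            }
        }
        return count;
    }

    private static Set<Integer> withoutItem(Set<Integer> x, int item) {
        Set<Integer> y = new HashSet<>(x);
        y.remove(item);
        return y;
    }

    private static Set<Integer> withItem(Set<Integer> x, int item) {
        Set<Integer> y = new HashSet<>(x);
        y.add(item);
        return y;
    }

    /** Generator(x): no proper subset y of x has freq(y) = freq(x). */
    static boolean isGenerator(List<Set<Integer>> db, Set<Integer> x) {
        int fx = freq(db, x);
        for (int item : x) {
            // freq is anti-monotone, so checking the subsets "x minus one item" is enough
            if (freq(db, withoutItem(x, item)) == fx) {
                return false;
            }
        }
        return true;
    }

    /** FrequentSubs(s,x): every proper subset y of x has freq(y) >= s. */
    static boolean frequentSubs(List<Set<Integer>> db, int s, Set<Integer> x) {
        for (int item : x) {
            if (freq(db, withoutItem(x, item)) < s) {
                return false;
            }
        }
        return true;
    }

    /** InfrequentSupers(s,x): every proper superset y of x (items 0..nbItems-1) has freq(y) < s. */
    static boolean infrequentSupers(List<Set<Integer>> db, int nbItems, int s, Set<Integer> x) {
        for (int item = 0; item < nbItems; item++) {
            // checking the supersets "x plus one item" is enough, again by anti-monotonicity
            if (!x.contains(item) && freq(db, withItem(x, item)) >= s) {
                return false;
            }
        }
        return true;
    }

    /** Maximal frequent itemset: freq(x) >= s and every superset is infrequent. */
    static boolean isMaximalFrequent(List<Set<Integer>> db, int nbItems, int s, Set<Integer> x) {
        return freq(db, x) >= s && infrequentSupers(db, nbItems, s, x);
    }

    /** Minimal infrequent itemset: freq(x) < s and every subset is frequent. */
    static boolean isMinimalInfrequent(List<Set<Integer>> db, int s, Set<Integer> x) {
        return freq(db, x) < s && frequentSubs(db, s, x);
    }
}
```

For instance, on the five transactions `{0,1,2}`, `{0,1}`, `{0,2}`, `{1,2,3}`, `{0,1,2,3}` with `s = 2`, `{0,1,2}` is maximal frequent, `{0,3}` is minimal infrequent, and `{1,3}` is not a generator because its subset `{3}` has the same frequency (2).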
+7 −0
@@ -155,6 +155,13 @@
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-javadoc-plugin</artifactId>
                <version>3.3.2</version>
+                <configuration>
+                    <!-- Other Javadoc configurations -->
+                    <links>
+                        <link>https://javadoc.io/doc/org.choco-solver/choco-solver/4.10.13/org.chocosolver.solver</link>
+                        <!-- Add other external Javadoc URLs here -->
+                    </links>
+                </configuration>
                <executions>
                    <execution>
                        <id>attach-javadocs</id>
+5 −5
@@ -9,13 +9,12 @@
 */
package io.gitlab.chaver.mining.examples;

-import io.gitlab.chaver.mining.patterns.constraints.CoverClosure;
-import io.gitlab.chaver.mining.patterns.constraints.CoverSize;

+import io.gitlab.chaver.mining.patterns.constraints.factory.ConstraintFactory;
import io.gitlab.chaver.mining.patterns.io.DatReader;
import io.gitlab.chaver.mining.patterns.io.TransactionalDatabase;
import io.gitlab.chaver.mining.patterns.io.Pattern;
import org.chocosolver.solver.Model;
-import org.chocosolver.solver.constraints.Constraint;
import org.chocosolver.solver.variables.BoolVar;
import org.chocosolver.solver.variables.IntVar;

@@ -24,6 +23,7 @@ import java.util.LinkedList;
import java.util.List;
import java.util.stream.IntStream;


/**
 * Example of closed pattern mining (a closed pattern is an itemset which has no superset with the same frequency)
 */
@@ -43,9 +43,9 @@ public class ExampleClosedItemsetMining {
        // Ensures that length = sum(x)
        model.sum(x, "=", length).post();
        // Ensures that freq = frequency(x)
-        model.post(new Constraint("Cover Size", new CoverSize(database, freq, x)));
+        ConstraintFactory.coverSize(database, freq, x).post();
        // Ensures that x is a closed itemset
-        model.post(new Constraint("Cover Closure", new CoverClosure(database, x)));
+        ConstraintFactory.coverClosure(database, x).post();
        // Create a list to store all the closed itemsets
        List<Pattern> closedPatterns = new LinkedList<>();
        while (model.getSolver().solve()) {
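The example above enumerates closed itemsets with `CoverSize` and `CoverClosure`. On a small database, its output can be cross-checked against an exhaustive enumeration such as the sketch below, which does not use Choco-Mining at all. The class and method names are hypothetical, and so are three choices of this sketch: items are bits of an `int` (so fewer than 31 items), the empty itemset is skipped, and a `minFreq` parameter is added.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical exhaustive enumeration of closed itemsets (not part of Choco-Mining).
 * Transactions and itemsets are bitmasks (bit i set = item i present), so this
 * sketch assumes fewer than 31 items.
 */
final class BruteForceClosed {

    private BruteForceClosed() {}

    /** Number of transactions that contain every item of the itemset. */
    static int freq(int[] transactions, int itemset) {
        int count = 0;
        for (int t : transactions) {
            if ((t & itemset) == itemset) {
                count++;
            }
        }
        return count;
    }

    /**
     * x is closed iff no strict superset has the same frequency. Since freq is
     * anti-monotone, testing the supersets obtained by adding one item is enough.
     */
    static boolean isClosed(int[] transactions, int nbItems, int itemset) {
        int f = freq(transactions, itemset);
        for (int i = 0; i < nbItems; i++) {
            int bit = 1 << i;
            if ((itemset & bit) == 0 && freq(transactions, itemset | bit) == f) {
                return false;
            }
        }
        return true;
    }

    /** All non-empty closed itemsets with freq >= minFreq, as bitmasks in increasing order. */
    static List<Integer> closedItemsets(int[] transactions, int nbItems, int minFreq) {
        List<Integer> result = new ArrayList<>();
        for (int x = 1; x < (1 << nbItems); x++) {
            if (freq(transactions, x) >= minFreq && isClosed(transactions, nbItems, x)) {
                result.add(x);
            }
        }
        return result;
    }
}
```

On the transactions `{0,1,2}`, `{0,1}`, `{0,2}`, `{1,2,3}`, `{0,1,2,3}` (bitmasks `7, 3, 5, 14, 15`) with `minFreq = 2`, it returns the masks `[1, 2, 3, 4, 5, 6, 7, 14]`; `{3}` (mask `8`) is excluded because its superset `{1,3}` has the same frequency.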
+3 −5
@@ -9,14 +9,12 @@
 */
package io.gitlab.chaver.mining.examples;

-import io.gitlab.chaver.mining.patterns.constraints.AdequateClosureDC;
-import io.gitlab.chaver.mining.patterns.constraints.CoverSize;
+import io.gitlab.chaver.mining.patterns.constraints.factory.ConstraintFactory;
import io.gitlab.chaver.mining.patterns.io.DatReader;
import io.gitlab.chaver.mining.patterns.io.TransactionalDatabase;
import io.gitlab.chaver.mining.patterns.io.Pattern;
import io.gitlab.chaver.mining.patterns.measure.Measure;
import org.chocosolver.solver.Model;
-import org.chocosolver.solver.constraints.Constraint;
import org.chocosolver.solver.variables.BoolVar;
import org.chocosolver.solver.variables.IntVar;

@@ -54,12 +52,12 @@ public class ExampleClosedItemsetMining2 {
        IntVar maxFreq = model.intVar(maxFreq().getId(), 0, database.getNbTransactions());
        // Compute max value of itemFreqVar
        model.max(maxFreq, itemFreqVar).post();
-        model.post(new Constraint("Cover Size", new CoverSize(database, freq, x)));
+        ConstraintFactory.coverSize(database, freq, x).post();
        // The constraint AdequateClosure ensures that x is closed w.r.t. M
        // Two versions are available : Domain Consistency (DC) and Weak Consistency (WC)
        // Note that the WC version is more time efficient than the DC one
        List<Measure> measures = Arrays.asList(freq(), maxFreq());
-        model.post(new Constraint("Adequate Closure", new AdequateClosureDC(database, measures, x)));
+        ConstraintFactory.adequateClosure(database, measures, x, true).post();
        List<Pattern> closedPatterns = new LinkedList<>();
        while (model.getSolver().solve()) {
            int[] itemset = IntStream.range(0, x.length)
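`ExampleClosedItemsetMining2` asks for itemsets that are closed w.r.t. `M = {freq, maxFreq}`. A brute-force check of that definition is sketched below. It assumes that `maxFreq(x)` is the highest frequency of a single item of `x`, which is what `model.max(maxFreq, itemFreqVar)` suggests; `IntUnaryOperator` stands in for Choco-Mining's `Measure`, and all names are hypothetical. It scans every strict superset instead of only one-item extensions, so it stays valid for measures that are neither monotone nor anti-monotone.

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

/**
 * Hypothetical brute-force check (not part of Choco-Mining) of closedness w.r.t. a
 * set of measures M. Items are bits of an int, so this assumes fewer than 31 items;
 * IntUnaryOperator stands in for a measure that maps an itemset bitmask to a value.
 */
final class BruteForceAdequateClosure {

    private BruteForceAdequateClosure() {}

    /** Number of transactions that contain every item of the itemset. */
    static int freq(int[] transactions, int itemset) {
        int count = 0;
        for (int t : transactions) {
            if ((t & itemset) == itemset) {
                count++;
            }
        }
        return count;
    }

    /** Assumed meaning of maxFreq: highest frequency of a single item of x (0 if x is empty). */
    static int maxItemFreq(int[] transactions, int nbItems, int itemset) {
        int max = 0;
        for (int i = 0; i < nbItems; i++) {
            if ((itemset & (1 << i)) != 0) {
                max = Math.max(max, freq(transactions, 1 << i));
            }
        }
        return max;
    }

    /** x is closed w.r.t. M iff no strict superset y has m(y) = m(x) for every m in M. */
    static boolean isClosedWrt(List<IntUnaryOperator> measures, int nbItems, int x) {
        for (int y = 0; y < (1 << nbItems); y++) {
            if ((y & x) != x || y == x) {
                continue; // keep only strict supersets of x
            }
            boolean sameOnAll = true;
            for (IntUnaryOperator m : measures) {
                if (m.applyAsInt(y) != m.applyAsInt(x)) {
                    sameOnAll = false;
                    break;
                }
            }
            if (sameOnAll) {
                return false; // y is a superset that M cannot tell apart from x
            }
        }
        return true;
    }
}
```

On the transactions `{0,1,2}`, `{0,1}`, `{0,2}`, `{1,2,3}`, `{0,1,2,3}`, the itemset `{3}` is not closed w.r.t. `{freq}` alone (its superset `{1,3}` also has frequency 2), but it is closed w.r.t. `{freq, maxFreq}`: every superset either has a lower frequency or contains item 1 or 2, which raises `maxFreq` from 2 to 4.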
+3 −4
@@ -9,9 +9,8 @@
 */
package io.gitlab.chaver.mining.examples;

-import io.gitlab.chaver.mining.patterns.constraints.CoverClosure;
-import io.gitlab.chaver.mining.patterns.constraints.CoverSize;
import io.gitlab.chaver.mining.patterns.constraints.Overlap;
+import io.gitlab.chaver.mining.patterns.constraints.factory.ConstraintFactory;
import io.gitlab.chaver.mining.patterns.io.DatReader;
import io.gitlab.chaver.mining.patterns.io.TransactionalDatabase;
import org.chocosolver.solver.Model;
@@ -40,8 +39,8 @@ public class ExampleDiversity {
        IntVar length = model.intVar("length", 1, database.getNbItems());
        BoolVar[] x = model.boolVarArray("x", database.getNbItems());
        model.sum(x, "=", length).post();
-        model.post(new Constraint("Cover Size", new CoverSize(database, freq, x)));
-        model.post(new Constraint("Cover Closure", new CoverClosure(database, x)));
+        ConstraintFactory.coverSize(database, freq, x).post();
+        ConstraintFactory.coverClosure(database, x).post();
        double jmax = 0.05;
        // Overlap is a global constraint that ensures that x is a diverse itemset
        // i.e. there exists no y such that jaccard(x,y) > jmax