Number of minimal volume enclosing ellipsoids (MVEE)
Status
Currently MVEE are determined by k-means clustering. Number of clusters k
is determined in nclus
function, by running k
from min(dim, floor(n / 2*(dim+1)))
to 1
until each cluster contains at least 2*(dim+1)
points, where n
is the number of points (one cluster and warning on failure).
Issues
Minimization of the rejection volume
Consider, for all clusterizations satisfying above criteria, and an additional criteria of a minimal gap between ellipsoids boundaries and the actual points (or, possibly related, minimal overlap between ellipsoids). This would vastly improve the rejection sampling from the viable volume in high dimensions as done by intelintg2
(used by Volint
); cf. a 2-D ball example, where with a gap criteria there could be a single, perfect ball cluster:
Black points are the initial sample. Here, they were produced uniformly form the enclosing square (TODO test when produced by: OEAMC
+MEBS
).
It is unclear whether this would improve MEBS
sampling, based on idea of growing the ellipsoids.
Also note that principle axes of the resulting ellipsoids could be more meaningful, and thus much more useful in the parameter space analysis; cf. the toy example from TopoFilter:
Lower bound
For large enough samples, in practice the number of clusters will be always equal to dim
. This conservative approach avoids the issue with a lot of small clusters, for which MVEE may cut out significant volumes of the true viable, causing errors; cf. the toy example from TopoFilter - note exemplary cutouts on the right side, where viable space is rectangular:
There seem to be no justification for dim
upper bound. For instance, consider viable space with three distant, unconnected regions in dim=2
.
Discussion
- Connected regions could be detected by MVEE overlap and treated separately (TODO: separate issue).
- Doesn't it make more sense to put multiple random seed repeats for k-means directly in the
clusterize
procedure, as opposed to inMEBS
(cf. #11, in particular the Adaptive Sample Size comment)? In the first toy example above, if the initial point for lower cluster would be in the lower left corner, one would get the expected "optimal" clusterization wrt MVEE. - Double check that
MEBS
does not already merge ellipsoids (first check: no, it doesn't), and if this could actually improve anything in MEBS. - Merging and splitting clusters based on MVEE seems the way to go (both operations at once if at the upper bound of number of cluster).
- Computing rejected volume is not trivial. It could be approximated, for instance, by finding a convex hull for points, or by calling
homosamp
and counting dropped samples (more extra computation time).