k-Means Algorithm. More...

#include <rgroupingkmeans.h>

Collaboration diagram for RGroupingKMeans< cGroup, cObj, cGroups >:
[legend]

Classes

class  Centroid
 Centroid of a group.
 
class  Group
 Prototype Group.
 

Public Types

enum  tInitial { Incremental, Refine, kMeansPlusPlus, Random }
 

Public Member Functions

 RGroupingKMeans (const RString &n, RRandom &r, RCursor< cObj > objs, double convergence=0.0, RDebug *debug=0)
 
RString GetName (void) const
 
size_t GetNbIterations (void) const
 
virtual double Similarity (const cObj *obj1, const cObj *obj2)=0
 
virtual double Fitness (void)
 
void DokMeans (size_t max)
 
void Run (cGroups *groups, size_t max, size_t nb, tInitial start=Random)
 
virtual ~RGroupingKMeans (void)
 

Protected Member Functions

void ComputePrototype (cGroup *group)
 
double ComputeSumSim (cGroup *group, const cObj *obj)
 
void InitRandom (size_t nb)
 
void InitRefine (size_t nb, size_t max, size_t nbsub=5, int level=5)
 
void InitkMeansPlusPlus (size_t nb)
 
void ReAllocate (void)
 
size_t CalcNewProtosNb (void)
 

Protected Attributes

RString Name
 
RDebugDebug
 
RRandomRand
 
cGroups * Groups
 
cObj ** Objs
 
size_t NbObjs
 
cObj ** ObjsUsed
 
size_t NbObjsUsed
 
RContainer< Centroid, true, false > Protos
 
size_t NbIterations
 
double Convergence
 

Detailed Description

template<class cGroup, class cObj, class cGroups>
class R::RGroupingKMeans< cGroup, cObj, cGroups >

k-Means Algorithm.

The RGroupingKMeans class provides the implementation of a basic k-Means grouping algorithm.

Member Enumeration Documentation

enum tInitial

How to start the kMeans.

Enumerator
Incremental 

Start from the existing clustering.

Refine 

Use several sub-samples to initialize.

kMeansPlusPlus 

Use the kMeans++ method.

Random 

Initialize randomly.

Constructor & Destructor Documentation

RGroupingKMeans ( const RString n,
RRandom r,
RCursor< cObj >  objs,
double  convergence = 0.0,
RDebug debug = 0 
)

Construct the k-Means.

Parameters
nName of the k-Means.
rRandom number generator to use.
objsCursor over the objects to group.
convergenceConvergence.
debugDebugger.
virtual ~RGroupingKMeans ( void  )
virtual

Destruct the k-Means.

Member Function Documentation

RString GetName ( void  ) const

Get the name of the heuristic.

size_t GetNbIterations ( void  ) const

Get the number of iterations run.

void ComputePrototype ( cGroup *  group)
protected

Compute the prototype of a given group.

Parameters
groupGroup.
double ComputeSumSim ( cGroup *  group,
const cObj *  obj 
)
protected

Compute the sum of similarities between a given object and the other objects of its group.

Parameters
groupGroup.
objObject.
void InitRandom ( size_t  nb)
protected

Creates the initial prototypes by randomly choosing them.

Parameters
nbNumber of groups to create.
void InitRefine ( size_t  nb,
size_t  max,
size_t  nbsub = 5,
int  level = 5 
)
protected

Creates the initial prototypes by using a given number of samples containing a given percentage of the objects to group and running the kMeans. The best centroids of the sample are used.

Parameters
nbNumber of groups to create.
maxMaximal number of iterations.
nbsubNumber of samples to use.
levelPercentage of objects to group for each sample.
void InitkMeansPlusPlus ( size_t  nb)
protected

Creates the initial prototypes by using the kMeans++ method.

Parameters
nbNumber of groups to create.
void ReAllocate ( void  )
protected

Re-Allocation step where the objects are put in the group of the most similar prototype.

size_t CalcNewProtosNb ( void  )
protected

Computing the new prototypes.

Returns
Number of different prototypes.
virtual double Similarity ( const cObj *  obj1,
const cObj *  obj2 
)
pure virtual

Compute the similarity between two objects. It is suppose that if the similarity is between 0 and 1 (same objects).

Parameters
obj1First object.
obj2Second object.
virtual double Fitness ( void  )
virtual

Compute the fitness of the given clustering (the higher is the fitness, the best is the clustering.

By default, it computes the ratio between the average intra-similarity and the similarity between the centroids and the meta-centroid.

void DokMeans ( size_t  max)

Perform a kMeans.

Parameters
maxMaximal number of iterations.
void Run ( cGroups *  groups,
size_t  max,
size_t  nb,
tInitial  start = Random 
)

Run the heuristic.

Parameters
groupsGroup to initialize.
maxMaximal number of iterations.
nbNumber of groups to create.
startHow to the start the clustering.

Member Data Documentation

RString Name
protected

Name of the k-Means

RDebug* Debug
protected

Debugger.

RRandom& Rand
protected

Random number generator to use.

cGroups* Groups
protected

Groups.

cObj** Objs
protected

Objects to group;

size_t NbObjs
protected

Number of objects.

cObj** ObjsUsed
protected

Objects used to build the clusters.

size_t NbObjsUsed
protected

Number of objects used to build the clusters.

RContainer<Centroid,true,false> Protos
protected

Prototypes of the groups. The order of the prototypes is identical as the corresponding group.

size_t NbIterations
protected

Number of Iterations.

double Convergence
protected

Convergence ratio used.