Chengwei LEI, Ph.D.    Associate Professor

Department of Computer and Electrical Engineering and Computer Science
California State University, Bakersfield

 

Data Science

 

Performance Evaluation

 





Clustering Problems



Internal Evaluation / Without "Ground Truth Information" / Unsupervised


BetaCV:

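BetaCV is the ratio of the mean intracluster distance to the mean intercluster distance: BetaCV = (W_in / N_in) / (W_out / N_out), where W_in and W_out are the sums of intracluster and intercluster distances, and N_in and N_out are the corresponding numbers of edges (point pairs).

The smaller the BetaCV ratio, the better the clustering.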
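A minimal sketch of BetaCV in plain Python, assuming points are coordinate tuples, the proximity is Euclidean distance, and every unordered pair of points is one edge. The name `beta_cv` and the toy data are illustrative only, and the sketch assumes at least one intracluster and one intercluster pair exist.

```python
import math
from itertools import combinations


def beta_cv(points, labels):
    """BetaCV = (W_in / N_in) / (W_out / N_out), Euclidean edge weights."""
    w_in = w_out = 0.0  # summed intra-/intercluster distances
    n_in = n_out = 0    # numbers of intra-/intercluster edges
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if labels[i] == labels[j]:
            w_in += d
            n_in += 1
        else:
            w_out += d
            n_out += 1
    return (w_in / n_in) / (w_out / n_out)


pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
print(beta_cv(pts, [0, 0, 1, 1]))  # two tight, well-separated clusters: small ratio
print(beta_cv(pts, [0, 1, 0, 1]))  # clusters that straddle both groups: larger ratio
```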


C-index:

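Let W_min(N_in) be the sum of the smallest N_in distances in the proximity matrix W, and W_max(N_in) the sum of the largest N_in distances, where N_in is the total number of intracluster edges. With W_in the sum of all intracluster distances,

C-index = (W_in - W_min(N_in)) / (W_max(N_in) - W_min(N_in))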

The smaller the C-index, the better the clustering.

The C-index lies in the range [0,1].
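A sketch of the C-index in plain Python, assuming Euclidean distances between coordinate tuples and one edge per unordered pair of points. `c_index` is a hypothetical helper name; it also assumes N_in >= 1 and that not all pairwise distances are equal, so the denominator is non-zero.

```python
import math
from itertools import combinations


def c_index(points, labels):
    """C-index = (W_in - W_min(N_in)) / (W_max(N_in) - W_min(N_in))."""
    dists = []          # all pairwise distances (one per edge)
    w_in, n_in = 0.0, 0
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        dists.append(d)
        if labels[i] == labels[j]:
            w_in += d
            n_in += 1
    dists.sort()
    w_min = sum(dists[:n_in])   # sum of the N_in smallest distances
    w_max = sum(dists[-n_in:])  # sum of the N_in largest distances (needs n_in >= 1)
    return (w_in - w_min) / (w_max - w_min)


pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
print(c_index(pts, [0, 0, 1, 1]))  # 0.0: the intracluster edges are the shortest ones
print(c_index(pts, [0, 1, 0, 1]))  # close to 1: intracluster edges are long
```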


Modularity:

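Modularity compares the edge weight inside each cluster with the weight expected by chance. Because the edge weights here are distances (the proximity matrix W) rather than similarities, the smaller the modularity measure, the better the clustering.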
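A sketch of modularity, Q = Σ_i [ W(C_i, C_i) / W(V, V) - (W(C_i, V) / W(V, V))² ], where W(S, R) sums w_ij over ordered pairs with i in S and j in R. It assumes the full Euclidean distance matrix serves as W; the name `modularity` and the toy data are illustrative.

```python
import math


def modularity(points, labels):
    """Q = sum_i [W(C_i,C_i)/W(V,V) - (W(C_i,V)/W(V,V))**2] with distance weights."""
    n = len(points)
    w = [[math.dist(p, q) for q in points] for p in points]  # proximity matrix W
    w_vv = sum(sum(row) for row in w)                        # W(V, V)
    q = 0.0
    for c in set(labels):
        members = [i for i in range(n) if labels[i] == c]
        w_cc = sum(w[i][j] for i in members for j in members)   # W(C_i, C_i)
        w_cv = sum(w[i][j] for i in members for j in range(n))  # W(C_i, V)
        q += w_cc / w_vv - (w_cv / w_vv) ** 2
    return q


pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
print(modularity(pts, [0, 0, 1, 1]))  # smaller (better) value
print(modularity(pts, [0, 1, 0, 1]))
```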


Normalized Cut:

Because the edge weights are distances rather than similarities, the higher the normalized cut value, the better the clustering.
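A sketch of the normalized cut, NC = Σ_i W(C_i, V minus C_i) / vol(C_i) with vol(C_i) = W(C_i, V), the total weight of edges with at least one end in C_i. It assumes Euclidean distances as edge weights and non-zero volumes; `normalized_cut` is a hypothetical name.

```python
import math


def normalized_cut(points, labels):
    """NC = sum_i W(C_i, outside C_i) / W(C_i, V), distance weights."""
    n = len(points)
    nc = 0.0
    for c in set(labels):
        members = [i for i in range(n) if labels[i] == c]
        # weight of edges leaving the cluster
        cut = sum(math.dist(points[i], points[j])
                  for i in members for j in range(n) if labels[j] != c)
        # volume: weight of all edges touching the cluster
        vol = sum(math.dist(points[i], points[j]) for i in members for j in range(n))
        nc += cut / vol
    return nc


pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
print(normalized_cut(pts, [0, 0, 1, 1]))  # higher (better) value
print(normalized_cut(pts, [0, 1, 0, 1]))
```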

 



Dunn Index
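The Dunn index is the ratio of the smallest distance between points in different clusters to the largest distance between points in the same cluster; the larger it is, the better the clustering. A minimal sketch, assuming Euclidean distance and at least one intracluster pair with non-zero distance (`dunn_index` is an illustrative name):

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Dunn = (min intercluster distance) / (max intracluster distance)."""
    min_out, max_in = math.inf, 0.0
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if labels[i] == labels[j]:
            max_in = max(max_in, d)
        else:
            min_out = min(min_out, d)
    return min_out / max_in


pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
print(dunn_index(pts, [0, 0, 1, 1]))  # 5.0
print(dunn_index(pts, [0, 1, 0, 1]))  # 0.2
```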

Davies-Bouldin Index
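The Davies-Bouldin index averages, over clusters, the worst-case ratio (σ_i + σ_j) / δ(μ_i, μ_j) of combined spread to centroid separation; the smaller it is, the better the clustering. This sketch assumes the spread σ_i is the root-mean-square distance of a cluster's points to its centroid μ_i (some texts and libraries use the mean distance instead), Euclidean δ, and at least two clusters with distinct centroids. `davies_bouldin` is a hypothetical name.

```python
import math


def davies_bouldin(points, labels):
    """DB = (1/k) * sum_i max_{j != i} (sigma_i + sigma_j) / dist(mu_i, mu_j)."""
    groups = {}
    for p, c in zip(points, labels):
        groups.setdefault(c, []).append(p)
    # centroid mu_i of each cluster
    mu = {c: tuple(sum(xs) / len(g) for xs in zip(*g)) for c, g in groups.items()}
    # spread sigma_i: root-mean-square distance to the centroid (an assumption)
    sigma = {c: math.sqrt(sum(math.dist(p, mu[c]) ** 2 for p in g) / len(g))
             for c, g in groups.items()}
    total = 0.0
    for ci in groups:
        total += max((sigma[ci] + sigma[cj]) / math.dist(mu[ci], mu[cj])
                     for cj in groups if cj != ci)
    return total / len(groups)


pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
print(davies_bouldin(pts, [0, 0, 1, 1]))  # 0.2
print(davies_bouldin(pts, [0, 1, 0, 1]))  # 5.0
```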




 




External Evaluation / With "Ground Truth Information" / Cross Dataset / Supervised


Purity:

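Purity assigns each cluster to its majority ground-truth partition and reports the fraction of points covered: purity = (1/n) Σ_i max_j n_ij, where n_ij is the number of points of cluster i in partition j. A sketch using `collections.Counter` for the contingency counts; the name `purity` and the example labels are illustrative.

```python
from collections import Counter


def purity(clusters, truth):
    """purity = (1/n) * sum_i max_j n_ij."""
    joint = Counter(zip(clusters, truth))  # n_ij: points of cluster i in partition j
    best = {}                               # max_j n_ij for each cluster i
    for (i, _j), n_ij in joint.items():
        best[i] = max(best.get(i, 0), n_ij)
    return sum(best.values()) / len(truth)


clusters = [0, 0, 0, 1, 1, 1]
truth = ["a", "a", "b", "a", "a", "b"]
print(purity(clusters, truth))  # 4/6: each cluster has 2 of its 3 points in "a"
```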


Maximum Matching:

 

Each cluster can be matched with at most one partition, and each partition with at most one cluster (a one-to-one matching).
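A brute-force sketch of maximum matching: it tries every one-to-one assignment of clusters to partitions and returns (1/n) times the best total of n_ij. Enumerating permutations is only practical for a handful of clusters (an assumption of this sketch); larger problems are usually solved with the Hungarian algorithm. In the example both clusters are dominated by partition "a", so purity would be 4/6, while the one-to-one matching score is lower. `maximum_matching` is a hypothetical name.

```python
from collections import Counter
from itertools import permutations


def maximum_matching(clusters, truth):
    """(1/n) * max over one-to-one cluster->partition assignments of sum n_ij."""
    joint = Counter(zip(clusters, truth))
    cs = list(dict.fromkeys(clusters))  # distinct cluster labels
    ts = list(dict.fromkeys(truth))     # distinct partition labels
    # Pad the smaller side with None so some clusters/partitions can stay unmatched;
    # Counter returns 0 for any pair involving None.
    size = max(len(cs), len(ts))
    cs += [None] * (size - len(cs))
    ts += [None] * (size - len(ts))
    best = max(sum(joint[(c, t)] for c, t in zip(cs, perm))
               for perm in permutations(ts))
    return best / len(truth)


clusters = [0, 0, 0, 1, 1, 1]
truth = ["a", "a", "b", "a", "a", "b"]
print(maximum_matching(clusters, truth))  # 0.5
```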


F-measure:

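For each cluster i, take its majority partition j_i, compute precision n_ij / n_i and recall n_ij / m_j (m_j being the size of partition j), and average the per-cluster harmonic means. This follows one common definition of the clustering F-measure; the name `f_measure` and the tie-breaking rule are assumptions of this sketch.

```python
from collections import Counter


def f_measure(clusters, truth):
    """Mean over clusters of the F-score against each cluster's majority partition."""
    joint = Counter(zip(clusters, truth))  # n_ij
    n_i = Counter(clusters)                 # cluster sizes
    m_j = Counter(truth)                    # partition sizes
    scores = []
    for i in n_i:
        # majority partition j_i (on ties, the pair seen first in the data wins)
        j, n_ij = max(((t, c) for (ci, t), c in joint.items() if ci == i),
                      key=lambda pair: pair[1])
        prec = n_ij / n_i[i]
        recall = n_ij / m_j[j]
        scores.append(2 * prec * recall / (prec + recall))
    return sum(scores) / len(scores)


clusters = [0, 0, 0, 1, 1, 1]
truth = ["a", "a", "b", "a", "a", "b"]
print(f_measure(clusters, truth))  # 4/7, about 0.571
```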

 


Pairwise Based:

 

Jaccard Coefficient

Rand Statistic / Rand Index


The higher the Rand index, the better the clustering.

The Rand Index lies in the range [0,1].
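Both pairwise measures classify the n(n - 1)/2 point pairs as TP (same cluster, same partition), FN (different clusters, same partition), FP (same cluster, different partitions) or TN (different in both). Then Jaccard = TP / (TP + FN + FP) and Rand = (TP + TN) / (n(n - 1)/2). A direct O(n²) sketch; the helper names are hypothetical.

```python
from itertools import combinations


def pair_counts(clusters, truth):
    """Return (TP, FN, FP, TN) over all unordered pairs of points."""
    tp = fn = fp = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_truth = truth[i] == truth[j]
        if same_cluster and same_truth:
            tp += 1
        elif same_truth:    # split apart by the clustering
            fn += 1
        elif same_cluster:  # wrongly put together
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def jaccard(clusters, truth):
    tp, fn, fp, _ = pair_counts(clusters, truth)
    return tp / (tp + fn + fp)


def rand_index(clusters, truth):
    tp, fn, fp, tn = pair_counts(clusters, truth)
    return (tp + tn) / (tp + fn + fp + tn)


clusters = [0, 0, 0, 1, 1, 1]
truth = ["a", "a", "b", "a", "a", "b"]
print(pair_counts(clusters, truth))  # (2, 5, 4, 4)
print(jaccard(clusters, truth))      # 2/11
print(rand_index(clusters, truth))   # 0.4
```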

 


Conditional Entropy:
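The conditional entropy H(T | C) = -Σ_i Σ_j p_ij log2(p_ij / p_Ci) measures how much uncertainty about the ground-truth partition T remains once the cluster is known; it is 0 when every cluster is pure, so smaller is better. A sketch measuring it in bits (the base-2 logarithm is a choice made here); `conditional_entropy` is an illustrative name.

```python
import math
from collections import Counter


def conditional_entropy(clusters, truth):
    """H(T | C) in bits."""
    n = len(truth)
    joint = Counter(zip(clusters, truth))  # n_ij
    size = Counter(clusters)               # n_i
    h = 0.0
    for (i, _j), n_ij in joint.items():
        p_ij = n_ij / n
        p_i = size[i] / n
        h -= p_ij * math.log2(p_ij / p_i)
    return h


clusters = [0, 0, 0, 1, 1, 1]
truth = ["a", "a", "b", "a", "a", "b"]
print(conditional_entropy(clusters, truth))  # about 0.918 bits
print(conditional_entropy(truth, truth))     # 0.0 for a perfect clustering
```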

 


Normalized Mutual Information:

The NMI value lies in the range [0,1]. The higher the NMI value, the better the clustering.
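A sketch of NMI = I(C, T) / sqrt(H(C) · H(T)), using the geometric-mean normalization; other normalizations (for example the arithmetic mean of the two entropies) also appear in the literature and in libraries. It assumes both labelings contain at least two distinct labels, so the entropies are non-zero. The names `entropy` and `nmi` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def nmi(clusters, truth):
    """NMI = I(C, T) / sqrt(H(C) * H(T))."""
    n = len(truth)
    n_i, m_j = Counter(clusters), Counter(truth)
    joint = Counter(zip(clusters, truth))
    # I(C, T) = sum_ij (n_ij / n) * log2(n * n_ij / (n_i * m_j))
    mi = sum((n_ij / n) * math.log2(n * n_ij / (n_i[i] * m_j[j]))
             for (i, j), n_ij in joint.items())
    return mi / math.sqrt(entropy(clusters) * entropy(truth))


truth = ["a", "a", "b", "a", "a", "b"]
print(nmi([1, 1, 0, 1, 1, 0], truth))  # relabeled perfect clustering: 1.0 up to rounding
print(nmi([0, 0, 0, 1, 1, 1], truth))  # clusters independent of the truth: 0.0
```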

 

 




Relative Evaluation / Comparing different parameters


Silhouette Coefficient
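For point x_i, let a_i be its mean distance to the other points of its own cluster and b_i its smallest mean distance to the points of any other cluster; then s_i = (b_i - a_i) / max(a_i, b_i), and the silhouette coefficient is the mean of s_i. Values lie in [-1, 1], and higher is better. The sketch assumes Euclidean distance and at least two clusters, and scores points in singleton clusters as 0 (a common convention, not the only one). `silhouette` is an illustrative name.

```python
import math


def silhouette(points, labels):
    """Mean of s_i = (b_i - a_i) / max(a_i, b_i) over all points."""
    n = len(points)
    scores = []
    for i in range(n):
        totals, counts = {}, {}  # per-cluster distance sums / counts from point i
        for j in range(n):
            if j != i:
                c = labels[j]
                totals[c] = totals.get(c, 0.0) + math.dist(points[i], points[j])
                counts[c] = counts.get(c, 0) + 1
        own = labels[i]
        if own not in counts:  # point i is alone in its cluster: score 0 (convention)
            scores.append(0.0)
            continue
        a = totals[own] / counts[own]                               # a_i
        b = min(totals[c] / counts[c] for c in totals if c != own)  # b_i
        scores.append((b - a) / max(a, b))
    return sum(scores) / n


pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
print(silhouette(pts, [0, 0, 1, 1]))  # about 0.80
print(silhouette(pts, [0, 1, 0, 1]))  # about -0.39
```

For relative evaluation, the same function can be applied to clusterings produced with different parameter values (for example different numbers of clusters), keeping the setting with the highest score.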

Calinski–Harabasz Index
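The Calinski-Harabasz index is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom: CH = [tr(S_B) / (k - 1)] / [tr(S_W) / (n - k)], where tr(S_B) = Σ_i n_i ||μ_i - μ||² and tr(S_W) = Σ_i Σ_{x in C_i} ||x - μ_i||². Higher is better. A minimal sketch assuming 1 < k < n and non-zero within-cluster scatter; `calinski_harabasz` is a hypothetical name.

```python
import math


def calinski_harabasz(points, labels):
    """CH = (tr(S_B) / (k - 1)) / (tr(S_W) / (n - k))."""
    n = len(points)
    clusters = set(labels)
    k = len(clusters)
    mean = tuple(sum(xs) / n for xs in zip(*points))  # overall mean mu
    between = within = 0.0
    for c in clusters:
        members = [p for p, l in zip(points, labels) if l == c]
        centroid = tuple(sum(xs) / len(members) for xs in zip(*members))
        between += len(members) * math.dist(centroid, mean) ** 2  # n_i ||mu_i - mu||^2
        within += sum(math.dist(p, centroid) ** 2 for p in members)
    return (between / (k - 1)) / (within / (n - k))


pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
print(calinski_harabasz(pts, [0, 0, 1, 1]))  # 50.0
print(calinski_harabasz(pts, [0, 1, 0, 1]))  # 0.08
```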