Data Item _em_classes.clustering_method

General

Item name: _em_classes.clustering_method
Category name: em_classes
Attribute name: clustering_method
Required in PDB entries: no
Used in current PDB entries: no

Item Description

The clustering method used.

Data Type

Data type code: ucode
Data type detail: code item types/single words (case insensitive) ...
Primitive data type code: uchar
Regular expression: [_,.;:"&<>()/\{}'`~!@#$%A-Za-z0-9*|+-]*

Controlled Vocabulary

Allowed Value	Details
Automatic Clustering and Hierarchical Ascendant Classifications (HAC)	HAC uses only Ward's criterion. Ward's criterion states that merging clusters should minimize the increase in within-cluster (intraclass) variance. The two clusters that differ the least from each other are merged to create a new group, one "level" higher.
Correspondence Analysis	(CA) uses the chi-squared distance. This is an advantage because it ignores differences in exposure between images, eliminating the need to rescale between images.
Didays method	A disadvantage of the K-means method is that the final grouping is very dependent on which seeds are initially chosen. Diday overcame this by applying the K-means technique multiple times with different seeds, then cross-tabulating the results and using only the clusters that were repeatedly formed.
K-Means Clustering	K-Means is a clustering method that divides the data into a user-defined number of groups. Two random image "seeds" are chosen, and their centers of gravity are computed. A partition is drawn down the middle between the centers, the new centers of gravity are computed, and the process is repeated a given number of times. The final result is very dependent on which image seeds are chosen first. Because our faces data set is manufactured, we know exactly which images are identical apart from random noise, and the exact number of groups. The output discussed was obtained with 8 classes, using factors 1-3, and an even factor weight of 1.0 across those three factors.
Principal Component Analysis	(PCA) computes the distance between data vectors using the Euclidean distance.
Wards method
average linkage
centroid method
complete linkage
single linkage
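
The Correspondence Analysis and Principal Component Analysis entries above differ in the distance they use. The following is a minimal sketch of that difference, not the implementation of any particular EM package: it assumes each image is a row of non-negative pixel values, and the function names and the toy `images` table are hypothetical. It illustrates why the chi-squared distance between row profiles is unaffected by a uniform change in exposure, while the Euclidean distance is not.

```python
import math


def euclidean_distance(a, b):
    """Plain Euclidean distance between two pixel vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chi_squared_distance(table, i, j):
    """Chi-squared distance between rows i and j of a non-negative table.

    Each row is first divided by its own sum (its "profile"), so a uniform
    scaling of a row cancels out. Squared profile differences are then
    weighted by the inverse of the column mass.
    """
    total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    col_mass = [sum(row[k] for row in table) / total for k in range(n_cols)]
    sum_i = sum(table[i])
    sum_j = sum(table[j])
    profile_i = [v / sum_i for v in table[i]]
    profile_j = [v / sum_j for v in table[j]]
    return math.sqrt(
        sum((p - q) ** 2 / m for p, q, m in zip(profile_i, profile_j, col_mass))
    )


# Hypothetical toy data: three 4-pixel "images".
images = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # same pattern as image 0, twice the exposure
    [4.0, 3.0, 2.0, 1.0],  # a genuinely different pattern
]

print("chi-squared, image 0 vs 1:", chi_squared_distance(images, 0, 1))
print("Euclidean,   image 0 vs 1:", euclidean_distance(images[0], images[1]))
print("chi-squared, image 0 vs 2:", chi_squared_distance(images, 0, 2))
```

Images 0 and 1 share the same profile, so their chi-squared distance vanishes even though their Euclidean distance does not; image 2 remains distinct under both measures.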

Controlled Vocabulary at Deposition

Allowed Value	Details
Automatic Clustering and Hierarchical Ascendant Classifications (HAC)	HAC uses only Ward's criterion. Ward's criterion states that merging clusters should minimize the increase in within-cluster (intraclass) variance. The two clusters that differ the least from each other are merged to create a new group, one "level" higher.
Correspondence Analysis	(CA) uses the chi-squared distance. This is an advantage because it ignores differences in exposure between images, eliminating the need to rescale between images.
Didays method	A disadvantage of the K-means method is that the final grouping is very dependent on which seeds are initially chosen. Diday overcame this by applying the K-means technique multiple times with different seeds, then cross-tabulating the results and using only the clusters that were repeatedly formed.
K-Means Clustering	K-Means is a clustering method that divides the data into a user-defined number of groups. Two random image "seeds" are chosen, and their centers of gravity are computed. A partition is drawn down the middle between the centers, the new centers of gravity are computed, and the process is repeated a given number of times. The final result is very dependent on which image seeds are chosen first. Because our faces data set is manufactured, we know exactly which images are identical apart from random noise, and the exact number of groups. The output discussed was obtained with 8 classes, using factors 1-3, and an even factor weight of 1.0 across those three factors.
Principal Component Analysis	(PCA) computes the distance between data vectors using the Euclidean distance.
Wards method
average linkage
centroid method
complete linkage
single linkage
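
The K-Means Clustering and Didays method entries above describe seed dependence and a cross-tabulation remedy. The sketch below illustrates both under stated assumptions: images are reduced to hypothetical 2-D factor coordinates, seeds are passed explicitly (in practice they would be drawn at random) so the example is reproducible, and `kmeans`, `cross_tabulate`, and the toy `points` are illustrative names rather than part of any EM software. The cross-tabulation keeps together only those items that share a cluster in every run, which is one simple reading of "clusters that were repeatedly formed".

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
        for p in points
    ]


def kmeans(points, seeds, iterations=20):
    """Basic K-means starting from the given seed points."""
    centers = [tuple(s) for s in seeds]
    for _ in range(iterations):
        labels = assign(points, centers)
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return assign(points, centers)


def cross_tabulate(runs):
    """Group item indices that received the same label in every run."""
    groups = defaultdict(list)
    for idx in range(len(runs[0])):
        signature = tuple(run[idx] for run in runs)
        groups[signature].append(idx)
    return sorted(groups.values())


# Hypothetical toy data: three well-separated groups of three points each.
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (20, 0), (20, 1), (21, 0),
]

# Three seed choices; the second places two seeds in the same group.
seed_sets = [
    [points[0], points[3], points[6]],
    [points[0], points[1], points[3]],
    [points[2], points[4], points[8]],
]

runs = [kmeans(points, seeds) for seeds in seed_sets]
for seeds, labels in zip(seed_sets, runs):
    print(seeds, "->", labels)
print("stable groups:", cross_tabulate(runs))
```

With the second seed set, K-means splits the first group and merges the other two, which shows the seed dependence directly. Cross-tabulating the three runs still recovers the second and third groups intact, while the first group is reported only in the pieces that every run agreed on.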