Unsupervised cluster discovery is instrumental for data analysis in many important applications. It involves partitioning the training dataset into disjoint groups. The performance of cluster discovery depends on several key factors, including the number of clusters, the topology of the node vectors, the objective function for clustering, the iterative learning algorithms (often run from multiple initial conditions), and, finally, an evaluation criterion for picking the best result among multiple trials. This part contains two chapters: Chapter 5 covers unsupervised learning models employing the conventional Euclidean metric for vectorial data analysis, while Chapter 6 focuses on the use of kernel-induced metrics and kernelized learning models, which may equally be applied to nonvectorial data analysis.
Chapter 5 covers several conventional unsupervised learning models for cluster discovery, including the K-means and expectation-maximization (EM) learning models, which are presented in Algorithms 5.1 and 5.2, together with proofs of their monotonic convergence properties in Theorems 5.1 and 5.2. By imposing topological sensitivity on the cluster (or node) structure, the basic K-means learning rule can be extended to the self-organizing map (SOM) learning model presented in Algorithm 5.3. Finally, for biclustering problems, in which features and objects are clustered simultaneously, several useful coherence models are proposed.
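As a rough illustration of the iterative procedure that Chapter 5 formalizes, the sketch below alternates between the assignment and centroid-update steps of K-means under the Euclidean metric. It is a minimal example only, not a reproduction of Algorithm 5.1; the function name, toy data, and stopping rule are illustrative assumptions. Each full pass can only decrease (or preserve) the sum of squared distances to the assigned centroids, which is the monotonic convergence property addressed by Theorem 5.1.

```python
# Minimal K-means sketch (illustrative only; not the book's Algorithm 5.1).
# Assumes vectorial data, the Euclidean metric, a fixed number of clusters K,
# and a single random initialization.
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking K distinct training vectors at random.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Assignment step: attach each vector to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned vectors.
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(K)
        ])
        if np.allclose(new_centroids, centroids):
            break  # partition has stabilized; the objective can no longer decrease
        centroids = new_centroids
    return centroids, labels

# Toy example: three well-separated Gaussian blobs in the plane.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ((0, 0), (3, 3), (0, 3))])
centroids, labels = kmeans(X, K=3)
```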
Chapter 6 covers kernel-based cluster discovery, which is useful for both vectorial and nonvectorial data analysis.
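To make the kernel-induced metric concrete, the following sketch shows a generic kernel K-means iteration in which every feature-space distance is computed from the kernel matrix alone; it is a minimal illustration under an assumed RBF kernel, not the specific kernelized learning models developed in Chapter 6. Because only pairwise kernel values are needed, the same procedure applies to nonvectorial data for which a valid kernel can be defined.

```python
# Minimal kernel K-means sketch (illustrative assumption, not Chapter 6's models).
# Each squared feature-space distance from sample i to the centroid of cluster c is
#   K_ii - (2/|c|) * sum_{j in c} K_ij + (1/|c|^2) * sum_{j,l in c} K_jl,
# so the algorithm needs only the kernel matrix, never explicit feature vectors.
import numpy as np

def kernel_kmeans(K_mat, n_clusters, n_iters=100, seed=0):
    n = K_mat.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=n)  # random initial partition
    for _ in range(n_iters):
        dist = np.zeros((n, n_clusters))
        for c in range(n_clusters):
            idx = np.where(labels == c)[0]
            if len(idx) == 0:
                dist[:, c] = np.inf  # empty cluster: never the nearest
                continue
            dist[:, c] = (np.diag(K_mat)
                          - 2.0 * K_mat[:, idx].mean(axis=1)
                          + K_mat[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition has stabilized
        labels = new_labels
    return labels

# Toy example with an RBF (Gaussian) kernel on 2-D data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(40, 2))
               for c in ((0, 0), (3, 3))])
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K_mat = np.exp(-sq / 2.0)
labels = kernel_kmeans(K_mat, n_clusters=2)
```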