Clustering

Ronen Feldman; James Sanger

doi:10.1017/CBO9780511546914.006

V - Clustering

Published online by Cambridge University Press: 08 August 2009

Ronen Feldman: Bar-Ilan University, Israel
James Sanger: ABS Ventures, Boston, Massachusetts

Summary

Clustering is an unsupervised process through which objects are classified into groups called clusters. In categorization problems, as described in Chapter IV, we are provided with a collection of preclassified training examples, and the task of the system is to learn the descriptions of classes in order to be able to classify a new unlabeled object. In the case of clustering, the problem is to group the given unlabeled collection into meaningful clusters without any prior information. Any labels associated with objects are obtained solely from the data.

Clustering is useful in a wide range of data analysis fields, including data mining, document retrieval, image segmentation, and pattern classification. In many such problems, little prior information is available about the data, and the decision-maker must make as few assumptions about the data as possible. It is for those cases that the clustering methodology is especially appropriate.

Clustering techniques are described in this chapter in the context of textual data analysis. Section V.1 discusses the various applications of clustering in text analysis domains. Sections V.2 and V.3 address the general clustering problem and present several clustering algorithms. Finally, Section V.4 demonstrates how the clustering algorithms can be adapted to text analysis.

CLUSTERING TASKS IN TEXT ANALYSIS

One application of clustering is the analysis and navigation of large text collections such as Web pages. The basic assumption, called the cluster hypothesis, states that relevant documents tend to be more similar to each other than to nonrelevant ones.
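As a rough illustration of the cluster hypothesis, the sketch below compares toy documents by the cosine similarity of their term-count vectors. The similarity measure, the whitespace tokenization, and the three sample documents are all assumptions made for this example; the summary itself does not specify how document similarity is measured.

```python
import math
from collections import Counter


def term_counts(text):
    """Lowercase the text, split on whitespace, and count each term."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy collection: two documents on one topic, one on another.
docs = {
    "sports_1": "the team won the football match",
    "sports_2": "the football team lost the match",
    "finance_1": "the bank raised interest rates",
}
vectors = {name: term_counts(text) for name, text in docs.items()}

same_topic = cosine_similarity(vectors["sports_1"], vectors["sports_2"])
cross_topic = cosine_similarity(vectors["sports_1"], vectors["finance_1"])

print(f"same topic:  {same_topic:.3f}")
print(f"cross topic: {cross_topic:.3f}")
```

In this toy collection the two sports documents score higher against each other than either does against the finance document, which is the pattern the hypothesis predicts for documents relevant to the same query. Raw term counts let common words such as "the" dominate; a practical system would typically reweight or filter them.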

Type: Chapter
Information: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, pp. 82-93

DOI: https://doi.org/10.1017/CBO9780511546914.006

Publisher: Cambridge University Press

Print publication year: 2006
