Categorization

Ronen Feldman; James Sanger

doi:10.1017/CBO9780511546914.005

IV - Categorization

Published online by Cambridge University Press: 08 August 2009

Ronen Feldman and

James Sanger

Show author details

Ronen Feldman: Affiliation:
Bar-Ilan University, Israel
James Sanger: Affiliation:
ABS Ventures, Boston, Massachusetts

Book contents

Get access

Summary

Probably the most common theme in analyzing complex data is the classification, or categorization, of elements. Described abstractly, the task is to classify a given data instance into a prespecified set of categories. Applied to the domain of document management, the task is known as text categorization (TC) – given a set of categories (subjects, topics) and a collection of text documents, the process of finding the correct topic (or topics) for each document.

The study of automated text categorization dates back to the early 1960s (Maron 1961). Then, its main projected use was for indexing scientific literature by means of controlled vocabulary. It was only in the 1990s that the field fully developed with the availability of ever increasing numbers of text documents in digital form and the necessity to organize them for easier use. Nowadays automated TC is applied in a variety of contexts – from the classical automatic or semiautomatic (interactive) indexing of texts to personalized commercials delivery, spam filtering, Web page categorization under hierarchical catalogues, automatic generation of metadata, detection of text genre, and many others.

As with many other artificial intelligence (AI) tasks, there are two main approaches to text categorization. The first is the knowledge engineering approach in which the expert's knowledge about the categories is directly encoded into the system either declaratively or in the form of procedural classification rules.

Type: Chapter
Information: The Text Mining Handbook
Advanced Approaches in Analyzing Unstructured Data
, pp. 64 - 81

DOI: https://doi.org/10.1017/CBO9780511546914.005 [Opens in a new window]

Publisher: Cambridge University Press

Print publication year: 2006

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Book contents

IV - Categorization

Summary

Access options

Book purchase

Temporarily unavailable

Book contents

IV - Categorization

Summary

Access options

Book purchase

Temporarily unavailable

Save book to Kindle

Save book to Dropbox

Save book to Google Drive