
IV - Categorization

Published online by Cambridge University Press:  08 August 2009

Ronen Feldman
Affiliation:
Bar-Ilan University, Israel
James Sanger
Affiliation:
ABS Ventures, Boston, Massachusetts

Summary

Probably the most common theme in analyzing complex data is the classification, or categorization, of elements. Described abstractly, the task is to classify a given data instance into a prespecified set of categories. Applied to the domain of document management, the task is known as text categorization (TC): given a set of categories (subjects, topics) and a collection of text documents, it is the process of finding the correct topic (or topics) for each document.

The study of automated text categorization dates back to the early 1960s (Maron 1961). At that time, its main projected use was indexing scientific literature by means of a controlled vocabulary. It was only in the 1990s that the field fully developed, with the availability of ever-increasing numbers of text documents in digital form and the necessity to organize them for easier use. Nowadays, automated TC is applied in a variety of contexts – from the classical automatic or semiautomatic (interactive) indexing of texts to personalized delivery of advertisements, spam filtering, Web page categorization under hierarchical catalogues, automatic generation of metadata, detection of text genre, and many others.

As with many other artificial intelligence (AI) tasks, there are two main approaches to text categorization. The first is the knowledge engineering approach in which the expert's knowledge about the categories is directly encoded into the system either declaratively or in the form of procedural classification rules.
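
To make the knowledge engineering approach concrete, the following is a minimal sketch of a classifier whose categories are defined by hand-written rules. It is not taken from the chapter: the category names, keyword lists, the `min_hits` threshold, and the function names are all hypothetical, chosen only to show how expert knowledge might be encoded declaratively (the `RULES` table) and applied procedurally (the `categorize` function).

```python
import re

# Hypothetical expert-written rules (illustrative only, not from the chapter):
# each category is declared by a set of keywords an expert deems indicative.
RULES = {
    "sports": {"match", "goal", "league"},
    "finance": {"stock", "dividend", "market"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorize(text, rules=RULES, min_hits=2):
    """Return, in sorted order, every category whose rule fires on the text.

    A rule fires when at least `min_hits` of its keywords occur in the
    document, so a document may receive zero, one, or several topics.
    """
    tokens = tokenize(text)
    return sorted(
        category
        for category, keywords in rules.items()
        if len(tokens & keywords) >= min_hits
    )


if __name__ == "__main__":
    print(categorize("The stock market rallied after the dividend news"))
    print(categorize("The league signed a stock market sponsor before the match"))
```

Under these assumed rules, the first document is assigned only the finance topic, while the second matches two keywords from each rule and so receives both topics. Every change in the categories requires an expert to revise the rule table by hand.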

Type: Chapter
Information: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, pp. 64-81
Publisher: Cambridge University Press
Print publication year: 2006

  • Categorization
  • Ronen Feldman, Bar-Ilan University, Israel, James Sanger, ABS Ventures, Boston, Massachusetts
  • Book: The Text Mining Handbook
  • Online publication: 08 August 2009
  • Chapter DOI: https://doi.org/10.1017/CBO9780511546914.005