Introduction to Text Mining

Ronen Feldman; James Sanger

doi:10.1017/CBO9780511546914.002

I - Introduction to Text Mining

Published online by Cambridge University Press: 08 August 2009

Ronen Feldman: Bar-Ilan University, Israel
James Sanger: ABS Ventures, Boston, Massachusetts

Summary

DEFINING TEXT MINING

Text mining can be broadly defined as a knowledge-intensive process in which a user interacts with a document collection over time by using a suite of analysis tools. In a manner analogous to data mining, text mining seeks to extract useful information from data sources through the identification and exploration of interesting patterns. In the case of text mining, however, the data sources are document collections, and interesting patterns are found not among formalized database records but in the unstructured textual data of the documents in these collections.

Certainly, text mining derives much of its inspiration and direction from seminal research on data mining. Therefore, it is not surprising to find that text mining and data mining systems evince many high-level architectural similarities. For instance, both types of systems rely on preprocessing routines, pattern-discovery algorithms, and presentation-layer elements such as visualization tools to enhance the browsing of answer sets. Further, text mining adopts many of the specific types of patterns in its core knowledge discovery operations that were first introduced and vetted in data mining research.

Because data mining assumes that data have already been stored in a structured format, much of its preprocessing focus falls on two critical tasks: scrubbing and normalizing data, and creating large numbers of table joins. In contrast, for text mining systems, preprocessing operations center on the identification and extraction of representative features for natural language documents.
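To make the contrast concrete, the following is a minimal sketch of one simple form of feature extraction for natural language documents: a bag-of-words representation built from term counts. It is an illustration under stated assumptions, not the authors' method. The stopword list, the function names `extract_features` and `document_frequency`, and the two sample documents are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list, chosen only for illustration.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def extract_features(document: str) -> Counter:
    """Return term counts for one document (a simple bag-of-words view)."""
    # Lowercase the text and keep only runs of ASCII letters as tokens.
    tokens = re.findall(r"[a-z]+", document.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def document_frequency(collection: list[str]) -> Counter:
    """Count, for each term, how many documents in the collection contain it."""
    df = Counter()
    for document in collection:
        # set(...) ensures each term is counted at most once per document.
        df.update(set(extract_features(document)))
    return df


# Hypothetical two-document collection.
collection = [
    "Text mining extracts patterns from document collections.",
    "Data mining extracts patterns from database records.",
]

features = [extract_features(d) for d in collection]
df = document_frequency(collection)
```

Here `features` holds one term-count mapping per document, and `df` records how widely each term is spread across the collection. Real systems typically use richer features (for example, phrases or named entities), but the shape of the step is the same: unstructured text goes in, a structured feature representation comes out.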

Type: Chapter
Information: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, pp. 1-18

DOI: https://doi.org/10.1017/CBO9780511546914.002

Publisher: Cambridge University Press

Print publication year: 2006
