Information Extraction

Ronen Feldman; James Sanger

doi:10.1017/CBO9780511546914.007

VI - Information Extraction

Published online by Cambridge University Press: 08 August 2009


Ronen Feldman, Bar-Ilan University, Israel
James Sanger, ABS Ventures, Boston, Massachusetts


Summary

INTRODUCTION TO INFORMATION EXTRACTION

A mature IE technology would allow extraction systems for new tasks to be created rapidly, with performance approaching a human level. Nevertheless, even systems whose recall and precision are far from perfect can be of real value. In such cases, the results of the IE system would be fed into an auditing environment in which auditors correct the system's precision errors (an easy task) and its recall errors (a much harder one). Such systems are also valuable when the volume of information is too large for users to read all of it; a partially correct IE system is then preferable to the alternative of obtaining no potentially relevant information at all. In general, IE systems are useful if the following conditions are met:

The information to be extracted is specified explicitly and no further inference is needed.
A small number of templates are sufficient to summarize the relevant parts of the document.
The needed information is expressed relatively locally in the text (see Bagga and Biermann 2000).
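
The precision and recall errors mentioned earlier can be made concrete with a small calculation. The following is a minimal sketch in Python, not taken from the chapter: the function name `precision_recall` and the sample extractions are hypothetical, and exact matching against a hand-labeled gold set is an assumed scoring convention. Spurious extractions (precision errors) are items the system produced that are not in the gold set; missed extractions (recall errors) are gold items the system never produced.

```python
def precision_recall(extracted, gold):
    """Score a set of extracted items against a hand-labeled gold set.

    Items are compared by exact equality (an assumed convention).
    """
    extracted, gold = set(extracted), set(gold)
    correct = extracted & gold
    # Precision: share of the system's output that is correct.
    precision = len(correct) / len(extracted) if extracted else 0.0
    # Recall: share of the gold items that the system found.
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (text, type) pairs, for illustration only.
extracted = {("Acme", "Company"), ("Bolt", "Company"), ("Paris", "Company")}
gold = {("Acme", "Company"), ("Bolt", "Company"),
        ("Zed", "Company"), ("Quill", "Company")}

precision, recall = precision_recall(extracted, gold)
spurious = extracted - gold   # precision errors: present in the output
missed = gold - extracted     # recall errors: absent from the output
```

In this sketch the spurious items sit in the system's output, where an auditor can inspect and reject them, whereas the missed items appear nowhere in the output and can only be found by going back to the text. This is one way to read the remark that recall errors are the harder ones to fix.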

As a first step in tagging documents for text mining systems, each document is processed to find (i.e., extract) entities and relationships that are likely to be meaningful and content-bearing. The term relationships here denotes facts or events involving certain entities.

By way of example, a possible event might be a company's entering into a joint venture to develop a new drug.
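
To illustrate how extracted entities and relationships might be represented, here is a minimal sketch in Python. It is an assumption for illustration rather than the authors' data model: the class names `Entity` and `Event`, the role labels, and the company and drug names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A content-bearing item found in the text, such as a company or a drug."""
    text: str
    kind: str


@dataclass
class Event:
    """A fact or event that involves certain entities, keyed by their role."""
    kind: str
    participants: dict = field(default_factory=dict)  # role name -> Entity

    def entities(self):
        """Return the entities involved in this event."""
        return list(self.participants.values())


# Hypothetical joint-venture event, mirroring the example in the text.
joint_venture = Event(
    kind="JointVenture",
    participants={
        "partner_1": Entity("Acme Pharma", "Company"),
        "partner_2": Entity("Bolt Labs", "Company"),
        "product": Entity("a new drug", "Drug"),
    },
)

companies = [e.text for e in joint_venture.entities() if e.kind == "Company"]
```

Keeping entities separate from the events that reference them lets the same entity take part in several relationships, which fits the idea of filling a small number of templates per document.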

Type: Chapter
Information: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, pp. 94-130

DOI: https://doi.org/10.1017/CBO9780511546914.007

Publisher: Cambridge University Press

Print publication year: 2006
