INTRODUCTION TO INFORMATION EXTRACTION
A mature IE technology would allow rapid creation of extraction systems for new tasks, with performance approaching human level. Nevertheless, even systems without near-perfect recall and precision can be of real value. In such cases, the output of the IE system would be fed into an auditing environment in which auditors correct the system's precision errors (an easy task) and its recall errors (a much harder one). Such systems are also valuable when the volume of information is too large for users to read in full; even a partially correct IE system is then preferable to obtaining no potentially relevant information at all. In general, IE systems are useful if the following conditions are met:
The information to be extracted is specified explicitly and no further inference is needed.
A small number of templates are sufficient to summarize the relevant parts of the document.
The needed information is expressed relatively locally in the text (see Bagga and Biermann 2000).
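The precision and recall errors that auditors correct can be made concrete with a small sketch. It assumes that extracted facts are represented as simple tuples and that a hand-built gold standard is available for comparison; the function name audit_report and all facts below are hypothetical and serve only as illustration. Spurious facts (precision errors) appear directly in the system output, so an auditor can reject them on sight, whereas missed facts (recall errors) can be found only by going back to the source text.

```python
def audit_report(extracted, gold):
    """Compare system output with a gold standard (both iterables of facts).

    Returns precision, recall, the spurious facts (precision errors),
    and the missed facts (recall errors).
    """
    extracted, gold = set(extracted), set(gold)
    correct = extracted & gold    # facts the system got right
    spurious = extracted - gold   # extracted but not in the gold standard
    missed = gold - extracted     # in the gold standard but never extracted
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall, spurious, missed


# Hypothetical gold-standard facts for one document.
gold_facts = {
    ("Acme Corp", "joint_venture", "Beta Pharma"),
    ("Acme Corp", "acquired", "Gamma Labs"),
    ("Delta Inc", "appointed_ceo", "J. Smith"),
    ("Beta Pharma", "launched", "Drug X"),
}

# Hypothetical system output: two correct facts and one spurious fact.
system_facts = {
    ("Acme Corp", "joint_venture", "Beta Pharma"),
    ("Acme Corp", "acquired", "Gamma Labs"),
    ("Delta Inc", "acquired", "Beta Pharma"),
}

precision, recall, spurious, missed = audit_report(system_facts, gold_facts)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=0.67 recall=0.50"
```

In this example the auditor sees one spurious fact to delete, but must reread the document to discover the two facts the system never produced.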
As a first step in tagging documents for text mining systems, each document is processed to find (i.e., extract) entities and relationships that are likely to be meaningful and content-bearing. The term relationships here denotes facts or events involving certain entities.
For example, one possible event is a company entering into a joint venture to develop a new drug.
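One way to picture the output of this tagging step is as a pair of small record types, one for entities and one for events. The sketch below is only an assumption about how such records might look; the class names Entity and Event, their fields, and the company and drug names are all hypothetical and are not taken from any particular IE system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Entity:
    """A content-bearing item found in the text (hypothetical schema)."""
    text: str         # surface string as it appears in the document
    entity_type: str  # e.g. "Company", "Person", "Drug"


@dataclass
class Event:
    """A fact or event that involves one or more entities (hypothetical schema)."""
    event_type: str
    participants: List[Entity]
    attributes: Dict[str, str] = field(default_factory=dict)


# The joint-venture example, filled in with made-up names.
acme = Entity("Acme Corp", "Company")
beta = Entity("Beta Pharma", "Company")

joint_venture = Event(
    event_type="JointVenture",
    participants=[acme, beta],
    attributes={"purpose": "develop a new drug"},
)

# A downstream text mining step can then work with the structured record.
partner_names = [p.text for p in joint_venture.participants]
print(joint_venture.event_type, partner_names)
```

Keeping entities and events as separate types mirrors the distinction drawn in the text: entities stand on their own, while an event is defined by the entities that take part in it.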