Book contents
- Frontmatter
- Contents
- Preface
- I Introduction to Text Mining
- II Core Text Mining Operations
- III Text Mining Preprocessing Techniques
- IV Categorization
- V Clustering
- VI Information Extraction
- VII Probabilistic Models for Information Extraction
- VIII Preprocessing Applications Using Probabilistic and Hybrid Approaches
- IX Presentation-Layer Considerations for Browsing and Query Refinement
- X Visualization Approaches
- XI Link Analysis
- XII Text Mining Applications
- Appendix A DIAL: A Dedicated Information Extraction Language for Text Mining
- Bibliography
- Index
VIII - Preprocessing Applications Using Probabilistic and Hybrid Approaches
Published online by Cambridge University Press: 08 August 2009
Summary
The related fields of NLP, IE, text categorization, and probabilistic modeling have developed at an increasing pace in the last few years. New approaches are tried constantly, and thousands of new systems are reported each year. These fields remain largely an experimental science: a new approach or improvement is conceived, and a system is built, tested, and reported. However, comparatively little work is done on analyzing the results and on comparing systems and approaches with one another. Usually, it falls to the authors of a particular system to compare it with other known approaches, and this presents difficulties, both psychological and methodological.
One reason for the dearth of analytical work, apart from the general lack of sound theoretical foundations, is that comparison experiments require software that is usually either impossible or very costly to obtain. Moreover, the software requires integration, adjustment, and possibly training for any new use, which is also extremely costly in terms of time and human labor.
Therefore, our description of the different possible solutions to the problems described in the first section is necessarily incomplete. There are simply too many reported systems, and there is often no good reason to choose one approach over another. Consequently, we have tried to describe in depth only a small number of systems. We have chosen as broad a selection as possible, encompassing many different approaches. And, of course, the results produced by these systems are state of the art or sufficiently close to it.
- Type: Chapter
- Information: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, pp. 146-176
- Publisher: Cambridge University Press
- Print publication year: 2006