Preprocessing Applications Using Probabilistic and Hybrid Approaches

Ronen Feldman; James Sanger

doi:10.1017/CBO9780511546914.009

VIII - Preprocessing Applications Using Probabilistic and Hybrid Approaches

Published online by Cambridge University Press: 08 August 2009


Ronen Feldman: Bar-Ilan University, Israel
James Sanger: ABS Ventures, Boston, Massachusetts


Summary

The related fields of NLP, IE, text categorization, and probabilistic modeling have developed increasingly rapidly in the last few years. New approaches are tried constantly, and thousands of new systems are reported each year. The fields remain largely experimental: a new approach or improvement is conceived, and a system is built, tested, and reported. However, comparatively little work is done in analyzing the results and comparing systems and approaches with one another. Usually, it falls to the authors of a particular system to compare it with other known approaches, and this presents both psychological and methodological difficulties.

Apart from the general lack of sound theoretical foundations, one reason for the dearth of analytical work is that comparison experiments require software that is usually either impossible or very costly to obtain. Moreover, such software requires integration, adjustment, and possibly training for each new use, which is also extremely costly in time and human labor.

Therefore, our description of the possible solutions to the problems described in the first section is necessarily incomplete. There are simply too many reported systems, and there is often no good reason to prefer one approach over another. Consequently, we have tried to describe in depth only a small number of systems, chosen to cover as broad a range of approaches as possible. And, of course, the results produced by these systems are state of the art or sufficiently close to it.

Type: Chapter
Information: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, pp. 146-176

DOI: https://doi.org/10.1017/CBO9780511546914.009

Publisher: Cambridge University Press

Print publication year: 2006
