Skip to main content Accessibility help
  • Print publication year: 2006
  • Online publication date: August 2009

VIII - Preprocessing Applications Using Probabilistic and Hybrid Approaches


The related fields of NLP, IE, text categorization, and probabilistic modeling have developed increasingly rapidly in the last few years. New approaches are tried constantly and new systems are reported numbering thousands a year. The fields largely remain experimental science – a new approach or improvement is conceived and a system is built, tested, and reported. However, comparatively little work is done in analyzing the results and in comparing systems and approaches with each other. Usually, it is the task of the authors of a particular system to compare it with other known approaches, and this presents difficulties – both psychological and methodological.

One reason for the dearth of analytical work, excluding the general lack of sound theoretical foundations, is that the comparison experiments require software, which is usually either impossible or very costly to obtain. Moreover, the software requires integration, adjustment, and possibly training for any new use, which is also extremely costly in terms of time and human labor.

Therefore, our description of the different possible solutions to the problems described in the first section is incomplete by necessity. There are just too many reported systems, and there is often no good reason to choose one approach against the other. Consequently, we have tried to describe in depth only a small number of systems. We have chosen as broad a selection as possible, encompassing many different approaches. And, of course, the results produced by the systems are state of the art or sufficiently close to it.