6 - Decoding
from II - Core Methods
Published online by Cambridge University Press: 05 June 2012
Summary
In the two previous chapters we presented two models for machine translation, one based on the translation of words, and another based on the translation of phrases as atomic units. Both models were defined as mathematical formulae that, given a possible translation, assign a probabilistic score to it.
The task of decoding in machine translation is to find the best scoring translation according to these formulae. This is a hard problem, since the number of possible translations grows exponentially with the length of the input sentence. In fact, it has been shown that the decoding problem for the presented machine translation models is NP-complete [Knight, 1999a]. In other words, exhaustively examining all possible translations, scoring them, and picking the best is computationally too expensive for an input sentence of even modest length.
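To give a rough sense of this growth, the following minimal sketch counts only the ways an input sentence can be split into contiguous phrases and the order in which those phrases could then be translated. It deliberately ignores the choice among several translations for each phrase, so it is a lower bound on the real search space. The function name and the counting scheme are illustrative assumptions, not the formulation used in the chapter.

```python
from math import comb, factorial


def count_segmentations_with_reordering(n_words: int) -> int:
    """Count (segmentation, phrase order) pairs for a sentence of n_words.

    Assumption for illustration: a sentence is split into k contiguous
    phrases by choosing k - 1 of the n_words - 1 gaps between words, and
    the k phrases may then be translated in any of k! orders. Translation
    options per phrase are ignored, so the true search space is larger.
    """
    if n_words < 1:
        raise ValueError("n_words must be at least 1")
    return sum(
        comb(n_words - 1, k - 1) * factorial(k)
        for k in range(1, n_words + 1)
    )


if __name__ == "__main__":
    # Print the count for a few sentence lengths to show the growth.
    for n in (1, 2, 3, 5, 10, 15):
        print(n, count_segmentations_with_reordering(n))
```

Even under these simplifying assumptions the count grows faster than exponentially in the sentence length, which is why exhaustive enumeration is impractical.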
In this chapter, we will present a number of techniques that make it possible to efficiently carry out the search for the best translation. These methods are called heuristic search methods. This means that there is no guarantee that they will find the best translation, but we do hope to find it often enough, or at least a translation that is very close to it.
Will decoding find a good translation for a given input? Note that there are two types of error that may prevent this. A search error is the failure to find the best translation according to the model, in other words, the highest-probability translation.
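A search error can be illustrated on a toy problem that is small enough to solve exhaustively. The sketch below compares exhaustive enumeration with beam search, used here only as one example of a heuristic search method. The two-word input, the candidate words, and all scores in `OPTIONS` and `BIGRAM` are made up for illustration, and the scoring function is a simplified stand-in for the models described in the previous chapters.

```python
from itertools import product

# Hypothetical toy model: for each of two input positions, a list of
# candidate output words with made-up translation log-probabilities.
OPTIONS = [
    [("a", -0.1), ("b", -0.5)],
    [("x", -0.2), ("y", -0.3)],
]

# Made-up bigram log-probabilities linking each word to its predecessor.
BIGRAM = {
    ("<s>", "a"): -0.1, ("<s>", "b"): -0.2,
    ("a", "x"): -3.0, ("a", "y"): -3.0,
    ("b", "x"): -0.1, ("b", "y"): -2.0,
}


def step_score(prev, word, trans_logp):
    """Log-score contribution of appending `word` after `prev`."""
    return trans_logp + BIGRAM[(prev, word)]


def exhaustive(options):
    """Score every complete output and return (score, words) of the best."""
    best = None
    for combo in product(*options):
        total, prev = 0.0, "<s>"
        for word, logp in combo:
            total += step_score(prev, word, logp)
            prev = word
        words = tuple(w for w, _ in combo)
        if best is None or total > best[0]:
            best = (total, words)
    return best


def beam_search(options, beam_width):
    """Left-to-right search keeping only the beam_width best partial outputs."""
    beam = [(0.0, ())]
    for candidates in options:
        expanded = []
        for total, words in beam:
            prev = words[-1] if words else "<s>"
            for word, logp in candidates:
                expanded.append(
                    (total + step_score(prev, word, logp), words + (word,))
                )
        expanded.sort(key=lambda h: h[0], reverse=True)
        beam = expanded[:beam_width]  # pruning: this is where errors can arise
    return beam[0]


if __name__ == "__main__":
    print("exhaustive:", exhaustive(OPTIONS))
    print("beam width 1:", beam_search(OPTIONS, 1))
    print("beam width 2:", beam_search(OPTIONS, 2))
```

With a beam width of 1 the search commits to the locally better first word and can no longer reach the output that the model scores highest, which is a search error. With a beam width of 2 the same toy model is searched without error, at the cost of keeping more hypotheses.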
- Type: Chapter
- Information: Statistical Machine Translation, pp. 155-180
- Publisher: Cambridge University Press
- Print publication year: 2009