11 - Tree-Based Models
from III - Advanced Topics
Published online by Cambridge University Press: 05 June 2012
Summary
Linguistic theories of syntax build on recursive representations of sentences, referred to as syntactic trees. However, the models for statistical machine translation that we presented in previous chapters operate on a flat sequence representation of sentences, that is, the idea that a sentence is a string of words. Since syntactic trees allow us to exploit the syntactic relationships between words and phrases, it is an intriguing proposition to build models for machine translation based on tree structures.
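To make the contrast between the two representations concrete, the following is a minimal sketch in Python. The example sentence, the Penn Treebank-style labels, and the names `Node`, `leaves` and `depth` are illustrative assumptions and are not taken from the chapter. The sketch stores a sentence as a recursive tree and then recovers the flat word sequence by reading off the leaves.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """A syntactic tree node (hypothetical helper type for illustration)."""
    label: str
    # Children are either sub-trees or words (plain strings at the leaves).
    children: List[Union["Node", str]] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Collect the words at the leaves, left to right (the flat view)."""
    words: List[str] = []
    for child in node.children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


def depth(node: Node) -> int:
    """Number of labelled levels from this node down to the deepest label."""
    child_depths = [depth(c) for c in node.children if isinstance(c, Node)]
    return 1 + max(child_depths, default=0)


# Illustrative sentence with assumed Penn Treebank-style labels.
tree = Node("S", [
    Node("NP", [Node("PRP", ["she"])]),
    Node("VP", [
        Node("VBZ", ["reads"]),
        Node("NP", [Node("DT", ["the"]), Node("NN", ["book"])]),
    ]),
])

flat = leaves(tree)  # the string-of-words view used by sequence models
```

The flat list keeps only the word order, whereas the tree additionally records that "the book" forms a noun phrase inside the verb phrase. This is the kind of relationship between words and phrases that a tree-based model can exploit.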
This line of research in statistical machine translation has been pursued for many years. Recently, some core methods have crystallized, and translation systems employing tree-based models have been demonstrated to perform at the level of phrase-based models, in some cases even outperforming them.
In this chapter we will introduce core concepts of tree-based models. Bear in mind, however, that this is a fast-moving research frontier and new methods are constantly being proposed and tested. What we describe here are the underlying principles of the currently most successful models.
The structure of this chapter is as follows. We introduce the notion of synchronous grammars in Section 11.1, discuss how to learn these grammars in Section 11.2, and discuss decoding in Section 11.3.
- Statistical Machine Translation, pp. 331-370
- Publisher: Cambridge University Press
- Print publication year: 2009