
Explaining the PENTA model: a reply to Arvaniti and Ladd*

Yi Xu, Albert Lee, Santitham Prom-on and Fang Liu


This paper presents an overview of the Parallel Encoding and Target Approximation (PENTA) model of speech prosody, in response to an extensive critique by Arvaniti & Ladd (2009). PENTA is a framework for conceptually and computationally linking communicative meanings to fine-grained prosodic details, based on an articulatory-functional view of speech. Target Approximation simulates the articulatory realisation of underlying pitch targets – the prosodic primitives in the framework. Parallel Encoding provides an operational scheme that enables simultaneous encoding of multiple communicative functions. We also outline how PENTA can be computationally tested with a set of software tools. With the help of one of the tools, we offer a PENTA-based hypothetical account of the Greek intonational patterns reported by Arvaniti & Ladd, showing how it is possible to predict the prosodic shapes of an utterance based on the lexical and postlexical meanings it conveys.
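The Target Approximation component can be illustrated with a minimal sketch. It assumes the quantitative formulation described in Prom-on, Xu & Thipakorn (2009), in which F0 approaches a linear pitch target asymptotically as a third-order critically damped system, and the F0 level, velocity and acceleration at the end of one syllable become the initial state of the next. The function name `qta_syllable`, the parameter values and the units are hypothetical choices for illustration only, not taken from the paper or from the PENTAtrainer tools.

```python
import math

def qta_syllable(m, b, lam, state, duration, n=50):
    """Approach the linear target x(t) = m*t + b within one syllable.

    state is (f0, velocity, acceleration) at syllable onset.
    Returns (list of n f0 samples, state at syllable offset).
    """
    p0, v0, a0 = state
    # Coefficients of the transient, fixed by the onset state.
    c1 = p0 - b
    c2 = v0 + c1 * lam - m
    c3 = (a0 + 2 * c2 * lam - c1 * lam ** 2) / 2

    def evaluate(t):
        decay = math.exp(-lam * t)
        poly = c1 + c2 * t + c3 * t * t
        dpoly = c2 + 2 * c3 * t
        f0 = m * t + b + poly * decay
        vel = m + (dpoly - lam * poly) * decay
        acc = (2 * c3 - 2 * lam * dpoly + lam ** 2 * poly) * decay
        return f0, vel, acc

    samples = [evaluate(duration * i / (n - 1))[0] for i in range(n)]
    return samples, evaluate(duration)

# Illustrative values only: two static targets (slope 0), heights in
# arbitrary pitch units, rate of approach 40 per second, 0.2 s syllables.
targets = [(0.0, 95.0, 40.0, 0.2), (0.0, 85.0, 40.0, 0.2)]
state = (90.0, 0.0, 0.0)  # assumed onset F0, velocity, acceleration
contour = []
for m, b, lam, duration in targets:
    samples, state = qta_syllable(m, b, lam, state, duration)
    contour.extend(samples)
```

Because the offset state of each syllable is passed on as the onset state of the next, the resulting contour is continuous across the syllable boundary even though the underlying targets change abruptly.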



We would like to thank Amalia Arvaniti, Antonis Botinis, Bronwen Evans, Bob Ladd and four anonymous reviewers for their comments on earlier drafts of this paper. This work received support from the following sources: the National Science Foundation (NSF BCS-1355479 to the first author), the Royal Society and the Royal Academy of Engineering through the Newton International Fellowship Scheme (to the third author) and the Thai Research Fund through a Research Grant for New Researchers (TRG5680096 to the third author).



Arvaniti, Amalia & Ladd, D. Robert (2009). Greek wh-questions and the phonology of intonation. Phonology 26. 43–74.
Bailly, Gérard & Holm, Bleicke (2005). SFC: a trainable prosodic model. Speech Communication 46. 348–364.
Beckman, Mary E. & Pierrehumbert, Janet B. (1986). Intonational structure in Japanese and English. Phonology Yearbook 3. 255–309.
Birkholz, Peter, Kröger, Bernd J. & Neuschaefer-Rube, Christiane (2011). Model-based reproduction of articulatory trajectories for consonant–vowel sequences. IEEE Transactions on Audio, Speech, and Language Processing 19. 1422–1433.
Black, Alan & Hunt, Andrew (1996). Generating F0 contours from ToBI labels using linear regression. Proceedings of the 4th International Conference on Spoken Language Processing (ICSLP 96). Vol. 3. 1385–1388.
Bolinger, Dwight L. (1986). Intonation and its parts: melody in spoken English. London: Arnold.
Broe, Michael B. & Pierrehumbert, Janet B. (eds.) (2000). Papers in laboratory phonology V: acquisition and the lexicon. Cambridge: Cambridge University Press.
Chen, Matthew Y. (2000). Tone sandhi: patterns across Chinese dialects. Cambridge: Cambridge University Press.
Chen, Yiya & Xu, Yi (2006). Production of weak elements in speech: evidence from F0 patterns of neutral tone in Standard Chinese. Phonetica 63. 47–75.
Cooper, William E., Eady, Stephen J. & Mueller, Pamela R. (1985). Acoustical aspects of contrastive stress in question–answer contexts. JASA 77. 2142–2156.
de Jong, Kenneth (2004). Stress, lexical focus, and segmental focus in English: patterns of variation in vowel duration. JPh 32. 493–516.
Doupe, Allison J. & Kuhl, Patricia K. (1999). Birdsong and human speech: common themes and mechanisms. Annual Review of Neuroscience 22. 567–631.
Fujisaki, Hiroya (1983). Dynamic characteristics of voice fundamental frequency in speech and singing. In MacNeilage, Peter F. (ed.) The production of speech. New York: Springer. 39–55.
Grice, Martine, Ladd, D. Robert & Arvaniti, Amalia (2000). On the place of phrase accents in intonational phonology. Phonology 17. 143–185.
Gussenhoven, Carlos (2000). The boundary tones are coming: on the nonperipheral realization of boundary tones. In Broe & Pierrehumbert (2000). 132–151.
Gussenhoven, Carlos (2004). The phonology of tone and intonation. Cambridge: Cambridge University Press.
Hart, Johan 't, Collier, René & Cohen, Antonie (1990). A perceptual study of intonation: an experimental-phonetic approach to speech melody. Cambridge: Cambridge University Press.
Heldner, Mattias (2003). On the reliability of overall intensity and spectral emphasis as acoustic correlates of focal accents in Swedish. JPh 31. 39–62.
Hirst, D. J. (2005). Form and function in the representation of speech prosody. Speech Communication 46. 334–347.
Jun, Sun-Ah (ed.) (2005). Prosodic typology: the phonology of intonation and phrasing. Oxford: Oxford University Press.
Kochanski, Greg & Shih, Chilin (2003). Prosody modeling with soft templates. Speech Communication 39. 311–352.
Ladd, D. Robert (2008). Intonational phonology. 2nd edn. Cambridge: Cambridge University Press.
Lee, Albert, Xu, Yi & Prom-on, Santitham (2014). Modeling Japanese F0 contours using the PENTAtrainers and AMtrainer. Proceedings of the 4th International Symposium on Tonal Aspects of Languages (TAL2014). 164–167.
Liu, Fang & Xu, Yi (2005). Parallel encoding of focus and interrogative meaning in Mandarin intonation. Phonetica 62. 70–87.
Liu, Fang, Xu, Yi, Prom-on, Santitham & Yu, Alan (2013). Morpheme-like prosodic functions: evidence from acoustic analysis and computational modelling. Journal of Speech Sciences 3. 85–140.
Nick, Teresa A. (2014). Models of vocal learning in the songbird: historical frameworks and the stabilizing critic. Developmental Neurobiology. DOI:10.1002/dneu.22189.
O'Connor, J. D. & Arnold, G. F. (1973). Intonation of colloquial English: a practical handbook. 2nd edn. London: Longman.
Peng, Shu-Hui (2000). Lexical versus ‘phonological’ representations of Mandarin sandhi tones. In Broe & Pierrehumbert (2000). 152–167.
Pierrehumbert, Janet B. (1980). The phonology and phonetics of English intonation. PhD dissertation, MIT.
Pierrehumbert, Janet B. (1981). Synthesizing intonation. JASA 70. 985–995.
Pierrehumbert, Janet B. (2000). Tonal elements and their alignment. In Horne, Merle (ed.) Prosody: theory and experiment. Studies presented to Gösta Bruce. Dordrecht: Kluwer. 11–36.
Pierrehumbert, Janet B. & Beckman, Mary E. (1988). Japanese tone structure. Cambridge, Mass.: MIT Press.
Pierrehumbert, Janet B. & Hirschberg, Julia (1990). The meaning of intonational contours in the interpretation of discourse. In Cohen, Philip R., Morgan, Jerry & Pollack, Martha E. (eds.) Intentions in communication. Cambridge, Mass.: MIT Press. 271–311.
Prom-on, Santitham, Birkholz, Peter & Xu, Yi (2013). Training an articulatory synthesizer with continuous acoustic data. Proceedings of Interspeech 2013. 349–353.
Prom-on, Santitham & Xu, Yi (2012). PENTATrainer2: a hypothesis-driven prosody modeling tool. In Antonis Botinis (ed.) Proceedings of the 5th IESL Conference on Experimental Linguistics, Athens, Greece. 93–100.
Prom-on, Santitham, Xu, Yi & Thipakorn, Bundit (2009). Modeling tone and intonation in Mandarin and English as a process of target approximation. JASA 125. 405–424.
Raidt, S., Bailly, G., Holm, B. & Mixdorff, H. (2004). Automatic generation of prosody: comparing two superpositional systems. In Bel, Bernard & Marlien, Isabelle (eds.) Speech prosody 2004. Nara, Japan. Available (October 2015) at 417–420.
Saltzman, Elliot & Munhall, Kevin G. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology 1. 333–382.
Sun, Xuejing (2002). The determination, analysis, and synthesis of fundamental frequency. PhD dissertation, Northwestern University.
Taylor, Paul (2000). Analysis and synthesis of intonation using the Tilt model. JASA 107. 1697–1714.
Wang, Bei & Xu, Yi (2011). Differential prosodic encoding of topic and focus in sentence-initial position in Mandarin Chinese. JPh 39. 595–611.
Xu, Ching X. & Xu, Yi (2003). Effects of consonant aspiration on Mandarin tones. Journal of the International Phonetic Association 33. 165–181.
Xu, Ching X., Xu, Yi & Luo, Li-Shi (1999). A pitch target approximation model for F0 contours in Mandarin. In Ohala, John J., Hasegawa, Yoko, Ohala, Manjari, Granville, Daniel & Bailey, Ashlee C. (eds.) Proceedings of the 14th International Congress of Phonetic Sciences. Berkeley: University of California. 2359–2362.
Xu, Yi (1997). Contextual tonal variations in Mandarin. JPh 25. 61–83.
Xu, Yi (2005). Speech melody as articulatorily implemented communicative functions. Speech Communication 46. 220–251.
Xu, Yi (2011a). Speech prosody: a methodological review. Journal of Speech Sciences 1. 85–115.
Xu, Yi (2011b). Post-focus compression: cross-linguistic distribution and historical origin. In Lee, Wai-Sum & Zee, Eric (eds.) Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong 2011. Hong Kong: University of Hong Kong. 152–155.
Xu, Yi, Chen, Szu-Wei & Wang, Bei (2012). Prosodic focus with and without post-focus compression: a typological divide within the same language family? The Linguistic Review 29. 131–147.
Xu, Yi, Kelly, Andrew & Smillie, Cameron (2013). Emotional expressions as communicative signals. In Hancil, Sylvie & Hirst, Daniel (eds.) Prosody and iconicity. Amsterdam & Philadelphia: Benjamins. 33–59.
Xu, Yi, Lee, Albert, Wu, Wing-Li, Liu, Xuan & Birkholz, Peter (2013). Human vocal attractiveness as signaled by body size projection. PLoS ONE 8. e62397. Available at
Xu, Yi & Liu, Fang (2006). Tonal alignment, syllable structure and coarticulation: toward an integrated model. Rivista di Linguistica 18. 125–159.
Xu, Yi & Liu, Fang (2012). Intrinsic coherence of prosodic and segmental aspects of speech. In Niebuhr, Oliver (ed.) Understanding prosody: the role of context, function and communication. Berlin & Boston: de Gruyter. 1–26.
Xu, Yi & Prom-on, Santitham (2010–14). PENTAtrainer1: a Praat script for extracting pitch targets from individual sound files. Available (October 2015) at
Xu, Yi & Prom-on, Santitham (2014). Toward invariant functional representations of variable surface fundamental frequency contours: synthesizing speech melody via model-based stochastic learning. Speech Communication 57. 181–208.
Xu, Yi & Wang, Q. Emily (2001). Pitch targets and their realization: evidence from Mandarin Chinese. Speech Communication 33. 319–337.
Xu, Yi & Xu, Ching X. (2005). Phonetic realization of focus in English declarative intonation. JPh 33. 159–197.

