Explaining the PENTA model: a reply to Arvaniti and Ladd*

Yi Xu; Albert Lee; Santitham Prom-on; Fang Liu

doi:10.1017/S0952675715000299

Published online by Cambridge University Press: 15 February 2016

Affiliations:
Yi Xu: University College London
Albert Lee: University of Hong Kong
Santitham Prom-on: King Mongkut's University of Technology Thonburi
Fang Liu: University of Essex
E-mail: yi.xu@ucl.ac.uk, albertlee@hku.hk, santitham@cpe.kmutt.ac.th, f.liu@essex.ac.uk.

Abstract

This paper presents an overview of the Parallel Encoding and Target Approximation (PENTA) model of speech prosody, in response to an extensive critique by Arvaniti & Ladd (2009). PENTA is a framework for conceptually and computationally linking communicative meanings to fine-grained prosodic details, based on an articulatory-functional view of speech. Target Approximation simulates the articulatory realisation of underlying pitch targets – the prosodic primitives in the framework. Parallel Encoding provides an operational scheme that enables simultaneous encoding of multiple communicative functions. We also outline how PENTA can be computationally tested with a set of software tools. With the help of one of the tools, we offer a PENTA-based hypothetical account of the Greek intonational patterns reported by Arvaniti & Ladd, showing how it is possible to predict the prosodic shapes of an utterance based on the lexical and postlexical meanings it conveys.
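
The Target Approximation component mentioned in the abstract can be illustrated with a minimal sketch. It assumes the quantitative formulation of Prom-on, Xu & Thipakorn (2009), in which surface F₀ within a syllable approaches a linear pitch target as the response of a third-order critically damped system. The function name, argument names and the numerical values in the usage lines are hypothetical and chosen only for illustration; this is not the authors' software.

```python
import math


def target_approximation(f0_init, v_init, a_init, slope, height, rate, duration, n=50):
    """Sketch of target approximation for a single syllable (assumed form).

    f0_init, v_init, a_init: F0 value, velocity and acceleration at syllable onset
    slope, height: the linear pitch target x(t) = slope * t + height
    rate: strength of approximation (lambda); larger means faster approach
    duration: syllable duration in seconds
    Returns (times, f0) as two lists of length n.
    """
    # Coefficients of the transient, fixed by the three onset conditions.
    c1 = f0_init - height
    c2 = v_init + c1 * rate - slope
    c3 = (a_init + 2.0 * c2 * rate - c1 * rate ** 2) / 2.0

    times = [duration * i / (n - 1) for i in range(n)]
    # Surface F0 = target line + exponentially decaying transient.
    f0 = [
        slope * t + height + (c1 + c2 * t + c3 * t * t) * math.exp(-rate * t)
        for t in times
    ]
    return times, f0


# Hypothetical values: start at 90, approach a static target at 100
# within a 0.2 s syllable, with zero onset velocity and acceleration.
times, f0 = target_approximation(90.0, 0.0, 0.0, 0.0, 100.0, 40.0, 0.2)
print(round(f0[0], 2), round(f0[-1], 2))
```

In this sketch the contour starts at the onset value and moves asymptotically toward the target, so the target itself need not be reached by the end of the syllable. Passing the final value, velocity and acceleration of one syllable as the onset conditions of the next would chain syllables together.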

Type: Squibs and Replies
Information: Phonology, Volume 32, Issue 3, December 2015, pp. 505–535

DOI: https://doi.org/10.1017/S0952675715000299
Copyright: Copyright © Cambridge University Press 2016

Footnotes

We would like to thank Amalia Arvaniti, Antonis Botinis, Bronwen Evans, Bob Ladd and four anonymous reviewers for their comments on earlier drafts of this paper. This work received support from the following sources: the National Science Foundation (NSF BCS-1355479 to the first author), the Royal Society and the Royal Academy of Engineering through the Newton International Fellowship Scheme (to the third author) and the Thai Research Fund through a Research Grant for New Researchers (TRG5680096 to the third author).

References

Arvaniti, Amalia & Ladd, D. Robert (2009). Greek wh-questions and the phonology of intonation. Phonology 26. 43–74.

Bailly, Gérard & Holm, Bleicke (2005). SFC: a trainable prosodic model. Speech Communication 46. 348–364.

Beckman, Mary E. & Pierrehumbert, Janet B. (1986). Intonational structure in Japanese and English. Phonology Yearbook 3. 255–309.

Birkholz, Peter, Kröger, Bernd J. & Neuschaefer-Rube, Christiane (2011). Model-based reproduction of articulatory trajectories for consonant–vowel sequences. IEEE Transactions on Audio, Speech, and Language Processing 19. 1422–1433.

Black, Alan & Hunt, Andrew (1996). Generating F₀ contours from ToBI labels using linear regression. Proceedings of the 4th International Conference on Spoken Language Processing (ICSLP 96). Vol. 3. 1385–1388.

Bolinger, Dwight L. (1986). Intonation and its parts: melody in spoken English. London: Arnold.

Broe, Michael B. & Pierrehumbert, Janet B. (eds.) (2000). Papers in laboratory phonology V: acquisition and the lexicon. Cambridge: Cambridge University Press.

Chen, Matthew Y. (2000). Tone sandhi: patterns across Chinese dialects. Cambridge: Cambridge University Press.

Chen, Yiya & Xu, Yi (2006). Production of weak elements in speech: evidence from F₀ patterns of neutral tone in Standard Chinese. Phonetica 63. 47–75.

Cooper, William E., Eady, Stephen J. & Mueller, Pamela R. (1985). Acoustical aspects of contrastive stress in question–answer contexts. JASA 77. 2142–2156.

de Jong, Kenneth (2004). Stress, lexical focus, and segmental focus in English: patterns of variation in vowel duration. JPh 32. 493–516.

Doupe, Allison J. & Kuhl, Patricia K. (1999). Birdsong and human speech: common themes and mechanisms. Annual Review of Neuroscience 22. 567–631.

Fujisaki, Hiroya (1983). Dynamic characteristics of voice fundamental frequency in speech and singing. In MacNeilage, Peter F. (ed.) The production of speech. New York: Springer. 39–55.

Grice, Martine, Ladd, D. Robert & Arvaniti, Amalia (2000). On the place of phrase accents in intonational phonology. Phonology 17. 143–185.

Gussenhoven, Carlos (2000). The boundary tones are coming: on the nonperipheral realization of boundary tones. In Broe & Pierrehumbert (2000). 132–151.

Gussenhoven, Carlos (2004). The phonology of tone and intonation. Cambridge: Cambridge University Press.

Hart, Johan 't, Collier, René & Cohen, Antonie (1990). A perceptual study of intonation: an experimental-phonetic approach to speech melody. Cambridge: Cambridge University Press.

Heldner, Mattias (2003). On the reliability of overall intensity and spectral emphasis as acoustic correlates of focal accents in Swedish. JPh 31. 39–62.

Hirst, D. J. (2005). Form and function in the representation of speech prosody. Speech Communication 46. 334–347.

Jun, Sun-Ah (ed.) (2005). Prosodic typology: the phonology of intonation and phrasing. Oxford: Oxford University Press.

Kochanski, Greg & Shih, Chilin (2003). Prosody modeling with soft templates. Speech Communication 39. 311–352.

Ladd, D. Robert (2008). Intonational phonology. 2nd edn. Cambridge: Cambridge University Press.

Lee, Albert, Xu, Yi & Prom-on, Santitham (2014). Modeling Japanese F₀ contours using the PENTAtrainers and AMtrainer. Proceedings of the 4th International Symposium on Tonal Aspects of Languages (TAL2014). 164–167.

Liu, Fang & Xu, Yi (2005). Parallel encoding of focus and interrogative meaning in Mandarin intonation. Phonetica 62. 70–87.

Liu, Fang, Xu, Yi, Prom-on, Santitham & Yu, Alan (2013). Morpheme-like prosodic functions: evidence from acoustic analysis and computational modelling. Journal of Speech Sciences 3. 85–140.

Nick, Teresa A. (2014). Models of vocal learning in the songbird: historical frameworks and the stabilizing critic. Developmental Neurobiology. DOI:10.1002/dneu.22189.

O'Connor, J. D. & Arnold, G. F. (1973). Intonation of colloquial English: a practical handbook. 2nd edn. London: Longman.

Peng, Shu-Hui (2000). Lexical versus ‘phonological’ representations of Mandarin sandhi tones. In Broe & Pierrehumbert (2000). 152–167.

Pierrehumbert, Janet B. (1980). The phonology and phonetics of English intonation. PhD dissertation, MIT.

Pierrehumbert, Janet B. (1981). Synthesizing intonation. JASA 70. 985–995.

Pierrehumbert, Janet B. (2000). Tonal elements and their alignment. In Horne, Merle (ed.) Prosody: theory and experiment. Studies presented to Gösta Bruce. Dordrecht: Kluwer. 11–36.

Pierrehumbert, Janet B. & Beckman, Mary E. (1988). Japanese tone structure. Cambridge, Mass.: MIT Press.

Pierrehumbert, Janet B. & Hirschberg, Julia (1990). The meaning of intonational contours in the interpretation of discourse. In Cohen, Philip R., Morgan, Jerry & Pollack, Martha E. (eds.) Intentions in communication. Cambridge, Mass.: MIT Press. 271–311.

Prom-on, Santitham, Birkholz, Peter & Xu, Yi (2013). Training an articulatory synthesizer with continuous acoustic data. Proceedings of Interspeech 2013. 349–353.

Prom-on, Santitham & Xu, Yi (2012). PENTATrainer2: a hypothesis-driven prosody modeling tool. In Antonis Botinis (ed.) Proceedings of the 5th IESL Conference on Experimental Linguistics, Athens, Greece. 93–100.

Prom-on, Santitham, Xu, Yi & Thipakorn, Bundit (2009). Modeling tone and intonation in Mandarin and English as a process of target approximation. JASA 125. 405–424.

Raidt, S., Bailly, G., Holm, B. & Mixdorff, H. (2004). Automatic generation of prosody: comparing two superpositional systems. In Bel, Bernard & Marlien, Isabelle (eds.) Speech prosody 2004. Nara, Japan. Available (October 2015) at http://www.isca-speech.org/archive/sp2004. 417–420.

Saltzman, Elliot & Munhall, Kevin G. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology 1. 333–382.

Sun, Xuejing (2002). The determination, analysis, and synthesis of fundamental frequency. PhD dissertation, Northwestern University.

Taylor, Paul (2000). Analysis and synthesis of intonation using the Tilt model. JASA 107. 1697–1714.

Wang, Bei & Xu, Yi (2011). Differential prosodic encoding of topic and focus in sentence-initial position in Mandarin Chinese. JPh 39. 595–611.

Xu, Ching X. & Xu, Yi (2003). Effects of consonant aspiration on Mandarin tones. Journal of the International Phonetic Association 33. 165–181.

Xu, Ching X., Xu, Yi & Luo, Li-Shi (1999). A pitch target approximation model for F₀ contours in Mandarin. In Ohala, John J., Hasegawa, Yoko, Ohala, Manjari, Granville, Daniel & Bailey, Ashlee C. (eds.) Proceedings of the 14th International Congress of Phonetic Sciences. Berkeley: University of California. 2359–2362.

Xu, Yi (1997). Contextual tonal variations in Mandarin. JPh 25. 61–83.

Xu, Yi (2005). Speech melody as articulatorily implemented communicative functions. Speech Communication 46. 220–251.

Xu, Yi (2011a). Speech prosody: a methodological review. Journal of Speech Sciences 1. 85–115.

Xu, Yi (2011b). Post-focus compression: cross-linguistic distribution and historical origin. In Lee, Wai-Sum & Zee, Eric (eds.) Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong 2011. Hong Kong: University of Hong Kong. 152–155.

Xu, Yi, Chen, Szu-Wei & Wang, Bei (2012). Prosodic focus with and without post-focus compression: a typological divide within the same language family? The Linguistic Review 29. 131–147.

Xu, Yi, Kelly, Andrew & Smillie, Cameron (2013). Emotional expressions as communicative signals. In Hancil, Sylvie & Hirst, Daniel (eds.) Prosody and iconicity. Amsterdam & Philadelphia: Benjamins. 33–59.

Xu, Yi, Lee, Albert, Wu, Wing-Li, Liu, Xuan & Birkholz, Peter (2013). Human vocal attractiveness as signaled by body size projection. PLoS ONE 8. e62397. Available at http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0062397.

Xu, Yi & Liu, Fang (2006). Tonal alignment, syllable structure and coarticulation: toward an integrated model. Rivista di Linguistica 18. 125–159.

Xu, Yi & Liu, Fang (2012). Intrinsic coherence of prosodic and segmental aspects of speech. In Niebuhr, Oliver (ed.) Understanding prosody: the role of context, function and communication. Berlin & Boston: de Gruyter. 1–26.

Xu, Yi & Prom-on, Santitham (2010–14). PENTAtrainer1: a Praat script for extracting pitch targets from individual sound files. Available (October 2015) at http://www.phon.ucl.ac.uk/home/yi/PENTAtrainer1.

Xu, Yi & Prom-on, Santitham (2014). Toward invariant functional representations of variable surface fundamental frequency contours: synthesizing speech melody via model-based stochastic learning. Speech Communication 57. 181–208.

Xu, Yi & Wang, Q. Emily (2001). Pitch targets and their realization: evidence from Mandarin Chinese. Speech Communication 33. 319–337.

Xu, Yi & Xu, Ching X. (2005). Phonetic realization of focus in English declarative intonation. JPh 33. 159–197.
