Long words in maximum entropy phonotactic grammars*

Robert Daland

doi:10.1017/S0952675715000251

Published online by Cambridge University Press: 15 February 2016

Affiliation: University of California, Los Angeles
E-mail: r.daland@gmail.com

Abstract

A phonotactic grammar assigns a well-formedness score to all possible surface forms. This paper considers whether phonotactic grammars should be probabilistic, and gives several arguments that they need to be. Hayes & Wilson (2008) demonstrate the promise of a maximum entropy Harmonic Grammar as a probabilistic phonotactic grammar. This paper points out a theoretical issue with maxent phonotactic grammars: they are not guaranteed to assign a well-defined probability distribution, because sequences that contain arbitrary repetitions of unmarked sequences may be underpenalised. The paper motivates a solution to this issue: include a *Struct constraint. A mathematical proof of necessary and sufficient conditions to avoid the underpenalisation problem is given in online supplementary materials.
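The underpenalisation issue can be illustrated with a minimal sketch, under simplifying assumptions that are not taken from the paper: a toy grammar whose only constraint is *Struct, assumed here to incur one violation per segment, over an alphabet of a fixed size. The names `partial_normaliser` and `converges` are hypothetical. In this toy case the normalising constant is a geometric series over word lengths, so it is finite only when the *Struct weight exceeds the logarithm of the alphabet size. The general conditions are those proved in the supplementary materials, not this special case.

```python
import math


def partial_normaliser(alphabet_size: int, struct_weight: float, max_len: int) -> float:
    """Sum exp(-harmony) over every string of length 0..max_len.

    Toy assumption: the only constraint is *Struct, violated once per
    segment, so each string of length n has harmony struct_weight * n
    and there are alphabet_size ** n such strings.
    """
    ratio = alphabet_size * math.exp(-struct_weight)
    return sum(ratio ** n for n in range(max_len + 1))


def converges(alphabet_size: int, struct_weight: float) -> bool:
    """True if the toy normaliser stays finite as max_len grows."""
    return alphabet_size * math.exp(-struct_weight) < 1.0


if __name__ == "__main__":
    # With zero weight, long words are underpenalised and the sum keeps growing.
    for max_len in (5, 10, 20):
        print(max_len, partial_normaliser(2, 0.0, max_len))
    # With a weight above log(2), the partial sums level off.
    for max_len in (5, 10, 20):
        print(max_len, partial_normaliser(2, 1.0, max_len))
```

In this sketch, a weight of zero on a two-symbol alphabet gives partial sums that double with each added length, whereas a weight of 1.0 gives partial sums that approach a finite limit, so only the second setting defines a probability distribution over all strings.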

Type: Articles
Information: Phonology, Volume 32, Issue 3, December 2015, pp. 353–383

DOI: https://doi.org/10.1017/S0952675715000251
Copyright: © Cambridge University Press 2016

Footnotes

I wish to acknowledge the entire LSCP (Laboratoire des Sciences Cognitives, ENS/Paris), especially Benjamin Börschinger and Abdel Fourtassi, who collaborated with me on the project that led to the insights in this paper, Mark Johnson, who pointed out that maxent should work for Σ*, and Alex Cristia, Sharon Peperkamp and Emmanuel Dupoux for inviting me to the LSCP. I also wish to acknowledge Colin Wilson and Maria Gouskova for useful discussion of the issue, and the editors of this journal for advice.

Three theorems which formalise the central contributions of this paper are discussed in the online supplementary materials, available at http://www.journals.cambridge.org/issue_Phonology/Vol32No03.

References

Anttila, Arto (1997). Deriving variation from grammar. In Hinskens, Frans, van Hout, Roeland & Wetzels, W. Leo (eds.) Variation, change and phonological theory. Amsterdam & Philadelphia: Benjamins. 35–68.

Baayen, R. Harald (2001). Word frequency distributions. Dordrecht: Kluwer.

Baayen, R. Harald & Schreuder, Robert (2000). Towards a psycholinguistic computational model for morphological parsing. Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences 358. 1281–1293.

Bane, Max & Riggle, Jason (2012). Consequences of candidate omission. LI 43. 695–706.

Boersma, Paul & Hayes, Bruce (2001). Empirical tests of the Gradual Learning Algorithm. LI 32. 45–86.

Boersma, Paul & Pater, Joe (to appear). Convergence properties of a Gradual Learning Algorithm for Harmonic Grammar. In McCarthy & Pater (to appear).

Bowers, Dustin (2014). Balancing leveling and composite URs. Paper presented at Phonology 2014, MIT.

Chi, Zhiyi & Geman, Stuart (1998). Estimation of probabilistic context-free grammars. Computational Linguistics 24. 299–305.

Chodroff, Eleanor & Wilson, Colin (2014). Phonetic vs. phonological factors in coronal-to-dorsal perceptual assimilation. Paper presented at LabPhon 14: the 14th Conference on Laboratory Phonology, Tokyo.

Chomsky, Noam (1956). Three models for the description of language. IRE Transactions on Information Theory 2:3. 113–124.

Chomsky, Noam & Halle, Morris (1965). Some controversial questions in phonological theory. JL 1. 97–138.

Chomsky, Noam & Halle, Morris (1968). The sound pattern of English. New York: Harper & Row.

Coady, Jeffry A. & Evans, Julia L. (2008). Uses and interpretations of non-word repetition tasks in children with and without specific language impairments (SLI). International Journal of Language & Communication Disorders 43. 1–40.

Coetzee, Andries W. & Kawahara, Shigeto (2013). Frequency biases in phonological variation. NLLT 31. 47–89.

Coetzee, Andries W. & Pater, Joe (2011). The place of variation in phonological theory. In Goldsmith, John, Riggle, Jason & Yu, Alan (eds.) The handbook of phonological theory. 2nd edn. Malden, Mass. & Oxford: Wiley-Blackwell. 401–431.

Coleman, John & Pierrehumbert, Janet B. (1997). Stochastic phonological grammars and acceptability. In Coleman, John (ed.) Proceedings of the 3rd Meeting of the ACL Special Interest Group in Computational Phonology. Somerset, NJ: Association for Computational Linguistics. 49–56.

Daland, Robert, Börschinger, Benjamin & Fourtassi, Abdellah (2014). On lexical phonotactics and segmentability. Paper presented at LabPhon 14: the 14th Conference on Laboratory Phonology, Tokyo.

Daland, Robert, Hayes, Bruce, White, James, Garellek, Marc, Davis, Andrea & Norrmann, Ingrid (2011). Explaining sonority projection effects. Phonology 28. 197–234.

Davidson, Lisa & Shaw, Jason A. (2012). Sources of illusion in consonant cluster perception. JPh 40. 234–248.

Della Pietra, Stephen, Della Pietra, Vincent J. & Lafferty, John D. (1997). Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence 19. 380–393.

Edwards, Jan, Beckman, Mary E. & Munson, Benjamin (2004). The interaction between vocabulary size and phonotactic probability effects on children's production accuracy and fluency in nonword repetition. Journal of Speech, Language, and Hearing Research 47. 421–436.

Eisner, Jason (2002). Parameter estimation for probabilistic finite-state transducers. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. 1–8.

Elsner, Micha, Goldwater, Sharon, Feldman, Naomi & Wood, Frank (2013). A joint learning model of word segmentation, lexical acquisition, and phonetic variability. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 42–54.

Goldrick, Matthew & Daland, Robert (2009). Linking speech errors and phonological grammars: insights from Harmonic Grammar networks. Phonology 26. 147–185.

Goldwater, Sharon & Johnson, Mark (2003). Learning OT constraint rankings using a Maximum Entropy model. In Spenader, Jennifer, Eriksson, Anders & Dahl, Östen (eds.) Proceedings of the Stockholm Workshop on Variation within Optimality Theory. Stockholm: Stockholm University. 111–120.

Gouskova, Maria (2003). Deriving economy: syncope in Optimality Theory. PhD dissertation, University of Massachusetts, Amherst.

Grenander, Ulf (1976). Pattern synthesis. New York: Springer.

Harris, Theodore E. (1963). The theory of branching processes. Berlin: Springer.

Hay, Jennifer, Pierrehumbert, Janet B. & Beckman, Mary E. (2003). Speech perception, well-formedness and the statistics of the lexicon. In Local, John, Ogden, Richard & Temple, Rosalind (eds.) Phonetic interpretation: papers in laboratory phonology VI. Cambridge: Cambridge University Press. 58–74.

Hayes, Bruce (2004). Phonological acquisition in Optimality Theory: the early stages. In Kager, René, Pater, Joe & Zonneveld, Wim (eds.) Constraints in phonological acquisition. Cambridge: Cambridge University Press. 158–203.

Hayes, Bruce (2011). Interpreting sonority-projection experiments: the role of phonotactic modeling. In Lee, Wai-Sum & Zee, Eric (eds.) Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong 2011. Hong Kong: University of Hong Kong. 835–838.

Hayes, Bruce & White, James (2013). Phonological naturalness and phonotactic learning. LI 44. 45–75.

Hayes, Bruce & Wilson, Colin (2008). A maximum entropy model of phonotactics and phonotactic learning. LI 39. 379–440.

Jäger, Gerhard (2007). Maximum entropy models and Stochastic Optimality Theory. In Zaenen, Annie, Simpson, Jane, King, Tracy Holloway, Grimshaw, Jane, Maling, Joan & Manning, Chris (eds.) Architectures, rules, and preferences: variations on themes by Joan W. Bresnan. Stanford: CSLI. 467–479.

Jarosz, Gaja (2013). Learning with hidden structure in Optimality Theory and Harmonic Grammar: beyond Robust Interpretive Parsing. Phonology 30. 27–71.

Jaynes, E. T. (1983). Papers on probability, statistics, and statistical physics. Edited by Rosenkrantz, R. D. Dordrecht: Kluwer.

Jelinek, Frederick (1997). Statistical methods for speech recognition. Cambridge, Mass.: MIT Press.

Legendre, Géraldine, Miyata, Yoshiro & Smolensky, Paul (1990). Harmonic Grammar: a formal multi-level connectionist theory of linguistic well-formedness: an application. In Proceedings of the 12th Annual Conference of the Cognitive Science Society. Hillsdale: Erlbaum. 884–891.

McCarthy, John J. & Pater, Joe (eds.) (to appear). Harmonic Grammar and Harmonic Serialism. London: Equinox.

McCarthy, John J. & Prince, Alan (1993). Prosodic morphology I: constraint interaction and satisfaction. Ms, University of Massachusetts, Amherst & Rutgers University.

McClelland, James L. & Elman, Jeffrey L. (1986). The TRACE model of speech perception. Cognitive Psychology 18. 1–86.

Magri, Giorgio (2012). Convergence of error-driven ranking algorithms. Phonology 29. 213–269.

Manning, Christopher D. & Schütze, Hinrich (1999). Foundations of statistical natural language processing. Cambridge, Mass.: MIT Press.

Mattys, Sven L. & Jusczyk, Peter W. (2001). Do infants segment words or recurring contiguous patterns? Journal of Experimental Psychology: Human Perception and Performance 27. 644–655.

Merchant, Nazarré & Tesar, Bruce (2008). Learning underlying forms by searching restricted lexical subspaces. CLS 41:2. 33–47.

Norris, Dennis & McQueen, James M. (2008). Shortlist B: a Bayesian model of continuous speech recognition. Psychological Review 115. 357–395.

Pater, Joe (2008). Gradual learning and convergence. LI 39. 334–345.

Pater, Joe (to appear). Universal Grammar with weighted constraints. In McCarthy & Pater (to appear).

Prince, Alan & Smolensky, Paul (1993). Optimality Theory: constraint interaction in generative grammar. Ms, Rutgers University & University of Colorado, Boulder. Published 2004, Malden, Mass. & Oxford: Blackwell.

Riggle, Jason (2004). Generation, recognition, and learning in finite-state Optimality Theory. PhD dissertation, University of California, Los Angeles.

Riggle, Jason (2009). Violation semirings in Optimality Theory. Research on Language and Computation 7. 1–12.

Scharenborg, Odette, Norris, Dennis, Bosch, Louis ten & McQueen, James M. (2005). How should a speech recognizer work? Cognitive Science 29. 867–918.

Smolensky, Paul & Legendre, Géraldine (eds.) (2006). The harmonic mind: from neural computation to optimality-theoretic grammar. 2 vols. Cambridge, Mass.: MIT Press.

Storkel, Holly L., Armbrüster, Jonna & Hogan, Tiffany P. (2006). Differentiating phonotactic probability and neighborhood density in adult word learning. Journal of Speech, Language, and Hearing Research 49. 1175–1192.

Tesar, Bruce, Alderete, John, Horwood, Graham, Merchant, Nazarré, Nishitani, Koichi & Prince, Alan (2003). Surgery in language learning. WCCFL 22. 477–490.

Tesar, Bruce & Prince, Alan (2003). Using phonotactics to learn phonological alternations. CLS 39:2. 209–237.

Tesar, Bruce & Smolensky, Paul (1998). Learnability in Optimality Theory. LI 29. 229–268.

Wilson, Colin & Davidson, Lisa (2013). Bayesian analysis of non-native cluster production. NELS 40. 265–278.

Wilson, Colin, Davidson, Lisa & Martin, Sean (2014). Effects of acoustic–phonetic detail on cross-language speech production. Journal of Memory and Language 77. 1–24.

Wilson, Colin & Obdeyn, Marieke (2009). Simplifying subsidiary theory: statistical evidence from Arabic, Muna, Shona, and Wargamay. Ms, Johns Hopkins University.
