An empirical generative framework for computational modeling of language acquisition*

Published online by Cambridge University Press: 26 April 2010

Heidi R. Waterfall
Department of Psychology, Cornell University and Department of Psychology, University of Chicago
Ben Sandbank
School of Computer Science, Tel Aviv University
Luca Onnis
Department of Second Language Studies, University of Hawaii
Shimon Edelman
Department of Psychology, Cornell University and Department of Brain and Cognitive Engineering, Korea University


This paper reports progress in developing a computer model of language acquisition in the form of (1) a generative grammar that is (2) algorithmically learnable from realistic corpus data, (3) viable in its large-scale quantitative performance and (4) psychologically real. First, we describe new algorithmic methods for unsupervised learning of generative grammars from raw CHILDES data and give an account of the generative performance of the acquired grammars. Next, we summarize findings from recent longitudinal and experimental work that suggests how certain statistically prominent structural properties of child-directed speech may facilitate language acquisition. We then present a series of new analyses of CHILDES data indicating that the desired properties are indeed present in realistic child-directed speech corpora. Finally, we suggest how our computational results, behavioral findings, and corpus-based insights can be integrated into a next-generation model aimed at meeting the four requirements of our modeling framework.

Copyright © Cambridge University Press 2010
