Computational Construction Grammar: A Usage-Based Approach

Jonathan Dunn

doi:10.1017/9781009233743

Series: Elements in Cognitive Linguistics

Published online by Cambridge University Press: 08 May 2024

Author affiliation: University of Illinois, Urbana-Champaign

Summary

This Element introduces a usage-based computational approach to Construction Grammar that draws on techniques from natural language processing and unsupervised machine learning. This work explores how to represent constructions, how to learn constructions from a corpus, and how to arrange the constructions in a grammar as a network. From a theoretical perspective, this Element examines how construction grammars emerge from usage alone as complex systems, with slot-constraints learned at the same time that constructions are learned. From a practical perspective, this work is accompanied by a Python package which enables linguists to incorporate construction grammars into their own corpus-based work. The computational experiments in this Element are important for testing the learnability, variability, and confirmability of Construction Grammar as a theory of language. All code examples will leverage the cloud computing platform Code Ocean to guide readers through implementation of these algorithms.

Keywords

Computational syntax; usage-based grammar; Construction Grammar; cognitive grammar; cognitive linguistics

Type: Element

Online ISBN: 9781009233743

Publisher: Cambridge University Press

Print publication: 06 June 2024
