Recent Contributions of Data Mining to Language Learning Research

Published online by Cambridge University Press:  23 July 2019

Mark Warschauer
Affiliation:
University of California, Irvine
Soobin Yim*
Affiliation:
University of California, Irvine
Hansol Lee
Affiliation:
Korea Military Academy
Binbin Zheng
Affiliation:
Michigan State University
*Corresponding author. E-mail: soobiny@uci.edu

Abstract

This paper reviews the role of data mining in research on second language learning. Following a general introduction to the topic, three areas of data mining research are summarized (clustering techniques, text mining, and social network analysis), with examples from both the broader field and studies conducted by the authors. The application of data mining in second language learning research is relatively new, and more theoretical and empirical support is needed to guide the appropriate collection, use, and interpretation of data for specific research and pedagogical objectives. The three examples that we introduce illustrate how new data sources accessible in online environments can be analyzed to better understand the optimal instructional context for corpus-based vocabulary learning (clustering techniques), the characteristics and patterns of collaborative written interaction using Google Docs (text mining and visualizations), and issues of access and community in computer-mediated discussion (social network analysis). Implications of these new techniques for L2 research are discussed.

Type
Review Article
Copyright
Copyright © Cambridge University Press 2019 
