Self-assessment in second language learning

Yuko Goto Butler

doi:10.1017/S0261444822000489

Published online by Cambridge University Press: 09 January 2023

Affiliation: University of Pennsylvania, Philadelphia, USA
Email: ybutler@gse.upenn.edu

Type: Research Timeline
Information: Language Teaching, Volume 57, Issue 1, January 2024, pp. 42–56

DOI: https://doi.org/10.1017/S0261444822000489
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: Copyright © The Author(s), 2023. Published by Cambridge University Press

Introduction

Self-assessment (SA), as an activity for reflecting on one's own performance and abilities (Black & Wiliam, 1998), has been a topic of interest to educators over the years. Among second language (L2) educators, SA began growing in popularity in the 1970s and 1980s, when L2 educators’ focus shifted from analyzing linguistic systems to examining how learners learn a language. Many can-do statements and SA descriptors have been developed for L2 learning, including SA grids aligned with the Common European Framework of Reference (CEFR, Council of Europe, 2022) and can-do statements prepared by the American Council on the Teaching of Foreign Languages (ACTFL) in collaboration with the National Council of State Supervisors for Languages (NCSSFL) (ACTFL, n.d.). Textbooks and other L2 learning materials, including online apps, often contain SA items. SA can be used in conjunction with other assessments, such as traditional objective assessments, peer assessments, and portfolios. Teachers are often encouraged to incorporate SA into their curricula as part of the promotion of constructivist approaches to education, which have been particularly popular since the late 1980s (e.g., Nunan, 1988; Tarone & Yule, 1989); SA resonates well with modern learning theories such as learner-centered education, self-regulated learning, and autonomous learning (Butler, in press).

Despite widespread promotion of SA through policy and curricular initiatives, the actual implementation of SA in language classrooms varies considerably, and SA is often not used as effectively in practice as expected (Bullock, 2011; Nikolov & Timpe-Laughlin, 2020). Reluctance to use SA in classrooms may be owing, in part, to users’ perception of SA; for example, teachers may be skeptical about the accuracy of their students’ SA, and students may not see SA as helpful for their learning (e.g., Mäkipää, 2021*).

Mixed views of SA among L2 educators and students may partially stem from the fact that SA entails multiple functions and purposes. Broadly speaking, varied definitions of SA reflect two major functions. One common focus is on the measurement functions of SA, namely, assessment of learning. As exemplified in Bailey's (1998) definition of SA, “procedures by which learners themselves evaluate their language skills and knowledge” (p. 227), some researchers emphasize its measurement functions. The other major focus is on the aspects of SA that support learning, or assessment for learning. An example of the latter can be seen in Andrade and Valtcheva's (2009) definition: “a process of formative assessment during which students reflect on the quality of their work, judge the degree to which it reflects explicitly stated goals or criteria, and revise accordingly” (p. 13). SA can be used for both summative purposes (i.e., attributing values or scores to one's learning outcome, primarily for grading) and formative purposes (i.e., monitoring, or self-reflecting on, the ongoing process of learning), and the aforementioned definitions of assessment of learning and assessment for learning roughly correspond to the summative and formative purposes of its use, respectively.

The purpose of this timeline is to review major research on SA in L2 learning (including foreign language [FL] learning) conducted in the last 30 years and to illustrate how the field came to better understand the use of SA both as a measurement tool and as a learning/teaching tool for L2 learning. Because of space constraints, the timeline is limited to select studies that were published in English in major academic journals and book chapters; I selected major studies that were highly cited and/or provided important new insights that influenced successive research. While many studies also examined in-service or pre-service teachers’ SA of their teaching performance or language proficiencies as well as their attitudes towards SA, these studies are excluded from the timeline below. The selected studies are categorized according to the following themes:

A. Assessment of learning orientation
- A1. Theoretical frameworks
- A2. Learners’ perception
- A3. Reliability and validity
- A4. Variables influencing students’ SA
- A5. SA development and implementation
- A6. Meta-analyses, qualitative reviews, etc.
B. Assessment for learning orientation
- B1. Theoretical frameworks
- B2. Effectiveness of SA on learning and self-regulation
- B3. Innovative use of SA (e.g., SA as a social activity, via technology)
- B4. Meta-analyses, qualitative reviews, etc.
C. Targeted age groups
- C1. Young learners (up to primary school)
- C2. Secondary school students
- C3. Adults
- C4. General or unspecified

Reflecting the strong psychometric tradition of language assessment since the introduction of modern assessment theories in language education in the 1960s (e.g., Carroll, 1968; Lado, 1961), research on SA in L2 learning has largely examined the efficacy of SA from a measurement point of view until relatively recently (Category A in the timeline). More specifically, researchers were interested in examining the reliability and validity of SA. A few studies examined the reliability and validity of can-do statements or descriptors, including ACTFL can-do statements (Brown et al., 2014*; Ma & Winke, 2019*; Malabonga et al., 2005*; Summers et al., 2019*; Tigchelaar, 2019*; Tigchelaar et al., 2017*); CEFR descriptors (Little, 2005*); and the Diagnostic Language Assessment System (DIALANG), which was developed based on the CEFR (Brantmeier et al., 2012*; Luoma & Tarnanen, 2003*; Ünaldi, 2016*; see https://dialangweb.lancaster.ac.uk for the items in DIALANG). While studies with measurement-oriented approaches are still very popular, in the last decade or so a growing number of studies have examined SA as a learning/teaching tool (Category B in the timeline). In these studies, researchers were interested in understanding how best to implement SA to maximize its effect on students’ L2 learning.

As a measurement tool, SA generally has moderate correlations with external assessments, according to meta-analyses (Li & Zhang, 2021*; Ross, 1998*). Thus, depending on the purpose of its use and the importance placed on it (i.e., whether it is a high-stakes context), SA can replace or complement other external assessments (e.g., teachers’ assessments and objective language measures) (Malabonga et al., 2005*), although it may not be as reliable and valid as peer assessment (PA) (Matsuno, 2009*; Patri, 2002*). SA can also be used as a reasonably reliable measure of one's learning progress over time (Brown et al., 2014*).
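For readers less familiar with how such correlations are obtained, the following is a minimal sketch of a Pearson correlation between learners' self-ratings and an external measure. It is an illustration only, not the procedure of any study cited here: the function name `pearson_r`, the rating scales, and all scores are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: self-ratings on a 1-5 scale and external test scores (0-100)
self_ratings = [2, 3, 3, 4, 5, 4, 2, 5]
test_scores = [55, 48, 70, 62, 80, 90, 40, 75]

r = pearson_r(self_ratings, test_scores)
print(f"Correlation between SA and external test: {r:.2f}")
```

A coefficient near 1 would indicate that learners who rate themselves higher also tend to score higher on the external measure; a coefficient near 0 would indicate little correspondence between the two.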

It is important to note, however, that the accuracy of learners’ SA varies substantially across studies. Three types of variables can influence the accuracy of SA by L2-learning students: (a) variables related to item construction and administration, (b) learner-related variables, and (c) external or environmental variables. The variables related to item construction and administration include item wording (Ross, 1998*; Tigchelaar et al., 2017*) and response formats (e.g., can-do or dichotomous formats vs. Likert-scale formats) (Butler, 2018a*). Compared with general and holistic descriptions, specific descriptors, particularly descriptors consistent with learners’ experiences, tend to increase accuracy (Butler, 2018b*; Butler & Lee, 2006*; Edele et al., 2015*; Ross, 1998*; Suzuki, 2015*). Other factors that matter include the point of reference that learners rely on when self-evaluating (Butler, 2018a*, 2018b*; Moritz, 1996*; Swain & Hart, 1993*) and the tasks or skill domains being assessed (Bachman & Palmer, 1989*; Brantmeier et al., 2012*; Ross, 1998*). Influential learner-related variables include learners’ L2 proficiency (AlFallay, 2004*; Brantmeier et al., 2012*; Dolosic et al., 2016*; Ma & Winke, 2019*; Matsuno, 2009*; Ross, 1998*; Ünaldi, 2016*), age (Butler, 2018a*, 2018b*; Butler & Lee, 2006*), attitudes and personality factors such as self-esteem (AlFallay, 2004*), and learning experience (Suzuki, 2015*). Finally, external or environmental factors, including cultural environments (Blanche & Merino, 1989*; Matsuno, 2009*) and heritage or nonheritage learning contexts (Ashton, 2014*), seem to play significant roles as well. Researchers have documented response biases associated with various learner characteristics. For example, lower proficiency students or students with less experience with language learning tend to overestimate their abilities, a phenomenon often referred to in psychology as the Dunning-Kruger effect (Dunning et al., 2003) (Heilenman, 1990*; Lappin-Fortin & Rye, 2014*; Suzuki, 2015*; Trofimovich et al., 2016*; Ünaldi, 2016*). Similarly, younger children tend to overestimate their abilities (Butler & Lee, 2006*). Learners tend to be stricter when evaluating their own performance than when assessing their peers’ performance (Matsuno, 2009*; Tigchelaar, 2016*). Finally, in certain cultures, people might be expected to be humble when self-assessing their abilities and performance (Edele et al., 2015*; Matsuno, 2009*).
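One simple way to quantify over- or underestimation of this kind is the mean signed difference between self-assessed and externally assessed scores within each learner group. The sketch below assumes, for illustration only, that both scores have already been placed on a common 0-100 scale; the function name `mean_bias_by_group`, the group labels, and all values are hypothetical and are not taken from the studies cited above.

```python
from collections import defaultdict


def mean_bias_by_group(records):
    """Mean of (self score - external score) per group.

    A positive value indicates overestimation on average;
    a negative value indicates underestimation.
    """
    diffs = defaultdict(list)
    for group, self_score, external_score in records:
        diffs[group].append(self_score - external_score)
    return {group: sum(d) / len(d) for group, d in diffs.items()}


# Hypothetical records: (proficiency group, self-assessed score, external score)
records = [
    ("lower", 60, 45),
    ("lower", 55, 50),
    ("higher", 70, 78),
    ("higher", 85, 85),
]

bias = mean_bias_by_group(records)
for group, value in bias.items():
    print(f"{group}: mean bias = {value:+.1f}")
```

With real data, a bias pattern like the one described above would show up as a larger positive value for the lower proficiency group than for the higher proficiency group.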

As the following timeline illustrates, over time researchers have shown increasing interest in understanding SA's role not only as a measurement tool but also as a learning and teaching tool; namely, how it affects students’ L2 learning, self-regulation, and self-efficacy. Self-regulation refers to one's ability to control one's cognition, affect, and behaviors to achieve a goal, and self-efficacy means one's confidence in one's ability to perform relevant actions to accomplish a goal. SA is thought to promote learners’ self-regulation because it can help them set goals and criteria, monitor their performance, reflect on their performance, and internalize the whole learning experience. SA can improve learners’ self-efficacy by helping them understand the requirements of targeted tasks, which can in turn improve the likelihood that they will successfully complete the task (Butler, in press).

Qualitative or mixed methods have been employed to uncover the process of learning and/or learners’ and teachers’ perceptions and experiences of using SA as a learning/instructional tool. As predicted, studies have found that SA improves learners’ self-reflection on their abilities and performance and leads to greater self-efficacy (Blanche & Merino, 1989*; Brantmeier et al., 2012*; Butler & Lee, 2010*; Glover, 2011*; Jang et al., 2015*; Kissling & O'Donnell, 2015*). As with the accuracy of SA in measurement-oriented studies, in learning-oriented studies, the effects of SA on learning were also influenced by several variables, including the duration of the SA intervention and the wording and structures of the rubrics used (Wang, 2017*). Students’ perception of SA as an L2 learning/instructional tool is generally positive if the criteria are clearly provided and/or some form of training (including repeated use of SA) is offered (Babaii et al., 2016*; De Saint Léger, 2009*; Glover, 2011*; Hung, 2019*; Sullivan & Lindgren, 2002*). With guidance, repeated use of SA can not only improve students’ perceptions of their L2 learning but also lead to actual learning gains, as measured by external assessments (Butler & Lee, 2010*). While the importance of feedback (including self-feedback) through SA on one's learning is acknowledged, its effect depends on a complicated combination of factors that includes one's previous experiences and future goal setting, aspirations, and self-confidence (Butler, 2018a*, 2019b*; Huang, 2016*; Tigchelaar, 2016*).

Although college students in classroom settings have historically been the primary target of studies on SA, recent studies have considered more diverse populations such as young learners (Ashton, 2014*; Butler, 2018a*, 2018b*; Butler & Lee, 2006*, 2010*; Dolosic et al., 2016*; Jang et al., 2015*; Liu & Brantmeier, 2019*) and immigrants (Edele et al., 2015*). SA is increasingly administered through computers, and computer-administered SA appears to increase accuracy (Li & Zhang, 2021*). Moreover, while the general conceptualization of SA as an “internal or self-directed” cognitive activity (Oscarson, 1989*, p. 1) has been dominant, researchers have started paying greater attention to the social and emotional aspects of SA rather than viewing it as a purely individual cognitive activity (Andrade & Brown, 2016; Butler, in press). Most recently, SA has been used to evaluate L2 learners’ intercultural communication proficiency as part of communicative competence (e.g., Lenkaitis, 2021*).

In sum, SA has gained the attention of L2 researchers and educators in the last couple of decades, both as a potential measurement and learning/teaching tool. Several variables that are influential for accuracy (as a measurement tool) and learning (as a learning/teaching tool) have been identified. Most recently, research on SA is more diversified in terms of its target population and means of administration (e.g., computer-administered SA), and it takes a more ecological perspective, viewing SA as a social activity as well as an individual cognitive activity.

Yuko Goto Butler is Professor of Educational Linguistics at the Graduate School of Education at the University of Pennsylvania. She is also the director of the Teaching English to Speakers of Other Languages (TESOL) program at Penn. Her research interests are primarily focused on the improvement of second/foreign language education among young learners in the U.S. and Asia in response to the diverse needs of an increasingly globalizing world. Her work has also focused on identifying effective English-as-a-second language/English-as-a-foreign-language (ESL/EFL) teaching and learning strategies and assessment methods that take into account the relevant linguistic and cultural contexts in which instruction takes place.

Footnotes

* Indicates full reference is described in the subsequent timeline.

¹ Note. Authors’ names are shown in small capitals when the study referred to appears elsewhere in this timeline.

² Higgins, E. T., Strauman, T., & Klein, R. (1986). Standards and the processes of self-evaluation: Multiple effects from multiple stages. In R. Sorrentino & E. T. Higgins (Eds.), Handbook of motivation and cognition: Foundations of social behavior (pp. 23–59). The Guilford Press.

³ Andrade, H. L. (2019). A critical review of research on student self-assessment. Frontiers in Education, 4, article 87. https://doi.org/10.3389/feduc.2019.00087

References

American Council on the Teaching of Foreign Languages. (n.d.). NCSSFL-ACTFL can-do statements. https://www.actfl.org/resources/ncssfl-actfl-can-do-statements

Andrade, H. L., & Brown, G. T. L. (2016). Student self-assessment in the classroom. In Brown, G. T. L., & Harris, L. R. (Eds.), Handbook of human and social conditions in assessment (pp. 319–334). Routledge.

Andrade, H., & Valtcheva, A. (2009). Promoting learning and achievement through self-assessment. Theory Into Practice, 48(1), 12–19. doi:10.1080/00405840802577544

Bailey, K. M. (1998). Learning about language assessment: Dilemmas, decisions, and directions. Heinle & Heinle.

Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy and Practice, 5(1), 7–74. doi:10.1080/0969595980050102

Bullock, D. (2011). Learner self-assessment: An investigation into teachers’ beliefs. ELT Journal, 65(2), 114–127. doi:10.1093/elt/ccq041

Butler, Y. G. (in press). Expanding the role of self-assessment: From assessing to learning English. In Valente, D., & Xerri, D. (Eds.), Innovative practices in early English language education. Palgrave Macmillan.

Carroll, J. B. (1968). The psychology of language testing. In Davies, A. (Ed.), Language testing symposium: A psycholinguistic approach (pp. 46–69). Oxford University Press.

Council of Europe. (2022). Self-assessment grid-Table 2 (CEFR 3.3): Common reference levels. https://www.coe.int/en/web/common-european-framework-reference-languages/table-2-cefr-3.3-common-reference-levels-self-assessment-grid

Dunning, D., Johnson, K., Ehrlinger, J., & Kruger, J. (2003). Why people fail to recognize their own incompetence. Current Directions in Psychological Science, 12(3), 83–87. doi:10.1111/1467-8721.01235

Lado, R. (1961). Language testing: The construction and use of foreign language test. A teachers’ book. McGraw-Hill Book Company.

Nikolov, M., & Timpe-Laughlin, V. (2020). Assessing young learners’ foreign language abilities. Language Teaching, 54(1), 1–37. doi:10.1017/S0261444820000294

Nunan, D. (1988). The learner-centered curriculum: A study in second language teaching. Cambridge University Press.

Tarone, E., & Yule, G. (1989). Focus on the language learner. Oxford University Press.
