Predicting naming scores from language history: A little immersion goes a long way, and self-rated proficiency matters more than percent use

Anne Neveu; Tamar H. Gollan

doi:10.1017/S1366728924000038

Published online by Cambridge University Press: 14 February 2024

Anne Neveu*: Affiliation:
Department of Psychiatry, University of California, San Diego, La Jolla, CA 92093, USA
Tamar H. Gollan: Affiliation:
Department of Psychiatry, University of California, San Diego, La Jolla, CA 92093, USA
*: Corresponding author: Anne Neveu; Email: aneveu@health.ucsd.edu

Abstract

Language proficiency is a critically important factor in research on bilingualism, but researchers disagree on its measurement. Validated objective measures exist, but investigators often rely exclusively on subjective measures. We investigated if combining multiple self-report measures improves prediction of objective naming test scores in 36 English-dominant versus 32 Spanish-dominant older bilinguals (Experiment 1), and in 41 older Spanish–English bilinguals versus 41 proficiency-matched young bilinguals (Experiment 2). Self-rated proficiency was a powerful but sometimes inaccurate predictor and better predicted naming accuracy when combined with years of immersion, while percent use explained little or no unique variance. Spanish-dominant bilinguals rated themselves more strictly than English-dominant bilinguals at the same objectively measured proficiency level. Immersion affected young more than older bilinguals, and non-immersed (English-dominant) more than immersed (Spanish-dominant) bilinguals. Self-reported proficiency ratings can produce spurious results, but predictive power improves when combined with self-report questions that might be less affected by subjective judgements.

Keywords

bilingualism; self-report; language dominance; aging; naming

Type: Research Article
Information: Bilingualism: Language and Cognition, First View, pp. 1–15

DOI: https://doi.org/10.1017/S1366728924000038
Open Practices: Open data Open materials
Copyright: Copyright © The Author(s), 2024. Published by Cambridge University Press

Introduction

Measuring language proficiency, language dominance, and degree of bilingualism is central to psycholinguistic research on bilingualism and in clinical evaluations of bilingual patients (Gasquoine & Gonzalez, 2012; Lorenzen & Murray, 2008; Olson, 2023; Paplikar et al., 2021; Rivera Mindt et al., 2008). Language proficiency typically refers to how quickly, accurately, and easily a person can retrieve words and other linguistic structures, and facility of language use across various communicative contexts (Hulstijn, 2011). Language dominance refers to which language is more proficient, which can change for bilinguals over different points in their lifetime (Birdsong, 2014; Treffers-Daller & Silva-Corvalán, 2016). Numerous different language experiences are thought to influence language proficiency and dominance, including age of acquisition, frequency of use, contexts of use, formal education, time immersed in the language, and many others (Hulstijn, 2011; Schmid & Yılmaz, 2018). Both the variety of factors that can influence proficiency and the unique character of bilinguals’ individual experiences make the accurate evaluation of proficiency, whether objective, subjective, or both, a complex endeavor (for a review, see Olson, 2023).

The use of self-report questionnaires to determine proficiency level in each of the bilinguals’ languages is ubiquitous in research and clinical settings. Various language history questionnaires have been designed in attempts to gain a more comprehensive picture of a bilingual's language skills and to make data collection more uniform across different labs and clinics (e.g., the Language Experience and Proficiency Questionnaire, LEAP-Q, Marian et al., 2007; the Language History Questionnaire, LHQ, Li et al., 2006; the Language Use Questionnaire, LUQ, Kastenbaum et al., 2018; and the Language and Social Background Questionnaire, LSBQ, Anderson et al., 2018; Luk & Bialystok, 2013; for a detailed review, see Rothman et al., 2023). However, for a young participant, it takes at least 10 minutes to complete these questionnaires (with the exception of the LUQ, which is longer). The LEAP-Q has been widely adopted and has contributed to creating more consistency in measuring language history in bilingualism research, but there remains debate as to which aspects of bilingual experience should be used when trying to categorize bilinguals into groups (Kaushanskaya et al., 2020). Development of more time-efficient measures is therefore critical in clinical settings and for encouraging wide use of uniform approaches to measurement.

Self-ratings of proficiency

Perhaps the most widely used item in language history questionnaires is one that asks bilinguals to rate their proficiency level in each language in different modalities (reading, writing, speaking, understanding). Such self-ratings, however, often reflect the participant's own perception of their skill rather than their true performance. In clinical settings it is more common to ask bilinguals to indicate which language is dominant or which language they prefer for testing and evaluation. The correlations between self-ratings and objective measures of proficiency have varied from small to moderate (e.g., Marian et al., 2007; Schrauf, 2009) to strong (e.g., Ross, 1998), which raises questions about their accuracy and predictive power. In addition, while self-ratings of language dominance are usually accurate, exceptions do occur in which bilinguals perform better on objective tests in the language they said was less proficient (see Gollan et al., 2012; Tomoschuk et al., 2019), which can have serious unfortunate consequences. Notably, Tomoschuk et al. (2019) reported major discrepancies in how bilinguals of different language combinations (Chinese–English and Spanish–English), and even bilinguals of the same language combination but with a different dominant language (English-dominant vs. Spanish-dominant Spanish–English bilinguals), interpret self-rating scales. This suggests self-ratings should not be used when comparing or collapsing bilinguals of different language combinations, language dominance, and cultural backgrounds since they may not share the same points of reference, which compromises the extent to which self-ratings accurately reflect objective proficiency level.

Among objective measures of proficiency are picture naming (e.g., the Multilingual Naming Test, MINT), oral proficiency interviews (OPIs; Gollan et al., 2012), and verbal fluency scores (e.g., Ardila et al., 1994; Artiola i Fortuny et al., 1998). One study by Gollan et al. (2012) examined to what extent self-report measures predicted objective measures of proficiency in younger and older Spanish–English bilinguals. Self-ratings of language dominance strongly correlated with measures of proficiency in each language. However, bilinguals who said they were Spanish-dominant tended to be more balanced in proficiency in the two languages, bilinguals who reported being balanced tended to be English-dominant, and bilinguals who rated themselves as English-dominant were the most accurate, although they may have overestimated their English proficiency level (i.e., their OPI and MINT scores were lower than expected given their self-rating). In a more recent study also of Spanish–English bilinguals, self-ratings exhibited low or moderate correlations with objective proficiency measures and were again better at predicting language dominance than absolute proficiency level in each language (Garcia & Gollan, 2022). Similarly, in young adult Mandarin–English bilinguals, objective measures of proficiency revealed that bilinguals who self-reported that they were balanced bilinguals performed better in English than in Mandarin on objective measures (Sheng et al., 2014). Thus, in all three studies, bilinguals were more accurate in determining their dominance than their proficiency level, and though self-rated proficiency was significantly correlated with objectively measured proficiency in both languages, many problematic discrepancies between self-ratings and objective measures were apparent.

Besides differences in the consistency of self-ratings across language combinations and language dominance, differences have also been found across different age groups. In Gollan et al. (2012), younger and older bilinguals were not compared directly (they were tested in separate experiments). Spanish-dominant older bilinguals, based on their self-ratings, were 36%¹ more proficient in Spanish than in English. Similarly, based on the OPI, they were 32% more proficient in Spanish than in English, but based on the MINT, this value was only 7% (see Gollan et al., 2012, Table 3). By contrast, young Spanish-dominant bilinguals were 20% more proficient in Spanish than in English based on their self-ratings but were more balanced (only 2% more proficient in Spanish) based on the OPI and the MINT (Gollan et al., 2012, Table 3). Thus, young Spanish-dominant bilinguals were more balanced than they realized by both objective measures, whereas the older Spanish-dominant bilinguals' self-rated dominance matched the gold standard measure (the OPI). In another recent study (Stasenko et al., 2021), older bilinguals on average scored significantly higher on the picture naming test (the MINT) in both languages than young bilinguals, but the same young and older bilinguals classified themselves as having equivalent self-rated proficiency level. These discrepancies might reflect between-group differences in standards of excellence, or older adults’ ratings might be lowered by their sense of increasing word-finding difficulties (Burke et al., 1991; Gollan et al., 2008).

Given the inconsistencies in self-ratings and findings of low predictive power in some studies, it would be of interest to determine if the accuracy of self-report can be increased by combining (or perhaps even replacing) self-ratings of proficiency level with other self-report measures.

Percent (frequency) of language use

In line with Grosjean's complementarity principle (1998), where bilinguals are rarely equally fluent in all the languages they know, previous work has shown that frequency of language use is associated with objectively measured proficiency level (Luk & Bialystok, 2013). Luk and Bialystok (2013) examined what factors of bilingual experience matter when examining the consequences of bilingualism on language and cognition. They used exploratory and confirmatory factor analyses and derived two factors: self-reported bilingual language use (self-reported speaking and listening skills in the language used at home together loaded on one factor) and English proficiency (measured with the Peabody Picture Vocabulary Task-III, Form A, Dunn & Dunn, 1997, and the Expressive Vocabulary Task, Williams, 1997; together loaded on a second factor). There was a moderate negative correlation between bilingual language use and English proficiency, suggesting that bilinguals who used a language other than English more often had lower English proficiency than bilinguals who seldom used the other language.

To try to understand the effects of language experience across different social contexts, Gullifer and Titone (2020) derived a measure of distributed language use, called language entropy. Language entropy is computed from composite measures of language use extracted from existing questionnaire data (LEAP-Q, Marian et al., 2007; or LHQ 2.0, Li et al., 2014). A comprehensive examination of patterns of language use across different contexts in daily life has shown that more distributed language use (a higher entropy score) predicted proficiency in the second language, as measured by self-rating questions condensed into one component, over and above second language age of acquisition and exposure (Gullifer & Titone, 2020). However, objective proficiency level was not measured in that study, a gap filled in a subsequent study, which further examined how various aspects of bilingual language use and experience relate to each other (Gullifer et al., 2021). Indeed, in that more recent study, a combination of subjective self-report measures and objective measures of verbal fluency in both languages (category and letter fluency tasks in both English and French) was used to define proficiency (Gullifer et al., 2021). Bilinguals with high scores in L2 verbal fluency also tended to rate themselves higher in the L2, although previously, bilinguals have been found to inaccurately judge their performance in the L2 (Tomoschuk et al., 2019). This correlation between objective and subjective ratings in the L2 was, however, not found in the L1. These findings suggest that self-ratings may depend on the characteristics of the groups sampled, such as their language dominance or language combination (see Tomoschuk et al., 2019).
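
To make the notion of "more distributed language use" concrete, the following minimal sketch computes an entropy score from language-use proportions. It assumes the measure takes the standard Shannon form, H = -Σ p_i log2(p_i), over the proportion of use of each language within a context; the function name and the example proportions are hypothetical and are not taken from the questionnaire materials cited above.

```python
import math


def language_entropy(proportions):
    """Shannon entropy (in bits) of language-use proportions.

    `proportions` holds one non-negative value per language, for example
    the share of daily use of each language in a given context.
    """
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    entropy = 0.0
    for p in proportions:
        share = p / total  # normalize so that shares sum to 1
        if share > 0:      # a language that is never used contributes 0
            entropy -= share * math.log2(share)
    return entropy


# Hypothetical two-language examples
balanced_use = language_entropy([0.5, 0.5])  # fully distributed use
single_use = language_entropy([1.0, 0.0])    # only one language used
skewed_use = language_entropy([0.8, 0.2])    # in between
```

Under this assumed form, a bilingual who uses two languages equally often receives the maximum score of 1 bit, and a bilingual who uses only one language in that context receives a score of 0.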

A caveat for use of detailed questionnaires with older participants and in clinical settings is that administration time would likely be even longer (e.g., when working with cognitively impaired individuals). While it is feasible to dedicate this time to collect language history data in a research setting, it is less so the case in a clinical setting where time is primarily spent towards targeted referral questions (e.g., is there cognitive impairment?). Therefore, to optimize knowledge about proficiency from self-report questions, a “Goldilocks zone” must be found between too many and too few language history questions. At this juncture, the field is ripe for investigating which types of questions provide the most predictive power in the least amount of time to meet the demands in clinical settings, and to encourage widespread use of the best predictors in research settings. Across different labs, investigators would be more likely to adopt a set of commonly used questions and objective measures if the time commitment could be kept to a minimum.

Immersion

An important factor that might introduce differences between younger and older bilinguals is that a lifetime of bilingualism provides more time and therefore longer cumulative use of two languages (Gollan et al., 2008), and a lifetime might also provide more opportunities for extended immersion experience. Time spent immersed in an environment where the nondominant language is spoken increases proficiency of that language in young adults, and even temporary and relatively short-lived immersion can have powerful effects (e.g., Linck et al., 2009; Lynch et al., 2001). Immersion is more efficient for developing skills in a nondominant language compared to learning in a classroom while immersed in the dominant language (Linck et al., 2009). Immersion may often occur due to immigration, which may take place early in life (e.g., among Heritage language speakers), or later in life. In older bilinguals, this could result in extended periods (decades of immersion) in different languages early versus later in life. Research on the effects of long-term immersion (through immigration) in older adults with and without dementia has shown significant and positive correlations between the number of years immersed in their nondominant language and their proficiency in that language (Nanchen et al., 2017). Specifically, parallel decline of both languages across patient and control groups suggests that living immersed in one's nondominant language can help preserve it to a similar extent as the dominant language, regardless of the severity of cognitive decline experienced in older age (Nanchen et al., 2017).

Outcomes in terms of maintenance of the native language after immigration and development of the majority language in the new country depend on many factors including the language spoken at home and the type of school attended (two-way dual-language immersion versus majority language-only). In a study of Welsh–English bilinguals, adults (parents) who continued speaking Welsh at home, compared to using both Welsh and English, maintained their proficiency levels in Welsh to a larger extent (Gathercole & Thomas, 2009). While more exposure to Welsh in childhood tended to yield lower proficiency in English, these gaps closed when reaching adulthood, across all profiles of language use at home and at school (Gathercole & Thomas, 2009). In this and other studies, the conclusion is that maintenance of the native language and development of the later acquired language depend on quantity of language input in each of these languages (e.g., Gathercole & Thomas, 2009; Hoff, 2018; Hurtado et al., 2014; Thordardottir, 2011). Additionally, quality of exposure also matters: specifically, both exposure (from native versus non-native speakers) and speaker variability have been found to support the development of fluency, both in children (De Cat, 2021; Hoff et al., 2014; Unsworth, 2016; but see Carroll, 2017) and adults (Linck et al., 2009; Sinkeviciute et al., 2019).

The current study

Little attention has been given as to which self-report questions, or combined individual short-lists of questions, are more powerful predictors of objectively measured proficiency level, and whether these might vary across bilingual subgroups formed by language dominance and age group. In the current study, we assessed the joint predictive power of self-rated proficiency level, self-reported frequency of use, and years of immersion for predicting picture naming scores in the nondominant language. We focused on the nondominant language which produced stronger correlations between self-rated proficiency and naming scores in previous studies due to broader ranges of skills in that language (e.g., Garcia & Gollan, 2022; Gollan et al., 2012; Sheng et al., 2014; see also Marian et al., 2007), while the dominant language tends to be closer to ceiling levels in both self-ratings and objectively measured performance (and therefore is harder to predict). To this end, we analyzed data gathered in several previous studies (see below for details) on younger and older bilinguals where objective language proficiency was measured with the MINT. Using these data, in Experiment 1, we compared English-dominant to Spanish-dominant older bilinguals, and in Experiment 2, we compared younger and older bilinguals. In both experiments we investigated if groups differed systematically in self-rating measures, and if self-rated proficiency level, years of immersion and percent use together predicted nondominant language naming scores better than each predictor on its own.

Experiment 1 – Language Dominance Effects in Older Bilinguals

Methods

Participants

Sixty-eight cognitively healthy older Spanish–English bilinguals for whom item level data on the picture naming test in both languages were readily available from two previous studies (Gollan et al., in press; Gollan & Goldrick, 2016) were selected for analysis. Fifty-nine were tested on the MINT during their yearly evaluation as part of their participation in a longitudinal study at the University of California, San Diego (UCSD) Alzheimer's Disease Research Center (ADRC), and nine were part of a separate research study (Gollan & Goldrick, 2016). The study procedures were approved by the UCSD Institutional Review Board. Participant characteristics are presented in Table 1. Participants were living in San Diego, which is about 15-20 miles from the Mexican border, and in which both English and Spanish are used frequently. About half the bilinguals were English-dominant and therefore were immersed in their dominant language, whereas half were Spanish-dominant bilinguals and immersed in their nondominant language. Seven participants reported having some immersion in a non-English- or non-Spanish-speaking country, ranging from 2 months to 6 years. Classification of language dominance was derived from the average of self-ratings of proficiency level in English and Spanish in four modalities (reading, writing, speaking and listening comprehension) on a 1 to 7 scale. Self-rated proficiency was averaged across the four modalities and whichever average was higher determined which language was dominant. For four bilinguals the average score was the same for the two languages. For three of these we used self-reported percent of current English use to determine dominance; two reported using English more often than Spanish and were thus classified as English-dominant and one reported using Spanish more often than English and was classified as Spanish-dominant. For the remaining balanced participant, percent current English use was at 50%, so we looked at number of years immersed in an English-speaking country: this number corresponded to the participant's age (89), with zero years immersed in the nondominant language, and therefore we classified this bilingual as English-dominant.
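
The classification procedure described above can be summarized in a short sketch. It is an illustration only, not the authors' code: the function and parameter names are hypothetical, and the final immersion tie-break is an assumption that generalizes from the single participant for whom it was needed.

```python
from statistics import mean


def classify_dominance(english_ratings, spanish_ratings,
                       percent_english_use=None,
                       years_english_env=None, years_spanish_env=None):
    """Return "English", "Spanish", or "undetermined".

    Ratings are self-ratings on a 1-7 scale in four modalities
    (reading, writing, speaking, listening comprehension).
    """
    english_avg = mean(english_ratings)
    spanish_avg = mean(spanish_ratings)

    # Step 1: the language with the higher average self-rating is dominant.
    if english_avg != spanish_avg:
        return "English" if english_avg > spanish_avg else "Spanish"

    # Step 2: on a tie, use self-reported percent of current English use.
    if percent_english_use is not None and percent_english_use != 50:
        return "English" if percent_english_use > 50 else "Spanish"

    # Step 3 (assumption): if still tied, compare years of immersion
    # in each language environment.
    if (years_english_env is not None and years_spanish_env is not None
            and years_english_env != years_spanish_env):
        return ("English" if years_english_env > years_spanish_env
                else "Spanish")

    return "undetermined"


# Hypothetical participant: tied ratings, 50% English use, lifelong
# immersion in an English-speaking environment.
example = classify_dominance([6, 6, 6, 6], [6, 6, 6, 6],
                             percent_english_use=50,
                             years_english_env=89, years_spanish_env=0)
```

Ordering the steps this way means the tie-breakers are consulted only when the self-rating averages are exactly equal, which applied to four of the 68 participants.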

Table 1 Participant characteristics for Experiment 1, sample of older bilinguals divided by language dominance

^a Defined by average self-rated dominance (across reading, speaking, writing and understanding) on a scale from 1 (lowest) to 7 (highest).

^b For the variable of gender, Pearson's Chi-squared test was run instead as the data are categorical.

^c Welch's t-test was used as variances across groups were unequal.

When compared to English-dominant bilinguals, the Spanish-dominant bilinguals had significantly lower education level, and lower picture naming scores in the dominant language (see Materials and Procedure below), but a higher percent use of the nondominant language, and more years of immersion in the nondominant language. Average self-rated proficiency level in the dominant language was at ceiling for both groups, and for the English-dominant group, average self-rated proficiency in the nondominant language tended to be slightly higher than for the Spanish-dominant group, although this difference was not significant (see Table 1).

Materials

Participants named pictures from the MINT, in each of their languages (English and Spanish). The MINT comprises 68 black-and-white pictures that are increasing in difficulty level from beginning to end. If a participant had difficulty recognizing the picture, a semantic cue was provided. If the correct name was produced before or after the semantic cue, it was coded as correct. If the name was not produced at that point, a phonetic cue was provided, and the item was coded as incorrect.

Procedure

The testing session was conducted by a proficient Spanish–English bilingual at the ADRC or in participants’ homes. The MINT was administered towards the end of the testing session, in which other (unrelated) tasks were also administered. For participants tested at the ADRC (n = 59), testing was discontinued after 6 items could not be named. The 9 bilinguals who were tested on the complete test (without discontinuation after 6 failed items) named on average less than 1 additional picture correctly after the point where the discontinuation rule would have been applied (on average just .67 points; SD = 1); thus, we assume this small difference in procedure likely had minimal or no effect. Naming accuracy was recorded during testing.

Correlations between the language history variables and MINT naming scores in the nondominant language are summarized in Table 2. The full correlation tables separated by language dominance groups are available in Supplementary Materials, in Appendix SA, Tables SA.1 and SA.2.

Table 2 Pearson's correlations between language history variables and nondominant MINT scores for Experiment 1

Of the three predictors of primary interest, the correlations between self-rated proficiency and MINT scores in the nondominant language were strongest (in both English- and Spanish-dominant bilinguals). Immersion and percent current use were also significantly and positively correlated with nondominant MINT scores in both groups. Age was not a significant predictor, and years of education was significantly correlated with MINT scores, in both dominance groups. However, education was collinear with self-rated proficiency in the nondominant language in English-dominant bilinguals (r = .34, p < .05) and Spanish-dominant bilinguals (r = .61, p < .001), and as such we did not include it in our analyses.

Analyses

We examined the extent to which average self-rated proficiency in the nondominant language, years immersed in the nondominant language, and percent current use of the nondominant language predicted the likelihood of naming a picture accurately in the nondominant language. We examined the joint power of the three predictors in the full sample first, and next each predictor separately (to avoid running overly complex models), with each interacting with language dominance group. As naming accuracy is a binary outcome variable (1 or 0), we analyzed the data using logistic mixed-effects models (lme4 package, Bates et al., 2015) in R Studio, version 4.2.3 (2023-03-15). To control for the closer similarity of answers within- compared to between-participants, as each provided 68 answers, models included a by-subject random intercept. We also included by-item random intercepts and slopes for each of the between-subjects variables and simplified the random-effects structure when applicable to resolve convergence and singularity issues (Brauer & Curtin, 2018). Model assumptions were tested with the DHARMa package (Hartig, 2022), and were satisfied.
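
For readers who prefer notation, the model structure described above can be written out as follows for the case with all three participant-level predictors and the full random-effects structure. This is a sketch in assumed notation (the symbols are not the authors'): y_ij is the accuracy (1 or 0) of participant i on item j, Prof, Imm and Use stand for self-rated proficiency, years immersed and percent current use, u_0i is the by-subject random intercept, and w_0j to w_3j are the by-item random intercept and slopes.

```latex
\begin{aligned}
\operatorname{logit} P(y_{ij} = 1)
  &= \beta_0 + \beta_1\,\mathrm{Prof}_i + \beta_2\,\mathrm{Imm}_i + \beta_3\,\mathrm{Use}_i \\
  &\quad + u_{0i} + w_{0j} + w_{1j}\,\mathrm{Prof}_i + w_{2j}\,\mathrm{Imm}_i + w_{3j}\,\mathrm{Use}_i,
  \qquad u_{0i} \sim \mathcal{N}(0, \sigma_u^2)
\end{aligned}
```

Simplifying the random-effects structure, as described above, corresponds to dropping some or all of the by-item slope terms while keeping the random intercepts.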

Results

Joint predictive power of the three self-report measures

We ran a mixed-effects logistic regression model including the main effects of self-rated proficiency in the nondominant language, years immersed in the nondominant language and percent current use of the nondominant language. The model converged and no singularity issues emerged with the full random-effects structure, including a by-subject and by-item random intercept, and by-item random slopes for each of the three independent variables. The model included 4624 observations. There was a main effect of self-rated proficiency such that the odds of accurately naming an item on the MINT increased by a factor of 3.13 (corresponding to a medium effect size of d = .5; Chen et al., 2010) per additional unit of self-rated proficiency (b = 1.14, SE = 0.15, z = 7.48, OR = 3.13, p < .001). There was also a main effect of years immersed such that the odds of accurately naming an item on the MINT increased by a factor of 1.02 (small effect size) per additional year immersed (b = 0.02, SE = 0.01, z = 3.01, OR = 1.02, p = .003). The main effect of percent current use was not significant (p = .17, see Supplementary Materials, Appendix SB, Table SB.1a for full results).
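
The odds ratios reported above follow from the coefficients by exponentiation, OR = exp(b). The short sketch below reproduces that arithmetic from the rounded values given in the text; the ten-year figure is a hypothetical illustration that assumes the effect of immersion is linear on the log-odds scale, and it inherits the rounding of the reported coefficient.

```python
import math


def odds_ratio(b):
    """Convert a logistic-regression coefficient (log-odds) to an odds ratio."""
    return math.exp(b)


# Coefficients as reported in the text (already rounded to two decimals).
b_proficiency = 1.14  # per unit of self-rated proficiency (1-7 scale)
b_immersion = 0.02    # per year immersed in the nondominant language

or_proficiency = round(odds_ratio(b_proficiency), 2)
or_immersion = round(odds_ratio(b_immersion), 2)

# Hypothetical illustration: odds multiplier for a 10-year difference
# in immersion, holding the other predictors constant.
or_ten_years = round(odds_ratio(b_immersion * 10), 2)

# 68 participants each named 68 MINT items.
n_observations = 68 * 68

print(or_proficiency, or_immersion, or_ten_years, n_observations)
# prints: 3.13 1.02 1.22 4624
```

Because odds ratios multiply, a per-year effect that looks negligible can add up across the decades of immersion that many older bilinguals report.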

To examine whether effects of percent current use of the language might have been obscured by years of immersion, we further explored this variable along with self-rated proficiency level in a subset of the data that included only individuals who reported having no immersion in the nondominant language. Only English-dominant bilinguals fit this profile. The model included 1156 observations and convergence and singularity issues were resolved by removing random slopes and retaining by-item and by-subject random intercepts. The model showed a significant main effect of average self-rated proficiency in the nondominant language such that the odds of accurately naming an item on the MINT increased by a factor of 2.59 for every one unit increase in self-rated proficiency (medium effect size) (b = 0.95, SE = 0.36, z = 2.60, OR = 2.59, p = .009). The main effect of percent current use of the nondominant language was still not significant (p = .57, see Supplementary Materials, Appendix SB, Table SB.1b for full results). In summary, average self-rated proficiency was the strongest predictor of naming accuracy, followed by immersion, and percent current use was not a significant predictor of naming accuracy, even in bilinguals with zero years of immersion. These results did not change when adding education as a covariate in the models.

Next, to determine whether there were significant differences between groups in terms of which variables predicted nondominant MINT accuracy, we looked for interactions between participant group and each of the self-report questions. We ran three mixed-effects logistic regression models including the interaction of participant group (coded as -0.5 for Spanish-dominant and 0.5 for English-dominant) with self-rated proficiency, immersion and percent current use of the nondominant language. Models converged and singularity issues were resolved by including by-subject and by-item random intercepts and removing lower-order random effect terms. Briefly summarized, the models contrasting self-rated proficiency level by group, and years of immersion by group showed significant differences between English-dominant and Spanish-dominant bilinguals, while the model examining self-reported current percent use of the nondominant language showed no difference between groups. These results did not change when adding education as a covariate in the models.

Self-rated proficiency level by language dominance group

The model predicting by-item MINT accuracy from the interaction of self-rated proficiency by participant group included 4624 observations (see Figure 1). There was a main effect of self-rated proficiency such that the odds of accurately naming an item on the MINT increased by a factor of 3.46 (medium effect size) per additional unit of self-rated proficiency (b = 1.24, SE = 0.13, z = 9.16, OR = 3.46, p < .001). There was also a main effect of participant group such that at each level of self-rated proficiency (which ranged from 1 to 7 in these bilinguals), Spanish-dominant bilinguals named more pictures correctly in their nondominant language than did English-dominant bilinguals: the odds of correct naming for English-dominant bilinguals were 0.34 times those for Spanish-dominant bilinguals (small effect size) (b = -1.07, SE = 0.40, z = -2.67, OR = 0.34, p = .008). The interaction was not significant (p = .54; see Supplementary Materials, Appendix SB, Table SB.2).
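The direction of the group effect follows from the contrast coding stated earlier (-0.5 for Spanish-dominant, 0.5 for English-dominant): a negative coefficient means lower odds for the English-dominant group. The sketch below illustrates this reading with the rounded coefficient from the text; variable names are illustrative, and it assumes the group term enters the model only through these two codes.

```python
import math

# Contrast codes as stated in the text.
CODES = {"Spanish-dominant": -0.5, "English-dominant": 0.5}

b_group = -1.07  # rounded group coefficient quoted in the text

# Log-odds contribution of the group term for each group.
contribution = {group: b_group * code for group, code in CODES.items()}

# The codes are one unit apart, so the difference in contributions equals b_group.
or_english_vs_spanish = math.exp(
    contribution["English-dominant"] - contribution["Spanish-dominant"]
)
# The same effect read in the opposite direction is the reciprocal.
or_spanish_vs_english = 1 / or_english_vs_spanish

print(f"English- vs Spanish-dominant: OR = {or_english_vs_spanish:.2f}")
print(f"Spanish- vs English-dominant: OR = {or_spanish_vs_english:.2f}")
```

Read this way, an odds ratio of 0.34 for English-dominant relative to Spanish-dominant bilinguals is equivalent to odds roughly 2.9 times higher for Spanish-dominant bilinguals at the same self-rating.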

Figure 1. Experiment 1 - Predicted proportion correct on MINT accuracy in the nondominant language by average self-rated proficiency level in the nondominant language, across language dominance groups. Raw and predicted data are superimposed; grey ribbons show standard error.

Years of immersion by language dominance group

The model predicting by-item MINT accuracy from the interaction of years of immersion by participant group included 4624 observations. There was a main effect of immersion such that the odds of accurately naming an item on the MINT increased by a factor of 1.15 (small effect size) per additional year immersed (b = 0.14, SE = 0.04, z = 3.88, OR = 1.15, p < .001). Given the same average number of years immersed, English-dominant bilinguals named more pictures correctly in the nondominant language than Spanish-dominant bilinguals, with odds of correct naming higher by a factor of 63.43 (very large effect size) (b = 4.15, SE = 1.25, z = 3.31, OR = 63.43, p < .001). The interaction of participant group and years of immersion was significant, such that for both groups, each additional year of immersion increased accuracy, but the effect was stronger for English-dominant than for Spanish-dominant bilinguals (b = 0.17, SE = 0.07, z = 2.37, OR = 1.19, p = .02) (see Supplementary Materials, Appendix SB, Table SB.3). Follow-up comparisons suggested that the odds of accurately naming an item on the MINT for the English-dominant group increased by a factor of 1.25 for each additional year of immersion, while they increased by a slightly smaller factor of 1.06 for the Spanish-dominant group (see Figure 2) (both small effect sizes).
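The group-specific odds ratios are consistent with simple slopes derived from the interaction model under the ±0.5 contrast coding. The sketch below is a reconstruction from the rounded coefficients in the text, not the authors' follow-up procedure (which is not described here); the function name `simple_slope_or` is illustrative.

```python
import math

b_years = 0.14        # rounded main effect of years immersed
b_interaction = 0.17  # rounded group-by-years interaction

# Contrast codes as stated in the text.
CODES = {"English-dominant": 0.5, "Spanish-dominant": -0.5}


def simple_slope_or(code: float) -> float:
    """Per-year odds ratio for a group: exp(b_years + code * b_interaction)."""
    return math.exp(b_years + code * b_interaction)


for group, code in CODES.items():
    print(f"{group}: OR per year of immersion = {simple_slope_or(code):.2f}")
```

With these inputs the per-year odds ratios come out at about 1.25 for the English-dominant group and 1.06 for the Spanish-dominant group, matching the reported follow-up values.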

Figure 2. Experiment 1 - Predicted proportion correct on MINT accuracy in the nondominant language by the number of years immersed in the nondominant language, across dominance groups. Raw and predicted data are superimposed; grey ribbons show standard error.

To test whether the interaction was robust to a control for between-group differences in education level (see Table 1), we ran a model that included only participants with 12 or more years of education (35 English-dominant and 20 Spanish-dominant bilinguals, who did not differ significantly in education level, t = -0.36, p = .72). Twelve years corresponds to completing high school, a level more commonly shared between undergraduate college students and the older adults in this sample. This matched analysis revealed point estimates, standard errors and z-statistics highly similar to those in the model with all participants (see Supplementary Materials, Appendix SB, Table SB.4).

Percent current language use by language dominance group

The model predicting by-item MINT accuracy from the interaction of percent current language use by participant group included 4624 observations. There was a main effect of percent current language use such that the odds of accurately naming an item on the MINT increased by a factor of 1.07 (small effect size) per additional percentage point of use (b = 0.07, SE = 0.02, z = 4.25, OR = 1.07, p < .001). The main effect of language dominance (p = .47) and the interaction of percent current language use and language dominance (p = .50) were not significant (see Supplementary Materials, Appendix SB, Table SB.5).
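Because this coefficient is expressed per percentage point, a per-unit odds ratio close to 1 can still imply a sizeable change over a wider span of use. The sketch below rescales the rounded coefficient to a 10-point difference; it assumes the effect is linear on the log-odds scale across that span, and the function name `odds_ratio_over_span` is illustrative.

```python
import math


def odds_ratio_over_span(b_per_unit: float, units: float) -> float:
    """Odds ratio implied by a change of `units` in the predictor, given a per-unit log-odds slope."""
    return math.exp(b_per_unit * units)


b_percent_use = 0.07  # rounded per-percentage-point coefficient quoted in the text

print(f"per 1 point:   OR = {odds_ratio_over_span(b_percent_use, 1):.2f}")
print(f"per 10 points: OR = {odds_ratio_over_span(b_percent_use, 10):.2f}")
```

Since the input is rounded to two decimals, the 10-point figure is only approximate.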

Discussion

The results of Experiment 1 revealed several key findings, including some significant differences between English-dominant and Spanish-dominant older bilinguals. First, in the overall sample, average self-rated proficiency level was the most powerful predictor of objectively measured proficiency in the nondominant language, as indexed by picture naming scores. We also found significant correlations between years of immersion and naming scores, and each year of immersion had a stronger effect for English-dominant than for Spanish-dominant bilinguals; English-dominant bilinguals had relatively few years of immersion, and many had no immersion experience at all. Although self-reported percent use of the nondominant language was also correlated with naming scores, it did not explain additional variance when jointly predicting naming scores along with self-rated proficiency level (even when considering only individuals with no immersion experience).

In addition, self-rated proficiency level seemed to be a powerful predictor, but at each level of self-rated proficiency, Spanish-dominant bilinguals outperformed English-dominant bilinguals in naming scores in the nondominant language, i.e., Spanish-dominant bilinguals scored higher in English than English-dominant bilinguals scored in Spanish (see Figure 1). This outcome could reflect differences in how different groups interpret the self-rating scale, or a systematic bias in the MINT (e.g., if the test were objectively easier in English than in Spanish). The latter seems unlikely given that the MINT was developed from the ground up as a naming test for multiple languages (with English and Spanish among them). We defer further discussion of these possibilities to the General Discussion but note that other clear inaccuracies in the rating scales were apparent. For example, seven bilinguals rated their proficiency in the nondominant language between 1 (“very poor”) and 2.25 (with 2 corresponding to “poor”), thus considering themselves functionally monolingual. Four of these participants named between 11 and 20 pictures in the nondominant language, but three named more than a third of the pictures correctly, a number that clearly exceeds what most would consider “monolingual” and reveals another inaccuracy in the use of self-rated proficiency level.

In Experiment 2, we investigated age group differences while also further examining the effects of immersion in a larger number of bilinguals (adding young bilinguals allowed us to include many more participants from a larger set of available previous studies that did not include older bilinguals).