Background on the LLBC methods
The present work reanalyzes data from LLBC (Oller & Eilers, 2002a). The original study included 952 children (704 bilinguals, 248 monolinguals) at kindergarten (K; 332), second grade (306), and fifth grade (314), selected by staff of the LLBC project prior to testing. The present analysis focuses on the 620 children at second and fifth grade. Kindergarten data were set aside in part because psychometric limitations of some subtests (owing especially to floor effects) complicated comparisons across tests at K. In addition, some Hispanic children at K appeared to have had very little experience in English, and so may have been scarcely bilingual at that point. In contrast, second- and fifth-grade tests showed substantial psychometric robustness (with neither floor nor ceiling effects), and all the children in those grades had had notable exposure to English. The second- and fifth-grade data have exactly the same design characteristics as the LLBC sample as a whole.
Children were preselected such that half were high SES and half low SES, based upon an extensive questionnaire administered to parents. The SES scale was based on a synthesis of widely utilized tools (Hollingshead, 1978; Nam & Powers, 1983), emphasizing mother's educational level and occupational status of the parents. Similarly, half the bilinguals at each grade came from homes where English and Spanish were spoken equally, and half from homes where only Spanish was spoken.
All the children in the study were in one of 10 neighborhood schools. Based on US Census data, it was determined that the bilingual schools were well matched on SES of the neighborhood population, as well as ethnicity, monetary allotment per child from the government in support of the school, and average standard achievement test mathematics scores in K and first grade. Bilingual children were all from schools where near 90% of children were Hispanic, whereas most monolingual children were from schools where approximately 40% were Hispanic. Among the bilingual children half were in two-way programs and half were in English-immersion programs. All had been in these programs throughout their elementary schooling and all were born in the United States.
Educational characteristics of the schools
In the two-way schools 40% of each school day was conducted in Spanish, and 60% in English. In English-immersion programs 90% of instruction was in English. Classroom observations confirmed that teachers in the schools conducted their instruction, including one-to-one comments to individual students during class, in the language designated by the administrative protocol (Eilers, Oller, & Cobo-Lewis, 2002).
Standardized test battery
The standardized battery was drawn from the Woodcock–Johnson and Woodcock–Muñoz language and literacy evaluations normed in both English and Spanish (Woodcock, 1991; Woodcock & Muñoz-Sandoval, 1995) and from the PPVT, a receptive vocabulary test, and its Spanish normed companion, the TVIP (Dunn & Dunn, 1981; Dunn et al., 1986). The PPVT and TVIP required children on each trial to point to one of four pictures in response to a label spoken by the tester. The subtests of the Woodcock battery that were included were word attack (WA), letter–word recognition (LW), passage comprehension (PC), proofreading, dictation, picture vocabulary (PV), verbal analogies (VA), and oral vocabulary (OV). For WA, children were required to read nonsense words, constructed according to the phonotactics of the language, and for LW, they were required to read real words. WA and LW were the tests used to assess “phonics,” which can be thought of here as the skill domain requiring mapping of orthographic (or “graphemic”) symbols to phonemic elements in each language. The WA test is particularly useful as an assessment of pure phonics (because the “words” to be read are novel to the child and cannot have been memorized as Gestalt orthographic patterns). The LW test presumably assesses a combination of pure phonics skill along with memory for Gestalt patterns of real words. The PC test required children to read a passage and answer questions about it. The proofreading task required children to supply corrections to brief segments of written text, and dictation required children to write words, phrases, or sentences as they were presented by the examiner. PV was a picture naming task. VA required children to fill in the blanks in sequences such as “fish is to swim as bird is to___.” OV required children to supply synonyms or antonyms in each of two subsections.
Every child was tested on the batteries indicated above plus several additional evaluations that were constructed specially for the research (see LLBC, chapters 7–11, Oller & Eilers, 2002a). Bilingual children had four to five sessions of testing on separate days, 30–40 min in length, and order of administration of the Spanish and English batteries was counterbalanced. Monolingual children required fewer test sessions because they were tested in English only. All testers were trained in administration of the standardized instruments in accord with recommendations of the test developers, and all were thoroughly bilingual in English and Spanish.
Additional details on the study's procedures can be found in the original volume (Cobo-Lewis et al., 2002a, 2002b; Eilers et al., 2002; Oller & Eilers, 2002b).
Rationale for theoretical expectations
In keeping with our supposition that profile effects in bilingual learners can be explained in terms of differential degrees to which the distributed characteristic applies in different domains of language and literacy, we propose specific predictions listed in the Research Propositions section. To understand the predictions regarding vocabulary, consider the fact that different circumstances of living correspond to different topics of conversation (and consequently to different semantic domains or interactive scripts) along with particular vocabulary items corresponding to the different topics. If the circumstances of life are distributed such that the L1 is spoken systematically in some of them and the L2 is spoken systematically in others, then vocabulary learning distributed across the two languages is a predictable outcome. In such cases, certain semantic domains (e.g., objects found in the kitchen or objects found on the school playground) would tend to be better known by individual bilingual speakers in one language than in the other.
To understand why the distributed characteristic applies to vocabulary, but why we think it does not apply (or does not apply strongly) in other domains of language or literacy, consider certain structural characteristics of vocabulary. (a) There are thousands of root words (or morphemes) in every natural language. (b) These root words must essentially be learned throughout life, one by one. To command translation equivalents, bilingual learners are required (with the exception of some cognate words if the two languages are related) to learn each translation-equivalent root word twice, once for each language. (c) Each translation-equivalent word pair represents a coupling of two different phonological or orthographic forms (a sequence of phonemes or letters), with a particular meaning (a semantic content). (d) One does not need to know all the vocabulary in a language to function effectively in that language in certain circumstances. If a particular circumstance only requires communicating about a limited range of topics and corresponding semantic domains, there is no need to possess certain vocabulary, namely, that vocabulary not pertaining to the topics and domains required for communication in that circumstance. These structural characteristics of vocabulary set the stage for the distributed characteristic of bilingual vocabulary learning, because bilingual children learn some root words in one language without needing to learn them in the other.
Other aspects of language and literacy do not show these structural characteristics. The skill that maximally contrasts with vocabulary in our data (in terms of test scores) is phonics, and our contention is that phonics scores are not low in either language of bilinguals compared to monolinguals at least in part because the distributed characteristic does not play a significant role (if any) in phonics for Spanish and English bilinguals. In the following we present the reasoning behind our proposal that phonics does not show the distributed characteristic because phonics does not possess the structural characteristics that set the stage for it.
Here we use the term “phonics” to refer to the set of individual pairings of phonemes with letters that occur in a particular writing system. Any individual combination of a phoneme with a letter (or letters) that can represent it will be called a “phonics pairing.” We assume these pairings to be the basic elements of phonics just as we assume root words (or morphemes) to be the basic elements of vocabulary.
Note now that phonics pairings differ from root words on all the structural characteristics listed above. (a) Although root words number in the thousands in any language, phonics pairings are much more limited in number. In Spanish, where the orthography is largely transparent, there are about 30 phonics pairings, whereas in English, the opaque orthography produces a number of pairings that is several times that large, but still minuscule by comparison with the number of root words in a language. Although vocabulary learning continues throughout life to acquire many thousands of items, phonics learning can be completed early in acquisition because the set of phonics pairings is relatively small. (b) The great bulk of phonics pairings in any alphabetical language must be acquired (either tacitly or explicitly) very early in learning to read, because productive reading in an alphabetical language requires command of them (Liberman, Shankweiler, & Liberman, 1989; Treiman, 2000). (c) Although each vocabulary item consists of a phonological (orthographic) form (a sequence of phonemes or letters) that represents a meaning, each phonics pairing is in and of itself meaningless. Individual phonemic and orthographic elements serve merely as vehicles for transmission of meaning, possessing no linguistic meaning of their own (de Saussure, 1968; Hockett, 1977). Consequently, the notion “translation equivalent,” which is so important in understanding the distributed characteristic in vocabulary, has no obvious parallel in phonics; there are no meanings to translate from one language to the other in phonics, because the elements of phonics are meaningless. Equivalency of phonics pairings across languages is consequently different in kind from translation equivalency in vocabulary, and undercuts sorting of phonics pairings as singlets and doublets, a critical feature of the analysis of the distributed characteristic in vocabulary. 
(d) Although an individual can function well in a language with limited vocabulary as long as that vocabulary covers the necessary semantic domains required for the circumstances where the individual uses the language, the same is not true of phonics. When reading in any semantic domain, in any alphabetical language, essentially the totality of phonics pairings may be invoked, and this fact makes it critical that the number of phonics pairings be relatively small and learnable in toto. Regardless of the topic or semantic content one reads about, there is no reason to expect to find a particular range of phonics pairings, because the phonics pairings are themselves semantically void, by design (Studdert-Kennedy, 2000). Thus, although circumstance specificity can naturally drive segregation of vocabulary in a bilingual's two languages because particular vocabulary items can pertain to particular semantic domains, the same cannot naturally happen in phonics. Phonics pairings are inherently not semantic-domain specific, so no matter where reading is done or how it is learned, a similar range of phonics pairings should always be invoked, according to our reasoning.
One might point out that there may exist circumstance specificity for learning of phonics or for the act of reading. For example, school may be the setting for learning of phonics and the primary setting for reading. If circumstance specificity produces the distributed characteristic in vocabulary, why not in phonics? For circumstance specificity to produce a distributed characteristic in phonics, it would have to meet several additional conditions: (a) learning would have to be focused on particular phonics pairings in one language but not the other; (b) these phonics pairings would have to be in some sense equivalent across the languages; (c) the knowledge of the pairings would have to remain exclusively in one language and not transfer to the other; and (d) the learner would have to be prevented from learning the pairings in the other language, even though they would presumably be needed for reading on any topic or in any semantic domain. If our reasoning about differences between the structural characteristics of vocabulary and phonics is correct, these conditions would be very unlikely ever to be met.
Assuming that these structural differences are correctly formulated and empirically accurate, there is no straightforward basis upon which a distributed characteristic could develop in phonics. It is hard to envision a way that distributed learning could ever produce low scores in phonics for English and Spanish learners.
Consider an imaginary example. The letter “e” occurs in both English and Spanish. For a child who is learning to read in both languages, a distributed characteristic for phonics with regard to the letter “e” might be thought to require exposure to the phonics pairings for the letter “e” in one language but not in the other. The child might learn then to sight read words in English involving the letter “e” (real words like “need,” “feet,” or “bread,” or nonsense words like “keat,” “jeel,” “kreach,” etc.) but, to show a distributed characteristic, would have to not learn to read words in Spanish involving the letter “e” (real words like “perro,” “cerdo,” or “creo” or nonsense words like “tero,” “breu,” or “quespo”). Conversely, the child would have to learn the requisite pairings in Spanish but not English.
However, learning of the pairings for “e” in one language but not the other based on differential exposure would be extremely unnatural, because as explained above, one has to learn all the letters and their pairings to be able to read at even the most basic level in either language in any semantic domain. If one cannot decode the letter “e” in either language, one is handicapped in a very general way with regard to reading in that language.
Further, many phonics pairings in English and Spanish, once learned in either language, would appear to be quite transferable to the other. Pairings for the letters “f, s, m, n, y, and l” are extremely similar across the languages, and pairings for “p, t, k, c, b, d, g, and ch” differ only in minimal ways. Consequently, knowledge for these phonics pairings can hardly be distributed across the languages the way vocabulary knowledge can be, because once learned in either language, the phonics pairings may become immediately available by generalization in the other. Even vowel and diphthong pairings show substantial commonalities across the languages and may produce important transfer. Letters or letter sequences that clearly pertain to one language's phonics but not to that of the other (e.g., “ñ” in Spanish or “th” in English) cannot be acquired in either language in such a way as to contribute to a distributed characteristic the way we have defined it. These phonics pairings in one language have no equivalent pairings in the other, and thus cannot be sorted in terms of whether or not the learner acquires the equivalent structure in both languages; these structures have no equivalents across languages, and hence no possible doublets. The distributed characteristic as we observe it in bilingual vocabulary depends on the existence of very large numbers of cases where words have (translation) equivalents in the two languages. With translation equivalents, a word learned in one language (a singlet) can occupy a semantic domain that can be left empty by the learner in the other language because it may not be needed in the circumstances of usage of the other language for the individual learner.
Thus, in accord with Research Proposition 1, we propose that vocabulary scores of bilingual children may tend to be low with respect to monolinguals in both the L1 and L2 because bilingual vocabulary is selectively distributed. Phonics scores tend to be higher in English–Spanish bilinguals (who thus may not trail monolinguals in the L1 or L2), because, according to our reasoning, there is presumably no possibility of similarly distributed learning in phonics.
We speculate further, in accord with Research Proposition 2, that tests of vocabulary reasoning, such as VA or synonym/antonym determination (OV), should also show intermediate standard scores in bilingual children, with scores falling between those for phonics and vocabulary. The basis for this expectation requires that we envision a combination of two skills, one involving retrieval of vocabulary items from memory, a skill reflected directly in scores on pure vocabulary tests, and another skill involving reasoning about vocabulary once it has been retrieved, a skill that is not directly reflected in behavioral tests, but must be inferred from comparing scores on vocabulary reasoning tests and pure vocabulary tests. Although retrieval of vocabulary items may be more limited for bilingual children within both the L1 and L2 than for monolinguals, the ability to reason about vocabulary items once they have been successfully retrieved should be equivalent for bilingual and monolingual children, we presume. Thus, scores on vocabulary reasoning tests are expected to be higher than for vocabulary tests alone (because reasoning itself should be unimpaired), but are expected even so to show limitations with respect to monolingual performance, because vocabulary retrieval is predictably more limited in bilinguals than monolinguals owing to the distributed characteristic.
We speculate also, in accord with Research Proposition 3, that tests requiring both vocabulary and phonics skills, tests such as reading comprehension, for example, should show standard scores intermediate between those of phonics and vocabulary in bilingual children, because, again, the combination of the two skills may tend to mitigate the extremes. Reading/writing (a composite of PC, proofreading, and dictation) scores would be predictably higher in bilinguals than vocabulary scores because the scores are influenced by phonics skills, which tend to be high. At the same time reading/writing scores would be predictably lower in bilinguals than phonics scores because the scores are influenced by vocabulary skills, which tend to be low.
It is anticipated that second-grade profile effects will be stronger than fifth-grade effects (Research Propositions 1–3). This expectation is based on change in the distributed characteristic across time. About 70% of vocabulary in 2-year-old bilinguals in Miami consisted of singlets (Pearson et al., 1999), but the number dropped to about 40% in elementary school. By college the rate was below 20% according to Pearson et al.'s analysis. We predict that as the proportion of singlets decreases (and it is reasonable to expect that it decreases from second to fifth grade), the relative influence of low vocabulary scores on profile effects in either language should similarly decrease.
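The notion of a singlet proportion can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the concept labels and the two "known" sets are invented, and dividing by the number of concepts known in at least one language is one simple convention, not necessarily the measure used by Pearson et al. (1999).

```python
def singlet_proportion(known_l1, known_l2):
    """Share of known concepts that are lexicalized in only one language.

    known_l1 / known_l2: sets of concept labels for which the child knows
    a word in the L1 or L2, respectively (hypothetical inputs).
    """
    singlets = known_l1 ^ known_l2   # known in exactly one language
    doublets = known_l1 & known_l2   # translation equivalents known in both
    total = len(singlets) + len(doublets)
    return len(singlets) / total if total else 0.0


# Invented example: kitchen words known only in Spanish,
# playground words known only in English.
spanish_known = {"dog", "house", "spoon", "stove"}
english_known = {"dog", "house", "swing", "slide"}

print(round(singlet_proportion(spanish_known, english_known), 3))  # 0.667
```

As more translation equivalents are acquired, the doublet set grows and the proportion falls, which is the developmental trend the prediction relies on.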
In addition, in each case it is anticipated that the bilingual profile effects will be stronger in Spanish than in English (Research Propositions 1–3) because data from LLBC indicated the bilingual children were more competent in English vocabulary than Spanish by second grade and preferred to speak English (LLBC, chapters 3–5, Oller & Eilers, 2002a). With higher English vocabulary scores, the profile difference of vocabulary with respect to phonics scores for bilingual children is naturally anticipated to be lower in English than in Spanish.
Comparisons of composite scores
For each research proposition the primary comparisons of interest concern relative performance on the various tests as reflected by standard scores. To the extent that the scores differ reliably across tests, the existence of profile effects will be confirmed. For each child composite scores were calculated as linear combinations of subtests to represent focused areas of language or literacy. Table 1 presents the weights by which subtests (shown in the columns) were combined to produce the composite scores (in the rows). The profile effects that we analyzed for this article constituted differences among these composite scores (e.g., basic reading vs. basic vocabulary).
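A composite score of this kind is a weighted sum of subtest standard scores. The following minimal sketch assumes the reading/writing weights given in the figure captions (passage comprehension/2 + proofreading/4 + dictation/4); the equal weights for basic reading and all of the child's scores are hypothetical, because Table 1 is not reproduced in this section.

```python
# Weights per composite. Only "reading_writing" is taken from the text;
# the equal weighting for "basic_reading" is an assumption for illustration.
WEIGHTS = {
    "reading_writing": {"PC": 0.5, "proofreading": 0.25, "dictation": 0.25},
    "basic_reading": {"WA": 0.5, "LW": 0.5},
}


def composite(scores, weights):
    """Linear combination of subtest standard scores."""
    return sum(w * scores[name] for name, w in weights.items())


# Hypothetical standard scores for one child.
child = {"PC": 96, "proofreading": 100, "dictation": 104, "WA": 110, "LW": 106}

reading_writing = composite(child, WEIGHTS["reading_writing"])
basic_reading = composite(child, WEIGHTS["basic_reading"])
print(reading_writing, basic_reading)  # 99.0 108.0
```

A profile effect for this child would then be a difference between two such composites, for example basic reading minus reading/writing.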
Our exposition of the analysis below involves two steps. First, we report effect sizes based on standard score comparisons across the tests and take note of effect consistency in a semiformal manner, referencing the sign test. Second, we conduct formal statistical tests of the profile effects utilizing the very conservative Scheffé test. We now turn to descriptions of the two steps.
Standard score interpretation and adjustment
One way to evaluate bilingual profile effects is simply to compare standard scores of bilingual children to the expected mean of 100 for every test. A pattern of relatively higher and lower scores on different tests can be interpreted as a profile effect with respect to the (overwhelmingly monolingual) norming group. The eight bilingual subgroups of the design were orthogonal; this means that every child's data were represented in one and only one subgroup. Consequently, each left–right pair of open symbols in Figures 3–6 represents an independent test of the profile effects in question. In Figure 3, for instance, because all eight bilingual comparisons favor a profile effect, the result (if it were not post hoc) could be legitimately interpreted as statistically significant even by a sign test (p=.008). If seven out of eight cells show a particular profile, the sign test could be interpreted as approaching statistical significance (p=.070).
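The two sign test values quoted above can be reproduced with an exact two-sided binomial calculation. This is a minimal sketch, assuming a null probability of .5 for each independent subgroup; the function name is ours.

```python
from math import comb


def sign_test_p(successes, n):
    """Exact two-sided sign test p-value under a fair-coin null."""
    k = max(successes, n - successes)
    # Probability of a result at least this extreme in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(round(sign_test_p(8, 8), 3))  # 0.008
print(round(sign_test_p(7, 8), 3))  # 0.07
```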
Figure 3. Profile effects for basic reading and basic vocabulary. For second grade (top row) and fifth grade (bottom row), open symbols show means for each of the eight cells in the LLBC design (Oller & Eilers, 2002a) for bilingual children for basic reading (word attack and letter–word recognition) and basic vocabulary (picture vocabulary and Peabody Picture Vocabulary Test). Filled symbols show unweighted means across the cells for all the bilingual children as well as for all monolingual controls. Data from monolingual controls are available only for English tests (left column), not for Spanish tests (right column). The vertical line at 100 indicates the expected mean for the norming sample on all tests; mono, monolingual mean across low SES and high SES; bi, bilingual mean across all subgroups; Eng Imm, English-immersion school type; 2-way, two-way school type; Span, only Spanish spoken at home; Eng & Span, equally English and Spanish spoken at home; Low, low SES; High, high SES.
Figure 4. Profile effects for word attack, vocabulary reasoning (oral vocabulary and verbal analogies subtests of Woodcock–Johnson and Woodcock–Muñoz), and picture naming. For second grade (top row) and fifth grade (bottom row), open symbols show means for each of the eight cells in the LLBC design (Oller & Eilers, 2002a) for bilingual children for word attack, vocabulary reasoning, and picture naming. Filled symbols show unweighted means across the cells for all the bilingual children as well as for all monolingual controls. Data from monolingual controls are available only for English tests (left column), not for Spanish tests (right column). The vertical line at 100 indicates the expected mean for the norming sample on all tests; mono, monolingual mean across low SES and high SES; bi, bilingual mean across all subgroups; Eng Imm, English-immersion school type; 2-way, two-way school type; Span, only Spanish spoken at home; Eng & Span, equally English and Spanish spoken at home; Low, low SES; High, high SES.
Figure 5. Profile effects for word attack, reading/writing (passage comprehension/2 + proofreading/4 + dictation/4), and picture naming. For second grade (top row) and fifth grade (bottom row), open symbols show means for each of the eight cells in the LLBC design (Oller & Eilers, 2002a) for bilingual children for word attack, reading and writing, and picture naming. Filled symbols show unweighted means across the cells for all the bilingual children as well as for all monolingual controls. Data from monolingual controls are available only for English tests (left column), not for Spanish tests (right column). The vertical line at 100 indicates the expected mean for the norming sample on all tests; mono, monolingual mean across low SES and high SES; bi, bilingual mean across all subgroups; Eng Imm, English-immersion school type; 2-way, two-way school type; Span, only Spanish spoken at home; Eng & Span, equally English and Spanish spoken at home; Low, low SES; High, high SES.
Figure 6. Profile effects for receptive vocabulary (Peabody Picture Vocabulary Test) and picture naming (picture vocabulary subtest of Woodcock–Johnson and Woodcock–Muñoz tests). For second grade (top row) and fifth grade (bottom row), open symbols show means for each of the eight cells in the LLBC design (Oller & Eilers, 2002a) for bilingual children for receptive vocabulary and picture naming. Filled symbols show unweighted means across the cells for all the bilingual children as well as for all monolingual controls. Data from monolingual controls are available only for English tests (left column), not for Spanish tests (right column). The vertical line at 100 indicates the expected mean for the norming sample on all tests; mono, monolingual mean across low SES and high SES; bi, bilingual mean across all subgroups; Eng Imm, English-immersion school type; 2-way, two-way school type; Span, only Spanish spoken at home; Eng & Span, equally English and Spanish spoken at home; Low, low SES; High, high SES.
However, we treated the sign test approach with caution because it was post hoc and included no correction for multiple comparisons of tests (a more rigorous statistical evaluation is described below). Furthermore, it was important to consider the standard scores for bilingual children on the English tests in light of differences among scores across tests in the monolingual children evaluated in the same study. Although one would expect the standard scores for monolinguals to approximate the normed mean of 100 on all standardized tests, any particular cohort could differ from 100 due to an unexpected difference between the particular cohort and the norming sample. Consequently, another comparison considered below was based on subtraction of the monolingual profile effects from the bilingual ones. For example, the differences for bilinguals on basic reading and basic vocabulary scores in English were compared to monolinguals' score differences. Thus, if bilinguals scored 10 points higher in basic reading than basic vocabulary, but monolinguals scored 2 points higher in basic reading, the net profile effect for the bilinguals after subtraction would be 8 points. The bilingual profile effects that remained after subtraction for each of the eight subgroups could then be evaluated by the sign test, although we reiterate that these evaluations would be post hoc and uncorrected for multiple comparisons.
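The subtraction described above is a difference of differences. The sketch below reproduces the worked example (10 points for bilinguals, 2 points for monolinguals, net 8 points); the individual standard scores are hypothetical values chosen only to yield those differences.

```python
def profile_effect(score_a, score_b):
    """Difference between two composite standard scores within one group."""
    return score_a - score_b


def net_profile_effect(bi_a, bi_b, mono_a, mono_b):
    """Bilingual profile effect after subtracting the monolingual one."""
    return profile_effect(bi_a, bi_b) - profile_effect(mono_a, mono_b)


# Hypothetical group means: basic reading vs. basic vocabulary in English.
bilingual_reading, bilingual_vocab = 105, 95        # profile effect of 10
monolingual_reading, monolingual_vocab = 104, 102   # profile effect of 2

print(net_profile_effect(bilingual_reading, bilingual_vocab,
                         monolingual_reading, monolingual_vocab))  # 8
```

Computed separately for each of the eight bilingual subgroups, the sign of this net value is what the sign test would count.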
Score adjustment (implemented by subtracting monolingual from bilingual profile effects) was necessary because there were cases where apparent profile effects in the bilinguals could be the result, not of special characteristics of bilingual learning, but of a special characteristic of schooling in the location of sampling that could have affected both bilingual and monolingual patterns of performance with respect to the normed scores. In fact, both monolingual and bilingual children did show standardized scores on particular tests that were systematically different from 100. For example, both monolingual and bilingual children in some comparisons showed especially high scores on phonics tests with regard to the expected/normed mean of 100, while not showing such high scores on other tests. We speculate that the high scores on phonics may have been the result of a greater emphasis on teaching phonics in Miami during the period of LLBC testing (Oller & Eilers, 2002a) than during the time of the norming of the tests. In cases where both monolingual and bilingual children showed elevated scores on phonics with respect to the expected mean of 100, and both showed lower scores on other tests, profiles of difference favoring phonics over other test scores were evident for both groups with regard to the normed scores. However, as will be seen, the bilingual profile effects were larger and more consistent than monolingual profile effects. This conclusion can be seen to meet the sign test standard in any case where all eight bilingual subgroups show profile effects even after subtraction of monolingual profile effects.
Formal evaluation of the profile effects
Multivariate analysis of variance (the primary method of analysis in LLBC, Oller & Eilers, 2002a) is not ideally suited to the formal analysis goals of the present study. The goal here was to reanalyze the data from second and fifth grade conservatively while focusing on a number of comparisons of subgroups of children in both English and Spanish, and to illustrate the extent of and reliability of profile effects for each subgroup. Consequently, we analyzed the data in formal evaluations for each hypothesis and subhypothesis by parametric Scheffé tests, as explained below. This approach provides an optimal fit with the goal of the analysis to focus on profile effects within each subgroup of children and language at both second and fifth grade.
In our formal statistical evaluations, each examined profile effect amounted to a specific single degree of freedom contrast among levels of test type (test), representing any composite score. A very rigorous statistical adjustment for profile effects in monolinguals was made. Examining the difference in profile effects for bilinguals and monolinguals is equivalent to examining the statistical interaction of each profile effect with lingualism (monolingual vs. bilingual). If the interaction of a bilingual profile effect by lingualism was significant in any case, it indicated that the bilingual effect was significant after correction for (after subtracting) the monolingual effect.
For the Spanish tests, no comparison with monolingual profiles was available, because no Spanish monolinguals were tested in LLBC (Oller & Eilers, 2002a) on the standardized battery (monolingual Spanish subjects meeting the study preselection criteria were not available in the schools). Consequently, the data on Spanish are presented without comparison to a monolingual control group (i.e., we examined single degree of freedom contrasts in Spanish but could not examine the interaction of test by lingualism). In general, the profile effects in Spanish were even stronger than in English (verified by examining the interaction of profile effects with tested language, and by considering effect sizes in standard deviation units of difference between test scores for Spanish and English). This pattern suggests that the Spanish-language profile effects were real and were not the result of teaching styles in Miami during the period of LLBC testing that may have differed from those in the schools where and when the tests were normed. The formal statistical evaluation chosen here is much more conservative than multivariate analysis of variance as reported in LLBC. As a primary assessment of statistical significance of the profile effects, we utilized parametric Scheffé post hoc tests that allowed examination of any linear combination of the nine subtests in the LLBC study while guaranteeing a familywise α value below 0.05. This provided a very rigorous control for post hoc examination of the data. For an α value of 0.05, the critical value of the Scheffé t for examining any combination of nine subtests was never less than 3.94 in absolute value, and it was always more than twice the critical value of the t statistic for planned comparisons with equivalent degrees of freedom. We were thus able (in accord with this conservative method) to declare significant only those effects that were more than twice as large as would be required to achieve significance with planned comparisons.
This test is so conservative that even if only one of the many comparisons to be made with the Scheffé test proved to be statistically significant, it would demonstrate incontrovertibly that a multivariate analysis of variance for the dataset would also have yielded a significant F value.
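The lower bound of 3.94 can be checked with a short calculation. The sketch assumes the textbook form of the Scheffé criterion, in which the critical t for any contrast among k levels is the square root of (k − 1) times the critical F. It takes the large-sample limit in which (k − 1)F approaches a chi-square quantile with k − 1 degrees of freedom and the planned-comparison t approaches the normal quantile. The exact error terms used in the analysis are not reconstructed here, and the helper names are ours.

```python
from math import exp, sqrt
from statistics import NormalDist


def chi2_cdf_even_df(x, df):
    """Chi-square CDF in closed form (valid only for even df)."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return 1.0 - exp(-half) * total


def chi2_quantile_even_df(p, df, lo=0.0, hi=200.0):
    """Invert the CDF by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_even_df(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


k = 9  # number of subtests
scheffe_limit = sqrt(chi2_quantile_even_df(0.95, k - 1))
planned_limit = NormalDist().inv_cdf(0.975)  # two-sided alpha = .05

print(round(scheffe_limit, 2), round(planned_limit, 2))  # 3.94 1.96
```

Under these assumptions the Scheffé criterion is slightly more than twice the planned-comparison criterion, consistent with the description above; with finite error degrees of freedom both critical values are larger.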
In a further commitment to rigor, we also used separate error terms for each profile effect to avoid having to make any assumption of sphericity. Finally, in performing the post hoc tests of the profile effects and their interactions, we statistically controlled for other effects in the design by partialing out, as appropriate, any main effects and/or interactions among IMS, LSH, and SES, and first language tested (this last effect pertained only to bilinguals, approximately half of whom took the Spanish battery first and the remainder of whom took the English battery first).
Scheffé tests determined statistical significance by evaluating the reliability or consistency of effects. However, there is always a distinction to be drawn between reliability of an effect, and effect size, that is, the amount by which test scores differed in terms of standard scores or standard deviations from the expected (or normed) mean scores; a score 15 points above or below the expected/normed mean of 100 represents a full standard deviation of difference on the tests utilized here. Because of sampling error and especially because of differential covariance among tests in various profile effects, the sample standard deviation of some interactions involving profile effects in this paper ranged as low as 10 points or as high as 26 points. Consequently, effect sizes are reported below in terms of Cohen's d, a corrected effect size.
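To illustrate why the sample standard deviation matters, the sketch below expresses a profile effect as a mean difference divided by the standard deviation of that difference. This is one common form of Cohen's d for within-subject contrasts; whether it matches the exact correction applied in this article is an assumption, and all numbers are hypothetical.

```python
from statistics import mean, stdev


def d_from_summary(mean_difference, sd_difference):
    """Effect size from a mean difference and its sample SD."""
    return mean_difference / sd_difference


def cohens_d(differences):
    """Effect size from per-child difference scores."""
    return d_from_summary(mean(differences), stdev(differences))


# The same 8-point profile effect under the two extreme SDs mentioned above.
print(round(d_from_summary(8, 10), 2))  # 0.8
print(round(d_from_summary(8, 26), 2))  # 0.31

# Hypothetical per-child differences (e.g., basic reading minus vocabulary).
print(round(cohens_d([6, 10, 8, 12, 4]), 2))  # 2.53
```

The first two lines show how an identical raw difference can correspond to quite different standardized effects depending on the variability of the contrast.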