Self-ratings of language proficiency are ubiquitous in research on bilingualism, but little is known about their validity, especially when the same scale is used across different types of bilinguals. Self-ratings and picture naming data from 1044 Spanish–English and 519 Chinese–English bilinguals were analyzed in five between- and within-population comparisons. Chinese–English bilinguals scored more extremely than Spanish–English bilinguals, and in opposite directions at different endpoints of the self-ratings scale. Regrouping bilinguals by dominant language, instead of language membership, reduced discrepancies, but significant group differences remained. Population differences appeared even in English, though this language is shared between populations. These results demonstrate significant problems with self-ratings, especially when comparing bilinguals of different language combinations and when comparing subgroups of bilinguals who speak the same languages but vary in acquisition history and/or dominance. Objective proficiency measures (e.g., picture naming or proficiency interviews) are superior to self-ratings for maximizing classification accuracy and consistency across studies.