  • Print publication year: 2004
  • Online publication date: June 2012

7 - Examining Gender-Related Differential Item Functioning Using Insights from Psychometric and Multicontext Theory


Why do men and women tend to perform differently on analytical portions of standardized tests? Psychosocial research often speculates that women's performance “might be more affected by such variables as role expectations or unjustified fears of incompetence” (Basinger, 1997, p. 2; see also Sternberg & Williams, 1997). This “unjustified fear” is similar to what Steele and Aronson (1995) call “stereotype threat” among African American test takers. Working with a small number of subjects under laboratory conditions, Steele and Aronson found significant differences in test scores after making only small changes to the directions for taking the test and to the explanations given to their subjects. Their research showed that, when African American college-level students were asked to take a test that had no direct consequence for them, their performance was equal to or better than that of majority test takers in the same group. However, when similar groups were told the outcomes of the same tests would affect them academically, performance levels among African American test takers dropped dramatically. According to Steele and Aronson, the perceived stereotypes associated with testing and other laboratory or classroom performances of women and minorities created this effect. Their findings suggest that hidden variables in the testing environment may have long-term effects on women and minority test takers.

Steele and Aronson's work clearly points to the impact of hidden variables such as these on test scores.
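The differential item functioning (DIF) methods this chapter draws on are only cited here, not derived. As a minimal illustrative sketch (not the chapter's own analysis), the Mantel-Haenszel procedure referenced below (Holland & Thayer, 1988) pools 2×2 group-by-response tables across matched ability levels into a common odds ratio, which ETS conventionally rescales to a delta metric; the function name and table layout are this sketch's assumptions:

```python
import math

def mantel_haenszel_dif(tables):
    """Estimate the Mantel-Haenszel common odds ratio for one item.

    tables: list of (A, B, C, D) counts, one tuple per matched total-score
    level, where A/B are reference-group correct/incorrect counts and
    C/D are focal-group correct/incorrect counts.
    Returns (alpha, delta): alpha near 1 (delta near 0) indicates no DIF;
    delta < 0 indicates the item disfavors the focal group.
    """
    num = den = 0.0
    for a, b, c, d in tables:
        t = a + b + c + d
        if t == 0:
            continue  # skip empty score levels
        num += a * d / t
        den += b * c / t
    alpha = num / den            # common odds ratio across score levels
    delta = -2.35 * math.log(alpha)  # ETS delta-scale DIF index
    return alpha, delta

# Identical response patterns in both groups -> no evidence of DIF.
alpha, delta = mantel_haenszel_dif([(20, 10, 20, 10), (30, 5, 30, 5)])
```

In practice the statistic is computed per item after matching examinees on total test score, and a chi-square test (with continuity correction) accompanies the odds ratio; this sketch shows only the effect-size estimate.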

Ackerman, T. A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29(1), 67–91
Basinger, J. (1997, August 6). Graduate Record Exam is poor predictor of success in psychology. Academe Today, web site of the Chronicle of Higher Education
Bolt, D. M. (1999). Psychometric methods for diagnostic assessment and dimensionality representation. Unpublished Ph.D. dissertation. Urbana: University of Illinois
Bolt, D. M., Cohen, A. S., & Wollack, J. A. (2001). A mixture item response model for multiple-choice data. Journal of Educational and Behavioral Statistics, 26(4), 381–409
Bolt, D. M., Cohen, A. S., & Wollack, J. A. (2002). Item parameter estimation under conditions of test speededness: Application of a mixture Rasch model with ordinal constraints. Journal of Educational Measurement, 39(4), 331–348
Bowen, W. G., & Bok, D. (1998). The shape of the river: Long-term consequences of considering race in college and university admissions. Princeton, NJ: Princeton University Press
Carlton, S. T., & Harris, A. M. (1989). Female/male performance differences on the SAT: Causes and correlates. Paper presented at the annual meeting of the American Educational Research Association, San Francisco
Cohen, A. S., & Bolt, D. M. (in press). A mixture model analysis of differential item functioning. Journal of Educational Measurement
Cohen, A. S., Wollack, J. A., Bolt, D. M., & Mroch, A. A. (2002). A mixture Rasch model analysis of test speededness. Paper presented at the annual conference of the American Educational Research Association, New Orleans, LA
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum
Gallagher, A. (1998). Gender and antecedents of performance in mathematics testing. Teachers College Record. 100, 297–314
Gallagher, A., Morley, M. E., & Levin, J. (1999). Cognitive patterns of gender differences on mathematics admissions tests. Graduate Record Examinations FAME Report, 4–11, Princeton, NJ: Author
Gallagher, A. M., & DeLisi, R. (1994). Gender differences in scholastic aptitude test – mathematics problem solving among high ability students. Journal of Educational Psychology, 86, 204–211
Gierl, M. J., Bisanz, J., Bisanz, G. L., & Boughton, K. A. (2002, April). Identifying content and cognitive skills that produce gender differences in mathematics: A demonstration of the DIF analysis framework. Paper presented at the annual conference of the National Council on Measurement in Education, New Orleans, LA
Hall, E. T. (1959). The silent language. Greenwich, CT: Fawcett
Hall, E. T. (1966). The hidden dimension (2nd ed.). New York: Anchor
Hall, E. T. (1974). Handbook for proxemic research. Washington, DC: Society for the Anthropology of Visual Communication
Hall, E. T. (1984). The dance of life: The other dimension of time (2nd ed.). Garden City, NY: Anchor Books
Hall, E. T. (1993). An anthropology of everyday life (2nd ed.). New York: Anchor
Holland, P. W. (1985). On the study of differential item performance without IRT. Proceedings of the 27th annual conference of the Military Testing Association (Vol. 1, pp. 282–287); San Diego
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ: Lawrence Erlbaum Associates
Ibarra, R. A. (1996). Latino experiences in graduate education: Implications for change. Enhancing the Minority Presence in Graduate Education, no. 7. Washington, DC: Council of Graduate Schools
Ibarra, R. A. (2001). Beyond affirmative action: Reframing the context of higher education. Madison: University of Wisconsin Press
Ibarra, R. A., & Cohen, A. S. (1999, February). Multicontextuality: A hidden dimension in testing and assessment. Paper presented at the ETS Invitational Conference on Fairness, Access, Multiculturalism, and Equity (FAME), Princeton, NJ
Li, Y. (2001). Detecting differences in item response as a function of item characteristics. Unpublished master's thesis, Department of Educational Psychology, University of Wisconsin, Madison
Mislevy, R. J., & Verhelst, N. (1990). Modeling item responses when different subjects employ different solution strategies. Psychometrika, 55, 195–215
Oshima, T. C. (1994). The effect of speededness on parameter estimation in item response theory. Journal of Educational Measurement, 31, 200–219
Pine, S. M. (1977). Application of item characteristic curve theory to the problem of test bias. In D. J. Weiss (Ed.), Application of computerized adaptive testing: Proceedings of a symposium presented at the 18th annual convention of the Military Testing Association (Research Rep. No. 77–1, pp. 37–43). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program
Ramírez, M., III. (1983). Psychology of the Americas: Mestizo perspectives on personality and mental health. New York: Pergamon
Ramírez, M., III. (1991). Psychotherapy and counseling with minorities: a cognitive approach to individual and cultural differences. New York: Pergamon
Ramírez, M., III. (1998). Multicultural/multiracial psychology: mestizo perspectives in personality and mental health. Northvale, NJ: Jason Aronson
Ramírez, M., III. (1999). Multicultural psychology: an approach to individual and cultural differences (2nd ed.). Needham Heights, MA: Allyn and Bacon
Ramírez, M., III, & Castañeda, A. (1974). Cultural democracy, bicognitive development, and education. New York: Academic Press
Rost, J. (1990). Rasch models in latent classes: An integration of two approaches to item analysis. Applied Psychological Measurement, 14, 271–282
Roussos, L., & Stout, W. (1996). A multidimensionality-based DIF analysis paradigm. Applied Psychological Measurement, 20, 355–371
Scheuneman, J. D., & Gerritz, K. (1990). Using differential item functioning procedures to explore sources of item difficulty and group performance characteristics. Journal of Educational Measurement, 27, 109–131
Steele, C. M., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69(5), 797–811
Sternberg, R. J., & Williams, W. M. (1997, June). Does the graduate record examination predict meaningful success in the graduate training of psychologists? A case study. American Psychologist, 52(6), 630–641
Tatsuoka, K. K. (1985). A probabilistic model for diagnosing misconceptions by the pattern classification approach. Journal of Educational Statistics, 10, 55–73
Thissen, D. (1991). MULTILOG [computer program]. Chicago: Scientific Software
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147–169). Hillsdale, NJ: Erlbaum
Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67–113). Hillsdale, NJ: Erlbaum
Wild, C. L., & McPeek, W. M. (1986). Performance of the Mantel-Haenszel statistic in a variety of situations. Paper presented at the annual meeting of the American Educational Research Association, San Francisco
Yamamoto, K., & Everson, H. (1997). Modeling the effects of test length and test time on parameter estimation using the HYBRID model. In J. Rost & R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences. New York: Waxmann
Zieky, M. (1993). Practical questions in the use of DIF statistics in test development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337–347). Hillsdale, NJ: Erlbaum