Fitting Logistic IRT Models: Small Wonder

Miguel A. García-Pérez

doi:10.1017/S1138741600005473

Published online by Cambridge University Press: 10 April 2014

Miguel A. García-Pérez* (Complutense University of Madrid)
* Correspondence concerning this article should be addressed to Dr. Miguel A. García-Pérez, Departamento de Metodología, Facultad de Psicología, Universidad Complutense, Campus de Somosaguas, 28223 Madrid (Spain). Phone: (+34) 91 394 3061. Fax: (+34) 91 394 3189. E-mail: miguel@psi.ucm.es

Abstract

State-of-the-art item response theory (IRT) models use logistic functions exclusively as their item response functions (IRFs). Logistic functions meet the requirements that their range is the unit interval and that they are monotonically increasing, but they impose a parameter space whose dimensions can only be assigned a metaphorical interpretation in the context of testing. Applications of IRT models require obtaining the set of values for the logistic function parameters that best fits an empirical data set. However, success in obtaining such a set of values does not guarantee that the constructs they represent actually exist, for the adequacy of a model is not established by the mere possibility of estimating its parameters. This article illustrates how mechanical adoption of off-the-shelf logistic functions as IRFs for IRT models can result in off-the-shelf parameter estimates and fits to data. The results of a simulation study are presented, which show that logistic IRT models can fit a set of data generated by IRFs other than logistic functions just as well as they fit logistic data, even though the response processes and parameter spaces involved in each case are substantially different. An explanation of why logistic functions work as they do is offered, the theoretical and practical consequences of their behavior are discussed, and a testable alternative to logistic IRFs is described.

The item response function (IRF) assumed in the item response theory (IRT) models in common use is, in practice, exclusively the logistic function. Logistic functions meet the requirements that their range is the interval [0, 1] and that they are monotonically increasing, but they impose a parameter space whose dimensions have only a metaphorical interpretation in the context of assessment with objective tests. Applying IRT models requires estimating the logistic parameters that best describe a set of empirical data. However, success in obtaining these parameters does not guarantee that the constructs they represent actually exist, since the validity of a model is not established merely by the possibility of estimating its parameters. This article shows that the mechanical adoption of logistic functions as IRFs in IRT models produces stereotyped estimates and fits. As evidence, results are presented from a simulation study in which the logistic model produced a pattern of estimates and fits for non-logistic data that was indistinguishable from the pattern obtained for logistic data, even though the non-logistic data were generated by a model involving a response process and a parameter space markedly different from the logistic ones. The article closes with some reflections on why logistic models behave this way and on the theoretical and practical consequences of that behavior, and it also describes an empirically falsifiable alternative to logistic IRFs.
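
As a rough illustration of the two properties the abstract attributes to logistic IRFs (values inside the unit interval, monotonically increasing), the sketch below evaluates a three-parameter logistic function on a grid of ability values and draws simulated binary responses from it. The abstract does not say which logistic variant the article uses, so the three-parameter form, the scaling constant `D = 1.7`, the function name `irf_3pl`, and all parameter values are assumptions chosen only for illustration; this is not the simulation design of the study.

```python
import math
import random


def irf_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic IRF (assumed form, not taken from the article).

    a: discrimination, b: difficulty, c: lower asymptote,
    D: conventional scaling constant (assumption).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Grid of ability values from -4 to 4 in steps of 0.1 (illustrative range).
thetas = [-4.0 + 0.1 * i for i in range(81)]
probs = [irf_3pl(t, a=1.2, b=0.0, c=0.2) for t in thetas]

# The two requirements named in the abstract, checked numerically on the grid.
in_unit_interval = all(0.0 < p < 1.0 for p in probs)
is_increasing = all(p1 < p2 for p1, p2 in zip(probs, probs[1:]))

# One simulated correct/incorrect response per ability value (seeded for repeatability).
rng = random.Random(0)
responses = [int(rng.random() < p) for p in probs]

print(in_unit_interval, is_increasing)
```

Data simulated this way are "logistic data" in the abstract's sense; swapping `irf_3pl` for any other monotone function with values in the unit interval would give non-logistic data of the kind the simulation study contrasts them with.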

Keywords

goodness of fit; parameter estimation; item response theory; logistic models; finite state polynomic models; BILOG

Type: Spanish research trends
Information: The Spanish Journal of Psychology, Volume 2, May 1999, pp. 74–94

DOI: https://doi.org/10.1017/S1138741600005473
Copyright: © Cambridge University Press 1999
