IRT Models for Expert-Coded Panel Data

Kyle L. Marquardt; Daniel Pemstein

doi:10.1017/pan.2018.28

Published online by Cambridge University Press: 03 September 2018

Kyle L. Marquardt (corresponding author): V-Dem Institute, Department of Political Science, University of Gothenburg, Gothenburg, Sweden. Email: kyle.marquardt@gu.se
Daniel Pemstein: Department of Criminal Justice and Political Science, North Dakota State University, Fargo, ND 58105, USA. Email: daniel.pemstein@ndsu.edu

Abstract

Data sets quantifying phenomena of social-scientific interest often use multiple experts to code latent concepts. While it remains standard practice to report the average score across experts, experts likely vary in both their expertise and their interpretation of question scales. As a result, the mean may be an inaccurate statistic. Item-response theory (IRT) models provide an intuitive method for taking these forms of expert disagreement into account when aggregating ordinal ratings produced by experts, but they have rarely been applied to cross-national expert-coded panel data. We investigate the utility of IRT models for aggregating expert-coded data by comparing the performance of various IRT models to the standard practice of reporting average expert codes, using both data from the V-Dem data set and ecologically motivated simulated data. We find that IRT approaches outperform simple averages when experts vary in reliability and exhibit differential item functioning (DIF). IRT models are also generally robust even in the absence of simulated DIF or varying expert reliability. Our findings suggest that producers of cross-national data sets should adopt IRT techniques to aggregate expert-coded data measuring latent concepts.
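The abstract's central point, that a raw mean conflates a case's latent value with which experts happened to code it, can be illustrated with a small simulation. The sketch below is an illustration only, not the authors' model: it assumes an ordinal probit rating rule, two hypothetical expert types (`LENIENT`, `STRICT`) whose cutpoints differ (a simple form of DIF) and whose noise SD stands in for reliability, and it recovers the latent value by grid-search maximum likelihood with the expert parameters treated as known. The IRT models compared in the article instead estimate expert parameters from the data. All function names here (`rate`, `category_prob`, `estimate_latent`) are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import mean

# Hypothetical expert types (illustrative values, not from the article):
# ascending cutpoints on the latent scale plus a perception-noise SD.
# Different cutpoints = different scale interpretation (DIF);
# a larger noise SD = a less reliable expert.
LENIENT = ([-1.5, -0.5, 0.5], 0.3)  # low cutpoints: reports higher categories
STRICT = ([-0.5, 0.5, 1.5], 0.3)    # high cutpoints: reports lower categories


def rate(latent, thresholds, noise_sd, rng):
    """Simulate one ordinal rating in {0, ..., len(thresholds)}.

    The expert perceives latent + N(0, noise_sd) and reports how many of
    their own (ascending) cutpoints that perception exceeds.
    """
    perceived = latent + rng.gauss(0.0, noise_sd)
    return sum(perceived > t for t in thresholds)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_prob(k, latent, thresholds, noise_sd):
    """P(rating == k | latent) under the ordinal probit rule used by rate()."""
    lower = -math.inf if k == 0 else thresholds[k - 1]
    upper = math.inf if k == len(thresholds) else thresholds[k]
    return (normal_cdf((upper - latent) / noise_sd)
            - normal_cdf((lower - latent) / noise_sd))


def estimate_latent(ratings, thresholds, noise_sd, lo=-3.0, hi=3.0, step=0.01):
    """Grid-search maximum-likelihood estimate of a single latent value.

    Simplification: the experts' cutpoints and noise SD are treated as known;
    an IRT model would estimate them jointly with the latent values.
    """
    counts = Counter(ratings)
    best_z, best_ll = lo, -math.inf
    for i in range(int(round((hi - lo) / step)) + 1):
        z = lo + i * step
        ll = sum(n * math.log(max(category_prob(k, z, thresholds, noise_sd), 1e-300))
                 for k, n in counts.items())
        if ll > best_ll:
            best_z, best_ll = z, ll
    return best_z


if __name__ == "__main__":
    rng = random.Random(2018)
    # One case with latent value 0.0, rated by 300 experts of each type.
    lenient = [rate(0.0, *LENIENT, rng) for _ in range(300)]
    strict = [rate(0.0, *STRICT, rng) for _ in range(300)]
    print("mean rating, lenient vs strict:", mean(lenient), mean(strict))
    print("ML estimate, lenient vs strict:",
          round(estimate_latent(lenient, *LENIENT), 2),
          round(estimate_latent(strict, *STRICT), 2))
```

Under these assumptions the two panels' mean ratings should differ by roughly one category even though they rated the same case, while the likelihood-based estimates, which account for each panel's cutpoints, should both land near 0.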

Keywords

Bayesian methods; expert opinion; latent variables; IRT models; cross-national data

Type: Articles
Information: Political Analysis, Volume 26, Issue 4, October 2018, pp. 431–456

DOI: https://doi.org/10.1017/pan.2018.28
Copyright: Copyright © The Author(s) 2018. Published by Cambridge University Press on behalf of the Society for Political Methodology.

Footnotes

Authors’ note: Earlier drafts were presented at the 2016 MPSA Annual Convention, the 2016 IPSA World Convention, and the 2016 V-Dem Latent Variable Modeling Week Conference. We thank Chris Fariss, Juraj Medzihorsky, Pippa Norris, Jon Polk, Shawn Treier, Carolien van Ham, and Laron Williams for their comments on earlier drafts of this paper, as well as V-Dem Project members for their suggestions and assistance. We are also grateful to the editor and two anonymous reviewers for their detailed suggestions. This material is based upon work supported by the National Science Foundation under Grant No. SES-1423944 (PI: Daniel Pemstein); the Riksbankens Jubileumsfond, Grant M13-0559:1 (PI: Staffan I. Lindberg); the Swedish Research Council, 2013.0166 (PIs: Staffan I. Lindberg and Jan Teorell); the Knut and Alice Wallenberg Foundation (PI: Staffan I. Lindberg); the University of Gothenburg, Grant E 2013/43; and internal grants from the Vice-Chancellor’s office, the Dean of the College of Social Sciences, and the Department of Political Science at the University of Gothenburg. We performed simulations and other computational tasks using resources provided by the Notre Dame Center for Research Computing (CRC) through the High Performance Computing section and by the Swedish National Infrastructure for Computing (SNIC) at the National Supercomputer Centre in Sweden (SNIC 2016/1-382, 2017/1-407, and 2017/1-68). We specifically acknowledge the assistance of In-Saeng Suh at CRC and Johan Raber at SNIC in facilitating our use of their respective systems. Replication materials are available in Marquardt and Pemstein (2018).

Contributing Editor: R. Michael Alvarez

References

Aldrich, John H., and McKelvey, Richard D. 1977. A method of scaling with applications to the 1968 and 1972 Presidential elections. American Political Science Review 71(1):111–130.

Bakker, R., de Vries, C., Edwards, E., Hooghe, L., Jolly, S., Marks, G., Polk, J., Rovny, J., Steenbergen, M., and Vachudova, M. A. 2012. Measuring party positions in Europe: The Chapel Hill expert survey trend file, 1999–2010. Party Politics 21(1):143–152.

Bakker, Ryan, Jolly, Seth, Polk, Jonathan, and Poole, Keith. 2014. The European common space: Extending the use of anchoring vignettes. The Journal of Politics 76(4):1089–1101.

Boyer, K. K., and Verma, R. 2000. Multiple raters in survey-based operations management research: A review and tutorial. Production and Operations Management 9(2):128–140.

Brady, Henry E. 1985. The perils of survey research: Inter-personally incomparable responses. Political Methodology 11(3/4):269–291.

Buttice, Matthew K., and Stone, Walter J. 2012. Candidates matter: Policy and quality differences in congressional elections. Journal of Politics 74(3):870–887.

Clinton, Joshua D., and Lewis, David E. 2008. Expert opinion, agency characteristics, and agency preferences. Political Analysis 16(1):3–20.

Coppedge, Michael, Gerring, John, Lindberg, Staffan I., Teorell, Jan, Pemstein, Daniel, Tzelgov, Eitan, Wang, Yi-ting, Glynn, Adam, Altman, David, Bernhard, Michael, Fish, M. Steven, Hicken, Allen, McMann, Kelly, Paxton, Pamela, Reif, Megan, Skaaning, Svend-Erik, and Staton, Jeffrey. 2014. V-Dem: A new way to measure democracy. Journal of Democracy 25(3):159–169.

Coppedge, Michael, Gerring, John, Lindberg, Staffan I., Skaaning, Svend-Erik, Teorell, Jan, Altman, David, Bernhard, Michael, Fish, M. Steven, Glynn, Adam, Hicken, Allen, Knutsen, Carl Henrik, McMann, Kelly, Paxton, Pamela, Pemstein, Daniel, Staton, Jeffrey, Zimmerman, Brigitte, Andersson, Frida, Mechkova, Valeriya, and Miri, Farhad. 2016. Varieties of Democracy codebook v6. Technical report. Varieties of Democracy Project: Project Documentation Paper Series.

Coppedge, Michael, Gerring, John, Lindberg, Staffan I., Skaaning, Svend-Erik, Teorell, Jan, Altman, David, Bernhard, Michael, Fish, M. Steven, Glynn, Adam, Hicken, Allen, Knutsen, Carl Henrik, Marquardt, Kyle L., McMann, Kelly, Miri, Farhad, Paxton, Pamela, Pemstein, Daniel, Staton, Jeffrey, Tzelgov, Eitan, Wang, Yi-ting, and Zimmerman, Brigitte. 2016. V-Dem Dataset v6.2. Technical report. Varieties of Democracy Project. https://ssrn.com/abstract=2968289.

Coppedge, Michael, Gerring, John, Lindberg, Staffan I., Skaaning, Svend-Erik, Teorell, Jan, Andersson, Frida, Marquardt, Kyle L., Mechkova, Valeriya, Miri, Farhad, Pemstein, Daniel, Pernes, Josefine, Stepanova, Natalia, Tzelgov, Eitan, and Wang, Yi-ting. 2016. Varieties of Democracy Methodology v5. Technical report. Varieties of Democracy Project: Project Documentation Paper Series.

Hare, Christopher, Armstrong, David A., Bakker, Ryan, Carroll, Royce, and Poole, Keith T. 2015. Using Bayesian Aldrich-McKelvey Scaling to study citizens’ ideological preferences and perceptions. American Journal of Political Science 59(3):759–774.

Johnson, Valen E., and Albert, James H. 1999. Ordinal Data Modeling. New York: Springer.

Jones, Bradford S., and Norrander, Barbara. 1996. The reliability of aggregated public opinion measures. American Journal of Political Science 40(1):295–309.

King, Gary, Murray, Christopher J. L., Salomon, Joshua A., and Tandon, Ajay. 2004. Enhancing the validity and cross-cultural comparability of measurement in survey research. The American Political Science Review 98(1):191–207.

King, Gary, and Wand, Jonathan. 2007. Comparing incomparable survey responses: Evaluating and selecting anchoring vignettes. Political Analysis 15(1):46–66.

Konig, T., Marbach, M., and Osnabrugge, M. 2013. Estimating party positions across countries and time: A dynamic latent variable model for manifesto data. Political Analysis 21(4):468–491.

Kozlowski, Steve W., and Hattrup, Keith. 1992. A disagreement about within-group agreement: Disentangling issues of consistency versus consensus. Journal of Applied Psychology 77(2):161–167.

Lebreton, J. M., and Senter, J. L. 2007. Answers to 20 questions about interrater reliability and interrater agreement. Organizational Research Methods 11(4):815–852.

Lindstädt, Rene, Proksch, Sven-Oliver, and Slapin, Jonathan B. 2016. When experts disagree: Response aggregation and its consequences in expert surveys.

Maestas, Cherie D., Buttice, Matthew K., and Stone, Walter J. 2014. Extracting wisdom from experts and small crowds: Strategies for improving informant-based measures of political concepts. Political Analysis 22(3):354–373.

Marquardt, Kyle, and Pemstein, Daniel. 2018. Replication data for: IRT models for expert-coded panel data. Harvard Dataverse, V1. https://doi.org/10.7910/DVN/KGP01E.

Norris, Pippa, Frank, Richard W., and Martínez I Coma, Ferran. 2013. Assessing the quality of elections. Journal of Democracy 24(4):124–135.

Pemstein, Daniel, Seim, Brigitte, and Lindberg, Staffan I. 2016. Anchoring vignettes and item response theory in cross-national expert surveys.

Pemstein, Daniel, Tzelgov, Eitan, and Wang, Yi-ting. 2015. Evaluating and improving item response theory models for cross-national expert surveys. Varieties of Democracy Institute Working Paper 1(March):1–53.

Pemstein, Daniel, Marquardt, Kyle L., Tzelgov, Eitan, Wang, Yi-ting, and Miri, Farhad. 2015. The V-Dem measurement model: Latent variable analysis for cross-national and cross-temporal expert-coded data. Varieties of Democracy Institute Working Paper 21.

Ramey, Adam. 2016. Vox populi, vox dei? Crowdsourced ideal point estimation. The Journal of Politics 78(1):281–295.

Stan Development Team. 2015. Stan: A C++ Library for Probability and Sampling, Version 2.9.0. http://mc-stan.org/.

Teorell, Jan, Dahlström, Carl, and Dahlberg, Stefan. 2011. The QoG expert survey dataset. Technical report. University of Gothenburg: The Quality of Government Institute. http://www.qog.pol.gu.se.

Treier, Shawn, and Jackman, Simon. 2008. Democracy as a latent variable. American Journal of Political Science 52(1):201–217.

Van Bruggen, Gerrit H., Lilien, Gary L., and Kacker, Manish. 2002. Informants in organizational marketing research: Why use multiple informants and how to aggregate responses. Journal of Marketing Research 39(4):469–478.

von Davier, Matthias, Shin, Hyo-Jeong, Khorramdel, Lale, and Stankov, Lazar. 2017. The effects of vignette scoring on reliability and validity of self-reports. Applied Psychological Measurement 42(4):291–306.

Marquardt and Pemstein supplementary material

Online Appendix