
Evaluating Online Labor Markets for Experimental Research: Amazon.com's Mechanical Turk

Published online by Cambridge University Press: 04 January 2017

Adam J. Berinsky (corresponding author), Department of Political Science, Massachusetts Institute of Technology, Cambridge, MA 02139. e-mail: berinsky@mit.edu
Gregory A. Huber, Institution for Social and Policy Studies, Yale University, New Haven, CT 06511. e-mail: gregory.huber@yale.edu
Gabriel S. Lenz, Department of Political Science, University of California, Berkeley, Berkeley, CA 94720. e-mail: glenz@berkeley.edu

Abstract

We examine the trade-offs associated with using Amazon.com's Mechanical Turk (MTurk) interface for subject recruitment. We first describe MTurk and its promise as a vehicle for performing low-cost and easy-to-field experiments. We then assess the internal and external validity of experiments performed using MTurk, employing a framework that can be used to evaluate other subject pools. We investigate the characteristics of samples drawn from the MTurk population, showing that respondents recruited in this manner are often more representative of the U.S. population than in-person convenience samples (the modal sample in published experimental political science) but less representative than subjects in Internet-based panels or national probability samples. Finally, we replicate important published experimental work using MTurk samples.

Type: Research Article
Copyright © The Author 2012. Published by Oxford University Press on behalf of the Society for Political Methodology

Footnotes

Authors' note: Supplementary data for this article are available on the Political Analysis Web site.

Supplementary material
Berinsky et al. supplementary material: Appendix (File, 890.9 KB)