The micro-task market for lemons: data quality on Amazon's Mechanical Turk

Douglas J. Ahler; Carolyn E. Roush; Gaurav Sood

doi:10.1017/psrm.2021.57

Published online by Cambridge University Press: 25 October 2021

Douglas J. Ahler*: Florida State University, Tallahassee, FL, USA
Carolyn E. Roush: Florida State University, Tallahassee, FL, USA
Gaurav Sood: Independent Researcher
*Corresponding author. Email: dahler@fsu.edu

Abstract

While Amazon's Mechanical Turk (MTurk) has reduced the cost of collecting original data, in 2018, researchers noted the potential existence of a large number of bad actors on the platform. To evaluate data quality on MTurk, we fielded three surveys between 2018 and 2020. While we find no evidence of a “bot epidemic,” significant portions of the data—between 25 and 35 percent—are of dubious quality. While the number of IP addresses that completed the survey multiple times or circumvented location requirements fell almost 50 percent over time, suspicious IP addresses are more prevalent on MTurk than on other platforms. Furthermore, many respondents appear to respond humorously or insincerely, and this behavior increased over 200 percent from 2018 to 2020. Importantly, these low-quality responses attenuate observed treatment effects by magnitudes ranging from approximately 10 to 30 percent.

Keywords

Experimental research; public opinion

Type: Original Article
Information: Political Science Research and Methods, First View, pp. 1–20

DOI: https://doi.org/10.1017/psrm.2021.57
Copyright: Copyright © The Author(s), 2021. Published by Cambridge University Press on behalf of the European Political Science Association

Ahler et al. Dataset: https://doi.org/10.7910/DVN/KF1LAK
