A standard text-as-data workflow in the social sciences involves identifying a set of documents to be labeled, selecting a random sample of them to label using research assistants, training a supervised learner to label the remaining documents, and validating that model’s performance using standard accuracy metrics. The most resource-intensive component of this workflow is the hand-labeling: carefully reading documents, training research assistants, and paying human coders to label documents in duplicate or more. We show that hand-coding an algorithmically selected sample, rather than a simple random sample, can improve model performance above baseline by as much as 50%, or reduce hand-coding costs by up to two-thirds, in applications predicting (1) U.S. executive-order significance and (2) financial sentiment on social media. We accompany this manuscript with open-source software implementing these tools, which we hope can make supervised learning cheaper and more accessible to researchers.
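The abstract does not specify which selection algorithm the paper uses, but uncertainty sampling is one common way to pick documents algorithmically rather than at random: label the documents the current model is least sure about. A minimal sketch, assuming a binary classifier that outputs class probabilities (the function name and toy probabilities below are illustrative, not from the paper):

```python
def uncertainty_sample(probs, k):
    """Return indices of the k unlabeled documents whose predicted
    probability is closest to 0.5, i.e. where the model is least certain.

    probs : list of predicted probabilities for the positive class,
            one per unlabeled document (illustrative input).
    """
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]

# Toy model outputs for six unlabeled documents.
probs = [0.95, 0.52, 0.10, 0.47, 0.80, 0.33]
print(uncertainty_sample(probs, 2))  # → [1, 3]
```

Under this strategy, hand-coders spend their effort on documents near the decision boundary instead of on documents the model already classifies confidently, which is one mechanism by which an algorithmically selected sample can beat a random one of the same size.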
An influential perspective argues that voters use interest group ratings and endorsements to infer their representatives' actions and to hold them accountable. This paper interrogates a key assumption in this literature: that voters correctly interpret these cues, especially cues from groups with whom they disagree. For example, a pro-redistribution voter should support her representative less when she learns that Americans for Prosperity, an economically conservative group, gave her representative a 100 per cent rating. Across three studies using real interest groups and participants' actual representatives, we find limited support for this assumption. When an interest group is misaligned with voters' views and positively rates or endorses their representative, voters often: (1) mistakenly infer that the group shares their views, (2) mistakenly infer that their representative shares their views, and (3) mistakenly approve of their representative more. We call this tendency heuristic projection.
Interest group ideology is theoretically and empirically critical in the study of American politics, yet our measurement of this key concept is limited in both scope and temporal coverage. By leveraging network science and ideal point estimation, we provide a novel measure of ideology for amicus curiae briefs and organized interests, with accompanying uncertainty estimates. Our Amicus Curiae Network scores cover more than 12,000 unique groups and more than 11,000 briefs across 95 years, providing the largest and longest-running measure of organized interest ideologies to date. Substantively, the scores reveal that: interests before the Court are ideologically polarized, despite variance in their coalition strategies; interests that donate to campaigns are more conservative and balanced than those that do not; and amicus curiae briefs were more common from liberal organizations until the 1980s, with ideological representation virtually balanced since then.
A single dataset is rarely sufficient to address a question of substantive interest. Instead, most applied data analysis combines data from multiple sources. Very rarely do two datasets contain the same identifiers with which to merge them; fields like name, address, and phone number may be entered incorrectly, missing, or in dissimilar formats. Combining multiple datasets absent a unique identifier that unambiguously connects entries is called the record linkage problem. While recent work has made great progress in the case where there are many possible fields on which to match, the much more uncertain case of only one identifying field remains unsolved: this fuzzy string matching problem, both a problem in its own right and a component of standard record linkage problems, is our focus. We design and validate an algorithmic solution called Adaptive Fuzzy String Matching, rooted in adaptive learning, and show that our tool identifies more matches, with higher precision, than existing solutions. Finally, we illustrate its validity and practical value through applications to matching organizations, places, and individuals.
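The paper's Adaptive Fuzzy String Matching algorithm is not reproduced here, but the underlying problem it improves on can be illustrated with the standard baseline: edit-distance matching on a single identifying field. A minimal sketch (function names and example records are illustrative):

```python
def levenshtein(a, b):
    """Edit distance between strings a and b, via the classic
    dynamic-programming recurrence (one row kept at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def best_match(query, candidates):
    """Return the candidate record closest to query in edit distance."""
    return min(candidates, key=lambda c: levenshtein(query, c))

print(best_match("Jon Smith", ["John Smith", "Jane Smith", "Jon Snow"]))
# → "John Smith" (distance 1: a single inserted character)
```

Plain edit distance has no notion of which variations matter in a given dataset (abbreviations, titles, word order), which is the gap an adaptively learned matcher is designed to close.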
Violent protests are dramatic political events, yet we know little about their effect on political behavior. While scholars typically treat violent protests as deliberate acts undertaken in pursuit of specific goals, due to a lack of appropriate data and difficulty in causal identification, there is scant evidence of whether riots can actually increase support for these goals. Using geocoded data, we analyze measures of policy support before and after the 1992 Los Angeles riot, one of the highest-profile events of political violence in recent American history, which occurred just prior to an election. Contrary to some expectations from the academic literature and the popular press, we find that the riot caused a marked liberal shift in policy support at the polls. Investigating the sources of this shift, we find that it was likely the result of increased mobilization of both African American and white voters. Remarkably, this mobilization endured more than a decade later.