To save this undefined to your undefined account, please select one or more formats and confirm that you agree to abide by our usage policies. If this is the first time you used this feature, you will be asked to authorise Cambridge Core to connect with your undefined account.
Find out more about saving content to .
To save this article to your Kindle, first ensure firstname.lastname@example.org is added to your Approved Personal Document E-mail List under your Personal Document Settings on the Manage Your Content and Devices page of your Amazon account. Then enter the ‘name’ part of your Kindle email address below.
Find out more about saving to your Kindle.
Note you can select to save to either the @free.kindle.com or @kindle.com variations. ‘@free.kindle.com’ emails are free but can only be saved to your device when it is connected to wi-fi. ‘@kindle.com’ emails can be delivered even when you are not connected to wi-fi, but note that service fees apply.
Political scientists increasingly use supervised machine learning to code multiple relevant labels from a single set of texts. The current “best practice” of individually applying supervised machine learning to each label ignores information on inter-label association(s), and is likely to under-perform as a result. We introduce multi-label prediction as a solution to this problem. After reviewing the multi-label prediction framework, we apply it to code multiple features of (i) access to information requests made to the Mexican government and (ii) country-year human rights reports. We find that multi-label prediction outperforms standard supervised learning approaches, even in instances where the correlations among one’s multiple labels are low.
Although placebo conditions are ubiquitous in survey experiments, little evidence guides common practices for their use and selection. How should scholars choose and construct placebos? First, we review the role of placebos in published survey experiments, finding that placebos are used inconsistently. Then, drawing on the medical literature, we clarify the role that placebos play in accounting for nonspecific effects (NSEs), or the effects of ancillary features of experiments. We argue that, in the absence of precise knowledge of NSEs that placebos are adjusting for, researchers should average over a corpus of many placebos. We demonstrate this agnostic approach to placebo construction through the use of GPT-2, a generative language model trained on a database of over 1 million internet news pages. Using GPT-2, we devise 5,000 distinct placebos and administer two experiments (N = 2,975). Our results illustrate how researchers can minimize their role in placebo selection through automated processes. We conclude by offering tools for incorporating computer-generated placebo text vignettes into survey experiments and developing recommendations for best practice.
Analyzing variation in treatment effects across subsets of the population is an important way for social scientists to evaluate theoretical arguments. A common strategy in assessing such treatment effect heterogeneity is to include a multiplicative interaction term between the treatment and a hypothesized effect modifier in a regression model. Unfortunately, this approach can result in biased inferences due to unmodeled interactions between the effect modifier and other covariates, and including these interactions can lead to unstable estimates due to overfitting. In this paper, we explore the usefulness of machine learning algorithms for stabilizing these estimates and show how many off-the-shelf adaptive methods lead to two forms of bias: direct and indirect regularization bias. To overcome these issues, we use a post-double selection approach that utilizes several lasso estimators to select the interactions to include in the final model. We extend this approach to estimate uncertainty for both interaction and marginal effects. Simulation evidence shows that this approach has better performance than competing methods, even when the number of covariates is large. We show in two empirical examples that the choice of method leads to dramatically different conclusions about effect heterogeneity.
The significance and influence of U.S. Supreme Court majority opinions derive in large part from opinions’ roles as precedents for future opinions. A growing body of literature seeks to understand what drives the use of opinions as precedents through the study of Supreme Court case citation patterns. We raise two limitations of existing work on Supreme Court citations. First, dyadic citations are typically aggregated to the case level before they are analyzed. Second, citations are treated as if they arise independently. We present a methodology for studying citations between Supreme Court opinions at the dyadic level, as a network, that overcomes these limitations. This methodology—the citation exponential random graph model, for which we provide user-friendly software—enables researchers to account for the effects of case characteristics and complex forms of network dependence in citation formation. We then analyze a network that includes all Supreme Court cases decided between 1950 and 2015. We find evidence for dependence processes, including reciprocity, transitivity, and popularity. The dependence effects are as substantively and statistically significant as the effects of exogenous covariates, indicating that models of Supreme Court citations should incorporate both the effects of case characteristics and the structure of past citations.
How can we elicit honest responses in surveys? Conjoint analysis has become a popular tool to address social desirability bias (SDB), or systematic survey misreporting on sensitive topics. However, there has been no direct evidence showing its suitability for this purpose. We propose a novel experimental design to identify conjoint analysis’s ability to mitigate SDB. Specifically, we compare a standard, fully randomized conjoint design against a partially randomized design where only the sensitive attribute is varied between the two profiles in each task. We also include a control condition to remove confounding due to the increased attention to the varying attribute under the partially randomized design. We implement this empirical strategy in two studies on attitudes about environmental conservation and preferences about congressional candidates. In both studies, our estimates indicate that the fully randomized conjoint design could reduce SDB for the average marginal component effect (AMCE) of the sensitive attribute by about two-thirds of the AMCE itself. Although encouraging, we caution that our results are exploratory and exhibit some sensitivity to alternative model specifications, suggesting the need for additional confirmatory evidence based on the proposed design.
Internet-based surveys have expanded public opinion data collection at the expense of monitoring respondent attentiveness, potentially compromising data quality. Researchers now have to evaluate attentiveness ex-post. We propose a new proxy for attentiveness—response-time attentiveness clustering (RTAC)—that uses dimension reduction and an unsupervised clustering algorithm to leverage variation in response time between respondents and across questions. We advance the literature theoretically arguing that the existing dichotomous classification of respondents as fast or attentive is insufficient and neglects slow and inattentive respondents. We validate our theoretical classification and empirical strategy against commonly used proxies for survey attentiveness. In contrast to other methods for capturing attentiveness, RTAC allows researchers to collect attentiveness data unobtrusively without sacrificing space on the survey instrument.
Topic models, as developed in computer science, are effective tools for exploring and summarizing large document collections. When applied in social science research, however, they are commonly used for measurement, a task that requires careful validation to ensure that the model outputs actually capture the desired concept of interest. In this paper, we review current practices for topic validation in the field and show that extensive model validation is increasingly rare, or at least not systematically reported in papers and appendices. To supplement current practices, we refine an existing crowd-sourcing method by Chang and coauthors for validating topic quality and go on to create new procedures for validating conceptual labels provided by the researcher. We illustrate our method with an analysis of Facebook posts by U.S. Senators and provide software and guidance for researchers wishing to validate their own topic models. While tailored, case-specific validation exercises will always be best, we aim to improve standard practices by providing a general-purpose tool to validate topics as measures.
A single dataset is rarely sufficient to address a question of substantive interest. Instead, most applied data analysis combines data from multiple sources. Very rarely do two datasets contain the same identifiers with which to merge datasets; fields like name, address, and phone number may be entered incorrectly, missing, or in dissimilar formats. Combining multiple datasets absent a unique identifier that unambiguously connects entries is called the record linkage problem. While recent work has made great progress in the case where there are many possible fields on which to match, the much more uncertain case of only one identifying field remains unsolved: this fuzzy string matching problem, both its own problem and a component of standard record linkage problems, is our focus. We design and validate an algorithmic solution called Adaptive Fuzzy String Matching rooted in adaptive learning, and show that our tool identifies more matches, with higher precision, than existing solutions. Finally, we illustrate its validity and practical value through applications to matching organizations, places, and individuals.
Imputing missing values is an important preprocessing step in data analysis, but the literature offers little guidance on how to choose between imputation models. This letter suggests adopting the imputation model that generates a density of imputed values most similar to those of the observed values for an incomplete variable after balancing all other covariates. We recommend stable balancing weights as a practical approach to balance covariates whose distribution is expected to differ if the values are not missing completely at random. After balancing, discrepancy statistics can be used to compare the density of imputed and observed values. We illustrate the application of the suggested approach using simulated and real-world survey data from the American National Election Study, comparing popular imputation approaches including random forests, hot-deck, predictive mean matching, and multivariate normal imputation. An R package implementing the suggested approach accompanies this letter.