Political Analysis: Volume 30 - Issue 4

Multi-Label Prediction for Political Text-as-Data
Aaron Erlich, Stefano G. Dantas, Benjamin E. Bagozzi, Daniel Berliner, Brian Palmer-Rubin
Published online by Cambridge University Press: 14 June 2021, pp. 463-480
Political scientists increasingly use supervised machine learning to code multiple relevant labels from a single set of texts. The current “best practice” of individually applying supervised machine learning to each label ignores information on inter-label association(s), and is likely to under-perform as a result. We introduce multi-label prediction as a solution to this problem. After reviewing the multi-label prediction framework, we apply it to code multiple features of (i) access to information requests made to the Mexican government and (ii) country-year human rights reports. We find that multi-label prediction outperforms standard supervised learning approaches, even in instances where the correlations among one’s multiple labels are low.

Placebo Selection in Survey Experiments: An Agnostic Approach
Ethan Porter, Yamil R. Velez
Published online by Cambridge University Press: 14 June 2021, pp. 481-494
Although placebo conditions are ubiquitous in survey experiments, little evidence guides common practices for their use and selection. How should scholars choose and construct placebos? First, we review the role of placebos in published survey experiments, finding that placebos are used inconsistently. Then, drawing on the medical literature, we clarify the role that placebos play in accounting for nonspecific effects (NSEs), or the effects of ancillary features of experiments. We argue that, in the absence of precise knowledge of NSEs that placebos are adjusting for, researchers should average over a corpus of many placebos. We demonstrate this agnostic approach to placebo construction through the use of GPT-2, a generative language model trained on a database of over 1 million internet news pages. Using GPT-2, we devise 5,000 distinct placebos and administer two experiments (N = 2,975). Our results illustrate how researchers can minimize their role in placebo selection through automated processes. We conclude by offering tools for incorporating computer-generated placebo text vignettes into survey experiments and developing recommendations for best practice.

Reducing Model Misspecification and Bias in the Estimation of Interactions
Matthew Blackwell, Michael P. Olson
Published online by Cambridge University Press: 23 July 2021, pp. 495-514
Analyzing variation in treatment effects across subsets of the population is an important way for social scientists to evaluate theoretical arguments. A common strategy in assessing such treatment effect heterogeneity is to include a multiplicative interaction term between the treatment and a hypothesized effect modifier in a regression model. Unfortunately, this approach can result in biased inferences due to unmodeled interactions between the effect modifier and other covariates, and including these interactions can lead to unstable estimates due to overfitting. In this paper, we explore the usefulness of machine learning algorithms for stabilizing these estimates and show how many off-the-shelf adaptive methods lead to two forms of bias: direct and indirect regularization bias. To overcome these issues, we use a post-double selection approach that utilizes several lasso estimators to select the interactions to include in the final model. We extend this approach to estimate uncertainty for both interaction and marginal effects. Simulation evidence shows that this approach has better performance than competing methods, even when the number of covariates is large. We show in two empirical examples that the choice of method leads to dramatically different conclusions about effect heterogeneity.

Generative Dynamics of Supreme Court Citations: Analysis with a New Statistical Network Model
Christian S. Schmid, Ted Hsuan Yun Chen, Bruce A. Desmarais
Published online by Cambridge University Press: 12 July 2021, pp. 515-534
Open access
The significance and influence of U.S. Supreme Court majority opinions derive in large part from opinions’ roles as precedents for future opinions. A growing body of literature seeks to understand what drives the use of opinions as precedents through the study of Supreme Court case citation patterns. We raise two limitations of existing work on Supreme Court citations. First, dyadic citations are typically aggregated to the case level before they are analyzed. Second, citations are treated as if they arise independently. We present a methodology for studying citations between Supreme Court opinions at the dyadic level, as a network, that overcomes these limitations. This methodology—the citation exponential random graph model, for which we provide user-friendly software—enables researchers to account for the effects of case characteristics and complex forms of network dependence in citation formation. We then analyze a network that includes all Supreme Court cases decided between 1950 and 2015. We find evidence for dependence processes, including reciprocity, transitivity, and popularity. The dependence effects are as substantively and statistically significant as the effects of exogenous covariates, indicating that models of Supreme Court citations should incorporate both the effects of case characteristics and the structure of past citations.

Does Conjoint Analysis Mitigate Social Desirability Bias?
Yusaku Horiuchi, Zachary Markovich, Teppei Yamamoto
Published online by Cambridge University Press: 15 September 2021, pp. 535-549
How can we elicit honest responses in surveys? Conjoint analysis has become a popular tool to address social desirability bias (SDB), or systematic survey misreporting on sensitive topics. However, there has been no direct evidence showing its suitability for this purpose. We propose a novel experimental design to identify conjoint analysis’s ability to mitigate SDB. Specifically, we compare a standard, fully randomized conjoint design against a partially randomized design where only the sensitive attribute is varied between the two profiles in each task. We also include a control condition to remove confounding due to the increased attention to the varying attribute under the partially randomized design. We implement this empirical strategy in two studies on attitudes about environmental conservation and preferences about congressional candidates. In both studies, our estimates indicate that the fully randomized conjoint design could reduce SDB for the average marginal component effect (AMCE) of the sensitive attribute by about two-thirds of the AMCE itself. Although encouraging, we caution that our results are exploratory and exhibit some sensitivity to alternative model specifications, suggesting the need for additional confirmatory evidence based on the proposed design.
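For readers unfamiliar with the quantity named above, the following is a minimal sketch of a difference-in-means AMCE estimate under a fully randomized conjoint design with uniformly randomized levels. It is not the authors' estimation code or their SDB design; the record layout (dicts with a `chosen` flag) and the function name `amce` are hypothetical, chosen only for illustration.

```python
from statistics import mean


def amce(profiles, attribute, level, baseline):
    """Difference in mean choice rates between `level` and `baseline`.

    `profiles` is a list of dicts, one per shown profile, each holding the
    attribute values and a 0/1 `chosen` flag (hypothetical layout).
    """
    chosen_level = [p["chosen"] for p in profiles if p[attribute] == level]
    chosen_base = [p["chosen"] for p in profiles if p[attribute] == baseline]
    return mean(chosen_level) - mean(chosen_base)


# Toy data: profiles with stance "pro" are chosen 3 times out of 4,
# profiles with stance "anti" once out of 4.
toy = (
    [{"stance": "pro", "chosen": c} for c in (1, 1, 0, 1)]
    + [{"stance": "anti", "chosen": c} for c in (0, 0, 1, 0)]
)
estimate = amce(toy, "stance", "pro", "anti")
```

Under the design described in the abstract, one would compute such an estimate separately for the fully and partially randomized arms and compare them; that comparison, and any standard errors, are left out of this sketch.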

Racing the Clock: Using Response Time as a Proxy for Attentiveness on Self-Administered Surveys
Blair Read, Lukas Wolters, Adam J. Berinsky
Published online by Cambridge University Press: 15 September 2021, pp. 550-569
Internet-based surveys have expanded public opinion data collection at the expense of monitoring respondent attentiveness, potentially compromising data quality. Researchers now have to evaluate attentiveness ex-post. We propose a new proxy for attentiveness, response-time attentiveness clustering (RTAC), that uses dimension reduction and an unsupervised clustering algorithm to leverage variation in response time between respondents and across questions. We advance the literature theoretically by arguing that the existing dichotomous classification of respondents as fast or attentive is insufficient and neglects slow and inattentive respondents. We validate our theoretical classification and empirical strategy against commonly used proxies for survey attentiveness. In contrast to other methods for capturing attentiveness, RTAC allows researchers to collect attentiveness data unobtrusively without sacrificing space on the survey instrument.

Topics, Concepts, and Measurement: A Crowdsourced Procedure for Validating Topics as Measures
Luwei Ying, Jacob M. Montgomery, Brandon M. Stewart
Published online by Cambridge University Press: 27 September 2021, pp. 570-589
Topic models, as developed in computer science, are effective tools for exploring and summarizing large document collections. When applied in social science research, however, they are commonly used for measurement, a task that requires careful validation to ensure that the model outputs actually capture the desired concept of interest. In this paper, we review current practices for topic validation in the field and show that extensive model validation is increasingly rare, or at least not systematically reported in papers and appendices. To supplement current practices, we refine an existing crowd-sourcing method by Chang and coauthors for validating topic quality and go on to create new procedures for validating conceptual labels provided by the researcher. We illustrate our method with an analysis of Facebook posts by U.S. Senators and provide software and guidance for researchers wishing to validate their own topic models. While tailored, case-specific validation exercises will always be best, we aim to improve standard practices by providing a general-purpose tool to validate topics as measures.

Adaptive Fuzzy String Matching: How to Merge Datasets with Only One (Messy) Identifying Field
Aaron R. Kaufman, Aja Klevs
Published online by Cambridge University Press: 11 October 2021, pp. 590-596
Open access
A single dataset is rarely sufficient to address a question of substantive interest. Instead, most applied data analysis combines data from multiple sources. Very rarely do two datasets contain the same identifiers with which to merge datasets; fields like name, address, and phone number may be entered incorrectly, missing, or in dissimilar formats. Combining multiple datasets absent a unique identifier that unambiguously connects entries is called the record linkage problem. While recent work has made great progress in the case where there are many possible fields on which to match, the much more uncertain case of only one identifying field remains unsolved: this fuzzy string matching problem, both its own problem and a component of standard record linkage problems, is our focus. We design and validate an algorithmic solution called Adaptive Fuzzy String Matching rooted in adaptive learning, and show that our tool identifies more matches, with higher precision, than existing solutions. Finally, we illustrate its validity and practical value through applications to matching organizations, places, and individuals.
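To make the single-field matching problem concrete, here is a minimal baseline sketch using Python's standard-library `difflib.SequenceMatcher`. It is not the Adaptive Fuzzy String Matching algorithm described above and has no adaptive learning component; the helper names (`normalize`, `best_match`, `fuzzy_merge`) and the 0.8 threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def best_match(query, candidates, threshold=0.8):
    """Return (candidate, score) for the most similar candidate,
    or None when no candidate reaches the threshold."""
    q = normalize(query)
    scored = [(c, SequenceMatcher(None, q, normalize(c)).ratio()) for c in candidates]
    if not scored:
        return None
    candidate, score = max(scored, key=lambda pair: pair[1])
    return (candidate, score) if score >= threshold else None


def fuzzy_merge(left, right, threshold=0.8):
    """Map every name in `left` to its best match in `right` (or None)."""
    return {name: best_match(name, right, threshold) for name in left}


left = ["Univ. of Michigan", "Harvard Universty", "Zzzz Qqqq"]
right = ["University of Michigan", "Harvard University", "Yale University"]
links = fuzzy_merge(left, right)
```

A fixed similarity threshold like this is exactly the kind of hand-tuned choice that tends to trade recall against precision, which is the gap the abstract says an adaptive approach is meant to close.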

Choosing Imputation Models
Moritz Marbach
Published online by Cambridge University Press: 10 December 2021, pp. 597-605
Imputing missing values is an important preprocessing step in data analysis, but the literature offers little guidance on how to choose between imputation models. This letter suggests adopting the imputation model that generates a density of imputed values most similar to those of the observed values for an incomplete variable after balancing all other covariates. We recommend stable balancing weights as a practical approach to balance covariates whose distribution is expected to differ if the values are not missing completely at random. After balancing, discrepancy statistics can be used to compare the density of imputed and observed values. We illustrate the application of the suggested approach using simulated and real-world survey data from the American National Election Study, comparing popular imputation approaches including random forests, hot-deck, predictive mean matching, and multivariate normal imputation. An R package implementing the suggested approach accompanies this letter.
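The comparison step described above can be sketched as follows, using a weighted Kolmogorov-Smirnov distance as one possible discrepancy statistic. This is an illustration, not the accompanying R package: the balancing weights are assumed to come from a separate step (for example stable balancing weights) that is not implemented here, and the function names and the choice to weight the observed values are assumptions.

```python
def weighted_ecdf(values, weights):
    """Return a function x -> weighted share of `values` that are <= x."""
    total = sum(weights)
    pairs = list(zip(values, weights))

    def ecdf(x):
        return sum(w for v, w in pairs if v <= x) / total

    return ecdf


def ks_discrepancy(observed, imputed, obs_weights=None, imp_weights=None):
    """Largest gap between the (weighted) ECDFs of observed and imputed values."""
    if obs_weights is None:
        obs_weights = [1.0] * len(observed)
    if imp_weights is None:
        imp_weights = [1.0] * len(imputed)
    f_obs = weighted_ecdf(observed, obs_weights)
    f_imp = weighted_ecdf(imputed, imp_weights)
    # The ECDFs only jump at data points, so checking those is sufficient.
    grid = sorted(set(observed) | set(imputed))
    return max(abs(f_obs(x) - f_imp(x)) for x in grid)


def choose_model(observed, imputations, obs_weights=None):
    """Return the name of the imputation model with the smallest discrepancy,
    together with all scores. `imputations` maps model name -> imputed values."""
    scores = {
        name: ks_discrepancy(observed, values, obs_weights)
        for name, values in imputations.items()
    }
    return min(scores, key=scores.get), scores


observed = [1, 2, 3, 4]
candidates = {"model_a": [1, 2, 3, 4], "model_b": [10, 11, 12, 13]}
best, scores = choose_model(observed, candidates)
```

With uniform weights this reduces to the ordinary two-sample Kolmogorov-Smirnov statistic; other discrepancy statistics could be substituted in `ks_discrepancy` without changing the selection logic.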

PAN volume 30 issue 4 Cover and Front matter
Published online by Cambridge University Press: 09 September 2022, pp. f1-f3

PAN volume 30 issue 4 Cover and Back matter
Published online by Cambridge University Press: 09 September 2022, pp. b1-b5

Volume 30 - Issue 4 - October 2022

Article
- Multi-Label Prediction for Political Text-as-Data
- Placebo Selection in Survey Experiments: An Agnostic Approach
- Reducing Model Misspecification and Bias in the Estimation of Interactions
- Generative Dynamics of Supreme Court Citations: Analysis with a New Statistical Network Model
- Does Conjoint Analysis Mitigate Social Desirability Bias?
- Racing the Clock: Using Response Time as a Proxy for Attentiveness on Self-Administered Surveys
- Topics, Concepts, and Measurement: A Crowdsourced Procedure for Validating Topics as Measures

Letter
- Adaptive Fuzzy String Matching: How to Merge Datasets with Only One (Messy) Identifying Field
- Choosing Imputation Models

Front Cover (OFC, IFC) and matter
- PAN volume 30 issue 4 Cover and Front matter

Back Cover (OBC, IBC) and matter
- PAN volume 30 issue 4 Cover and Back matter
