1 Introduction
The first step in any scientific research is understanding the state of relevant existing knowledge. “Literature reviews” range from simple expositions of past work, to critical analysis, to identifying intellectual communities or schools of thought. No matter the style, the goal is typically the same: identifying what is known, assessing what is unknown, and suggesting paths for productive research (Knopf 2006). As central as literature reviews are to the research process, there is surprisingly little guidance on how to write them in political science (Knopf 2006). What constitutes a good summary of prior research?
1.1 What a Review Should Include
Ideally, a scholar draws on expert knowledge and a systematic search of published work to identify a literature (McGhee 2020), sifting through and identifying all relevant prior studies, organizing them using a schema (perhaps conceptual, theoretical, empirical, or chronological), distinguishing what is known from what is unknown or disputed, and ending with research questions.
More commonly, however, literature reviews are ad hoc, with writers unconsciously applying heuristics such as familiarity, citation patterns, and author prestige to signal relevant work. Such methods can reproduce existing biases that can exclude work written by women or minorities (Dion, Sumner, and Mitchell 2018). While transparency in inclusion criteria is essential (Snyder 2019), few papers in political science discuss methods of literature review, much less inclusion criteria.
Motivated by a need for transparent approaches and methods to uncover broad patterns in literature reviews, we offer a network-based framework for reviews. The core components of a classic social network are simple: a set of actors (nodes: e.g., individuals and organizations), any two of whom can be connected through a social interaction (edges: e.g., friendship and co-occurrence). As social networks help conceive of actors and their relationships with one another, so too can a body of scientific knowledge be organized as a network of concepts (nodes) and theorized relationships among them (edges); importantly, social networks are analyzed for their entire structures and patterns across them, which map nicely to questions we can pose about patterns of studied concepts and their relationships.
This framework is helpful on several fronts. First, it provides greater transparency in the choices of work comprising the “universe” of relevant prior knowledge—that is, defining a network clarifies the population, sampling approach, and sample the review includes. Second, network graphs can display broad and complex patterns among concepts that may not appear to human imagination alone. We show that network visualization and simple network statistics clarify gaps in theory and theory testing. Third, our approach makes evidence for assertions about prior knowledge easy to produce, critique, replicate, and extend. Fourth, a network of concepts is an accessible first step toward causal graphs used in causal identification.
Any review of research is as biased as the sample of work that comprises it. While our approach cannot estimate biases, clarifying standards for inclusion may diminish the role of heuristics based on familiarity and prestige. Moreover, patterns revealed in a network may draw attention to new research areas. Explicating assertions about gaps in knowledge may also reduce reliance on conventional wisdom and the likelihood of overlooking contributions from less cited or less well-networked research, including work by underrepresented minority and junior scholars. While human biases in reviewing are inherently difficult to quantify, we conduct an illustrative set of experiments on biased sampling from a fixed corpus and present descriptive findings on which aspects of network structure are preserved and lost in such a process in Section B of the Supplementary Material.
1.1.1 Questions Asked in Classical Literature Reviews
We begin with common questions asked in literature reviews, summarized in Table 1. We show how these questions correspond to features of networks (shown in column 2, network questions) captured by network statistics (column 3), and distinguish questions that identify and assess literature, explore possibilities to contribute, and isolate and control conceptual relationships for causal theory-building.
1.1.2 Assess and Identify
This group of questions includes summaries of research and offers a sense of existing knowledge, key parts of a literature review identified in Knopf (2006). Summarizing pairs (“dyads”) of studied concepts is straightforward. Global patterns of connections are harder to contemplate without visual aids. Visualizing a network constructed from concept dyads offers a useful “global summary” of prior literature. Identifying key concepts similarly translates to finding central nodes. Finally, identifying clusters of related studies is to discover “network communities,” the aim of community detection algorithms (Yang, Algesheimer, and Tessone 2016).
1.1.3 Explore
Reviews often aim to identify “gaps in the literature”: under-theorized relationships among concepts or theorized relationships that lack empirical validation (Table 1, “Explore”). Exploration questions are difficult to answer without systematic accumulation and organization of the literature, as they inherently require isolating “missing” links. A network framework makes finding missing links straightforward. Exploring network components that lack theoretical ties reveals opportunities to link communities of concepts (and the scholars who study them) together. Similarly, if prior work theorizes that concept A affects concept B (an edge from A to B), and other work demonstrates that B affects C (an edge from B to C), but no work discusses the effects of A on C, this appears in the network as a missing edge between A and C, suggesting opportunities for theorizing.
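The A, B, C example above can be sketched in code. This is a minimal illustration under stated assumptions, not the procedure implemented in any particular package: the function name `missing_links` and the toy edgelist are hypothetical, and the edgelist is assumed to be a list of (cause, effect) pairs.

```python
from collections import defaultdict


def missing_links(edges):
    """Return (a, c) pairs where a -> b and b -> c are present but a -> c is not."""
    successors = defaultdict(set)
    for cause, effect in edges:
        successors[cause].add(effect)

    gaps = set()
    for a, direct_effects in list(successors.items()):
        for b in direct_effects:
            if b == a:
                continue  # a self-tie cannot serve as the middle step
            for c in successors.get(b, ()):
                # Skip self-ties and two-step paths that loop back to a.
                if c in (a, b):
                    continue
                if c not in direct_effects:
                    gaps.add((a, c))
    return sorted(gaps)


# Toy edgelist, invented for illustration only.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
print(missing_links(toy_edges))
```

Each returned pair is only a candidate for theorizing; whether a direct A-to-C relationship is substantively plausible remains a judgment for the researcher.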
1.1.4 Isolate and Control
Finally, a network approach can clarify causal relationships and potential confounding pathways. In a network, existing bodies of scholarship form the local neighborhood of a given concept (node). Scholarship on causal relationships among concepts linked to both independent and dependent concepts may reveal confounding causal pathways.
We next present an example application reviewing the literature on redistricting guided by questions drawn from Table 1.
2 Application to Redistricting
To illustrate the method, we imagine ourselves as a researcher new to studying redistricting and conducting a literature review. Figure 1 summarizes the steps we take as an example for researchers interested in this approach. Redistricting following the 2020 U.S. Census drew the attention of courts, politicians, and the public to the prior decade of academic work, highlighting the importance of district boundaries to political outcomes. Understanding this literature poses a challenge for new scholars. We focus on work over the last 10 years as a way to demarcate the “latest research” (Dion et al. 2018).
What constitutes the relevant literature? We recommend selection criteria based on predefined and replicable rules. Our criteria prioritize recent and impactful work on redistricting, indicated by journal rankings and citations. We select six highly ranked political science journals broad enough in scope to cover the topic of redistricting (Scimago 2020), two journals specifically from the American politics subfield, and finally, Election Law Journal, which has systematically published redistricting research cited by courts, expert witnesses, and government entities. Within these journals, we conduct keyword searches among articles published since 2010 containing any of the following phrases in either title or abstract: efficiency gap, gerrymander, partisan symmetry, and redistrict. To capture relevant work outside these journals, we search Google Scholar for any post-2010 peer-reviewed article with 50 or more citations that included a key phrase in the title/abstract. One hundred fifteen articles matched these criteria, constituting the corpus of studies for this review. Keyword selection necessarily affects article selection. For a comprehensive search that balances exploration and “starting values,” we recommend an iterative process of selecting seeding keywords to survey the literature and snowball-sample highly related keywords (e.g., searching articles with keyword redistricting returns articles that regularly speak of gerrymandering and efficiency gap) and soliciting keyword suggestions from domain area experts.
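The keyword rule described above can be written as a small, replicable filter. The sketch below is illustrative rather than the replication code: the record fields (`title`, `abstract`, `year`), the sample records, and the reading of “since 2010” as including 2010 are all assumptions made for the example.

```python
# Key phrases from the text, matched as case-insensitive substrings so that
# "redistrict" also catches "redistricting" and "gerrymander" catches "gerrymandering".
KEY_PHRASES = ("efficiency gap", "gerrymander", "partisan symmetry", "redistrict")


def matches_criteria(article, first_year=2010):
    """True if a record is recent enough and a key phrase is in its title or abstract."""
    text = f"{article['title']} {article['abstract']}".lower()
    recent = article["year"] >= first_year  # assumption: "since 2010" includes 2010
    return recent and any(phrase in text for phrase in KEY_PHRASES)


# Hypothetical records, invented purely for illustration.
candidates = [
    {"title": "Measuring Gerrymandering", "abstract": "A new metric.", "year": 2015},
    {"title": "Turnout in Primaries", "abstract": "No relevant phrase.", "year": 2018},
    {"title": "Redistricting Reform", "abstract": "Commissions.", "year": 2005},
]
corpus = [a for a in candidates if matches_criteria(a)]
print([a["title"] for a in corpus])
```

Keeping the phrases and the year threshold as explicit parameters makes the inclusion rule easy to report and to vary when checking how sensitive the corpus is to keyword choice.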
For each article, network data is input through familiar steps: reading the work and identifying the main concepts and connections posited between concepts. We select concepts representing the main causes and effects investigated, often concepts discussed in the abstract or theory sections. We enter this information into an edgelist spreadsheet in which concepts constitute nodes and their connections are edges, often hypothesized with directions so the edge can be drawn as an arrow from cause to effect. For example, the two main concepts in Cain et al. (2017) are “independent redistricting commissions” and “partisan advantage.” The authors further posit that such commissions are unlikely to produce extremely partisan maps, which we record as a directed edge. Information pertinent to this edge, such as whether the effect is positive or negative, and the number and identity of works that address this same connection are edge attributes. Defining nodes and edges requires care, which will be no surprise to regular users of network analysis or of analyses that rest on well-measured concepts: what constitutes a node representing a concept, and whether it relates to another, is ultimately an interpretive judgment the researcher makes in moving from a research piece to a row in an edgelist. Authors may use different language in referencing the same concept (e.g., partisan bias and partisan advantage); in these cases, we employ an iterative concept-naming process. We add terms used by each author to the spreadsheet, then visualize the draft network to identify similar terms. After consulting the relevant articles, we consolidate terms referring to the same concept under an umbrella term in the spreadsheet.
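The consolidation step can be illustrated with a short sketch. The column names (`from`, `to`, `cite`), the `UMBRELLA` mapping, and the second row are assumptions for the example; only the Cain et al. (2017) edge and the partisan bias/partisan advantage pairing come from the text.

```python
from collections import defaultdict

# Maps an author's term to the umbrella term chosen by the reviewer.
UMBRELLA = {"partisan bias": "partisan advantage"}

# One edgelist row per relationship posited in a work.
rows = [
    {"from": "independent redistricting commissions", "to": "partisan advantage",
     "cite": "Cain et al. 2017"},
    # Hypothetical row, invented to show two terms collapsing into one edge.
    {"from": "independent redistricting commissions", "to": "partisan bias",
     "cite": "Hypothetical Author 2020"},
]


def consolidate(edgelist, umbrella):
    """Rename concepts to umbrella terms and pool citations for identical edges."""
    merged = defaultdict(set)
    for row in edgelist:
        cause = umbrella.get(row["from"], row["from"])
        effect = umbrella.get(row["to"], row["to"])
        merged[(cause, effect)].add(row["cite"])
    return {edge: sorted(cites) for edge, cites in merged.items()}


edges = consolidate(rows, UMBRELLA)
for (cause, effect), cites in edges.items():
    print(f"{cause} -> {effect}: {len(cites)} work(s)")
```

Keeping the mapping separate from the raw rows preserves each author's original wording, so a consolidation decision can be revisited without re-reading the articles.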
In this example, the final spreadsheet contains 57 concept nodes and 69 edges describing relationships among concepts.
Figure 2 shows the resulting redistricting literature network. A global summary of the literature begins with describing the network itself: 57 theoretical concepts studied, as causes or effects, shaded by the node’s total degree centrality. Sixty-nine edge arrows show each directional relationship explicitly theorized in our corpus, colored by number of publications addressing that relationship. Edge colors can illustrate attributes of relationships supplied as an additional column in the input spreadsheet (here representing the number of citations that theorized the edge relationship). The netlit vignette presents attributes that one may wish to highlight, including edge statistics (e.g., edge betweenness) produced by netlit::review(). Literature discussing the measurement of a single concept appears as self-ties. For example, measuring the concept of compactness (right-hand side of Figure 2) has inspired a series of works (Barnes and Solomon 2021; Chen and Rodden 2015; De Assis, Franca, and Usberti 2014; Magleby and Mosesson 2018; Saxon 2020; Tam Cho and Liu 2016).
What are key concepts in redistricting literature? As posed in Table 1 “Assess and Identify,” a natural translation of this question to a network is “what are the central nodes?” The concept of partisan advantage is most central with 14 total edges. Efficiency gap, partisan gerrymandering, and preserve communities of interest each have degree centrality of eight. We define preserve communities of interest as an umbrella term covering the broad goals of preservation of minority areas and political subdivisions within districts, and core district retention (Figure 3a). It is unsurprising that this predominantly legal concept scores high in total degree (five out edges, one in-edge, and one self-tie) as it is both a traditional redistricting criterion and widely studied.
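Total degree can be computed directly from an edgelist. The sketch below is a plain-Python illustration on an invented edgelist, not the computation used to produce the figures; in particular, counting a self-tie once as an out-edge and once as an in-edge is an assumed convention, and network software may treat self-ties differently.

```python
from collections import Counter


def degree_table(edges):
    """Out-, in-, and total degree per concept from (cause, effect) pairs."""
    out_deg, in_deg = Counter(), Counter()
    for cause, effect in edges:
        out_deg[cause] += 1
        in_deg[effect] += 1  # assumed convention: a self-tie adds to both counts
    nodes = set(out_deg) | set(in_deg)
    return {
        n: {"out": out_deg[n], "in": in_deg[n], "total": out_deg[n] + in_deg[n]}
        for n in nodes
    }


# Toy edgelist, invented for illustration; ("A", "A") is a self-tie.
toy_edges = [("A", "B"), ("A", "C"), ("C", "A"), ("A", "A")]
table = degree_table(toy_edges)
most_central = max(table, key=lambda n: table[n]["total"])
print(most_central, table[most_central])
```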
Are there communities of work that have developed recently? In network terms, we can ask: “what are communities in the network?” A distinct community (Figure 3a) of scholars has investigated how changes in the electorate’s composition (change in constituency boundaries) can affect downstream campaign resource allocations and vote power, highlighting the effects that redrawing of maps might have on political environments for individual candidates.
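As a rough first pass at this question, one can list the weakly connected components of the concept network. This is a deliberately simpler stand-in for the community detection algorithms cited earlier: components only separate concepts with no connecting path at all, whereas community detection also splits densely connected groups within a component. The function name and toy edgelist are hypothetical.

```python
from collections import defaultdict


def weak_components(edges):
    """Group concepts that are connected when edge direction is ignored."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, components = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first traversal from `start`
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        components.append(sorted(group))
    return components


# Toy edgelist, invented for illustration; ("F", "F") is an isolated self-tie.
toy_edges = [("A", "B"), ("B", "C"), ("D", "E"), ("F", "F")]
print(weak_components(toy_edges))
```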
Beyond assessing the state of the literature, two defining tasks in research are finding areas for theory building and identifying where theory has yet to be empirically tested; that is, finding the “gaps” in knowledge. A visual approach to the first question looks for areas in the literature network where two concepts are discussed separately and where a researcher might posit a connection. For example, recent work (Ansolabehere and Snyder Jr. 2012; Carsey, Winburn, and Berry 2017; Hood and McKee 2013; Limbocker and You 2020) has shown that redrawn lines that change the composition of the electorate exert an exogenous effect on the vote, but less attention has been paid to how such changes affect minority representation; there exists an opportunity for research concerned with political consequences of constituency boundaries to engage more directly with scholarship on minority representation.
Answering questions about empirical gaps is a simple matter of analyzing edge characteristics in the network: whether concepts that are linked in theory are also linked in empirical work. In Figure 2, edges drawn as dashed lines indicate theorized but not empirically validated relationships. Solid lines represent empirically demonstrated connections between concepts. Partisan advantage is hypothesized to affect whether floor votes align with district/state preferences (bottom of Figure 2), both through connections that remained untested empirically until recently (Caughey, Tausanovitch, and Warshaw 2017). Likewise, Figure 2 suggests equal population and partisan dislocation are concepts that are important to measure (visually verified with self-ties). Measuring equal population has been discussed more often than partisan dislocation (Gatesman and Unwin 2021; Magleby and Mosesson 2018), which is reasonable given that equipopulation is a long-standing and firm legal rule in redistricting, whereas the latter concept is relatively new (DeFord, Eubank, and Rodden 2021).
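Because empirical support is stored as an edge attribute, listing untested relationships amounts to a filter. The following is a minimal sketch assuming a hypothetical boolean column named `empirical` in the edgelist; the rows are invented and do not reproduce the edges in Figure 2.

```python
def untested_edges(edgelist):
    """Edges that are theorized in the corpus but lack empirical support."""
    return [(row["from"], row["to"]) for row in edgelist if not row["empirical"]]


def line_style(row):
    """Drawing convention from the text: dashed if untested, solid if tested."""
    return "solid" if row["empirical"] else "dashed"


# Hypothetical rows, invented for illustration only.
rows = [
    {"from": "A", "to": "B", "empirical": True},
    {"from": "B", "to": "C", "empirical": False},
    {"from": "A", "to": "C", "empirical": False},
]
print(untested_edges(rows))
print([line_style(r) for r in rows])
```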
For researchers interested in causal relationships, a network approach offers two tools for isolating and controlling, potentially the first step toward a more complete directed acyclic graph. To answer the question “what causal pathways are related to a theorized concept?”, we inspect the neighborhood of nodes and edges. Consider the node preserve communities of interest. Exploring its neighborhood (Figure 3a) reveals hypothesized downstream effects of preserving communities of interest, including changes at the voter (voter information about their district and stability in voters’ fellow constituents) and district levels (partisan gerrymandering and rolloff). It also suggests that preserving communities of interest is a prominent confounding concept that affects how voter information about their district contributes to rolloff.
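Both the neighborhood inspection and the search for candidate confounders can be sketched from an edgelist. The edges below are a partial transcription of the relationships named in the paragraph above, not the replication data, and the function names are hypothetical. A concept with direct edges into both the cause and the effect is flagged only as a candidate confounder.

```python
def neighborhood(edges, concept):
    """Edges that touch `concept` directly."""
    return [(a, b) for a, b in edges if concept in (a, b)]


def parents(edges, node):
    """Concepts with a direct edge into `node`, ignoring self-ties."""
    return {a for a, b in edges if b == node and a != node}


def common_causes(edges, cause, effect):
    """Concepts with direct edges into both `cause` and `effect`."""
    shared = parents(edges, cause) & parents(edges, effect)
    return sorted(shared - {cause, effect})


PCI = "preserve communities of interest"
INFO = "voter information about their district"
edges = [
    (PCI, INFO),
    (PCI, "stability in voters' fellow constituents"),
    (PCI, "partisan gerrymandering"),
    (PCI, "rolloff"),
    (INFO, "rolloff"),
]
print(neighborhood(edges, "rolloff"))
print(common_causes(edges, INFO, "rolloff"))
```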
How does a network approach differ from a traditional expert-guided review? As a thought exercise, one of our team members (Mayer), a redistricting scholar and experienced expert witness in gerrymandering litigation, prepared a traditional review. We compare our approach against his and McGhee’s (2020) recent redistricting literature review. Mayer and McGhee separately identify three key themes in the recent redistricting literature that parallel our network findings: developing metrics, automation of redistricting methods, and exploring downstream effects of gerrymandering. The network approach brings some nuance to each of these themes, however, by allowing quick identification of metric-oriented works, avoiding over-inflating the importance of growing communities of work, and allowing us to develop more complex directed acyclic graphs from the literature around the effects of gerrymandering.
Specifically, Mayer and McGhee note that recent work has focused on developing metrics to propose a legal standard for federal courts to place limits on partisan plans (and which Justice Anthony Kennedy appeared to request in LULAC v. Perry, 548 U.S. 399 (2006)). Our network approach also identifies scholarship on metrics, represented as nodes with self-ties. Further, it parses out where scholarship on metrics is more or less likely to contribute to theories of redistricting. For example, measures of compactness and equal population both have self-ties, but the network shows that only the former has been recently theorized to affect political outcomes such as voter turnout.
Similarly, Mayer notes that automated redistricting methods have captured substantial attention recently (Chen and Rodden 2013; Cho and Liu 2018; Liu, Cho, and Wang 2016; Magleby and Mosesson 2018; Vanneschi, Henriques, and Castelli 2017). One method draws large numbers of maps with different decision rules and initial conditions, with the resulting maps used to identify outliers that indicate partisan gerrymanders or possible “natural” gerrymanders (Cain et al. 2017; Chen 2017; Chen and Cottrell 2016; Chen and Rodden 2013, 2015; Fifield et al. 2020; Ramachandran and Gold 2018; Tam Cho and Liu 2016). The network figure shows these studies with self-tying nodes and node connections, including the relationship between geographic partisan distribution and partisan advantage. Both expert reviews emphasized methodological advancements. While this strand of work is prominent in the full network graph, it is a minority of the scholarship, which remains primarily concerned with political science theories related to redistricting. Thus, we see our approach as avoiding the conflation of overall patterns of scholarship with popular and highly discussed work. The latter would be better captured with a citation network than a causal graph.
The third insight of a traditional literature review is that recent work has continued exploring the effects of gerrymandering on various outcomes including incumbency advantage (Henderson, Hamel, and Goldzimer 2018); electoral competition (Cottrell 2019); candidate quality and emergence (Williamson 2019); roll-call voting and state policy (Caughey, Tausanovitch, and Warshaw 2017); political parties (Stephanopoulos and Warshaw 2020); campaign contributions (Crespin and Edwards 2016); and constituent access (Niven, Cover, and Solimine 2021). Our network also captures these relationships as dyadic connections, but can further illuminate downstream causal chains, confounding concepts, and multiple causal paths.
3 Discussion and Conclusion
We present an organizing framework based on network representations to conduct literature reviews. Our application focused on redistricting, but the approach is general; where research builds on complex combinations of prior work, a network approach might prove especially fruitful.
We highlight several helpful features of networks as a way of uncovering patterns in scholarship. Beyond assessing prominent themes and communities of work, the network representation most importantly lends itself to theoretical exploration and identification of relationships that have yet to be studied empirically. Finally, the directed graph representations in this framework can be used to inspect causal pathways related to a concept or to identify confounding relationships (see, for instance, the discussion of causal interpretations in regression models in Keele, Stevenson, and Elwert 2020).
Our approach may also lower barriers to entry: while substantive expertise always improves exercises like these, the input units to the network require summarizing concepts and identifying posited relationships between them within single research works, repeating this over the list of works, and entering this information into a spreadsheet. This process is accessible to newcomers to a literature. The netlit vignette illustrates another pattern-discovery tool for reviewing how a literature evolves: subsetting the input data to prior periods and comparing the resulting literature network to the most complete and up-to-date network.
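The period comparison can be sketched as a set difference between edgelists. This is an illustration under assumptions, not the vignette's code: it presumes each edgelist row carries a hypothetical `year` column for the publication that posited the edge, and the rows are invented.

```python
def edges_up_to(edgelist, year):
    """Distinct edges posited by works published in or before `year`."""
    return {(r["from"], r["to"]) for r in edgelist if r["year"] <= year}


def new_since(edgelist, cutoff):
    """Edges in the full network that were absent from the network at `cutoff`."""
    full = {(r["from"], r["to"]) for r in edgelist}
    return sorted(full - edges_up_to(edgelist, cutoff))


# Hypothetical rows, invented for illustration only.
rows = [
    {"from": "A", "to": "B", "year": 2012},
    {"from": "B", "to": "C", "year": 2016},
    {"from": "A", "to": "B", "year": 2019},  # later work revisiting an old edge
    {"from": "C", "to": "D", "year": 2020},
]
print(new_since(rows, 2015))
```

An edge revisited by later work, such as the repeated A-to-B row here, does not count as new because it already existed in the earlier network.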
In emphasizing the importance of clearly delineating inclusion criteria for work included in a literature review, our approach may also limit unintentional biases in the process of assembling “relevant” works for literature reviews (e.g., favoring personal or institutional social networks, running the risk of over-representation of well-connected works at the expense of research from underrepresented scholars; Lalanne and Seabright 2022). While our approach does not eliminate systemic under-representation, we hope that clear criteria and full visualization of included work can help reduce under-representation and under-inclusion.
Ultimately, the proposed framework still relies on researcher choices—including the universe of sources, selection criteria, and identification of main concepts—choices that are still undertaken in traditional reviews of literature. By clarifying choices and utilizing our network framework to visualize the resulting network, we expect it to be easier to evaluate such choices and how sensitive assertions about gaps in the literature are to these choices.
Acknowledgments
The authors benefited from important preliminary discussions with and feedback from Jonathan Renshon, as well as detailed comments from Héctor Pifarré i Arolas, Eleanor Powell, attendees of the PolMeth 2022 Summer Meeting, and anonymous reviewers at PA.
Data Availability Statement
Replication code for this article is available and has been published in Code Ocean, a computational reproducibility platform that enables users to run the code, and can be viewed interactively at the following DOI: 10.24433/CO.0502881.v1 (Lo et al. 2023a). A preservation copy of the same code and data can also be accessed via Dataverse at https://doi.org/10.7910/DVN/NV66YN (Lo et al. 2023b). A copy of the R package, code, and data can also be accessed via GitHub at https://judgelord.github.io/netlit.
Supplementary Material
For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2023.4.
Funding
This work was supported by the University of Wisconsin–Madison Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation.