INTRODUCTION
The world faces a forcible displacement crisis: more than one hundred million individuals have either fled their countries or been displaced within them.Footnote 1 International displacement has risen dramatically in recent decades due to conflict and political instability in countries from Afghanistan to Venezuela, with Ukraine producing the single largest outflow of refugees in a single year in recorded history.Footnote 2 Forcible displacement has tremendous human costs, and its causes and consequences are the focus of significant social science research. However, empirical studies of refugee flows have been limited by a lack of reliable data.
Existing results are often derived from refugee population stock measures—total end-of-year population counts of refugees within host countries—rather than actual flow estimates. Stock-based results are subject to fundamental questions about internal validity. Much prior research is based on data collected before 2000, when the United Nations Refugee Agency (UNHCR) began systematically tracking refugee and asylum seeker (REF/ASY)Footnote 3 numbers globally. The quality of pre-2000 data is limited, with many missing values that much existing research does not properly account for. Analyses focused on origin countries are often missing data because asylum countries were not then reporting arrivals to the UNHCR; separately, asylum countries that were reporting arrival data often did not collect national origin.
In this study, we seek to address these issues. Following a multiyear collaboration with the UNHCR, culminating in the release of new international displacement flow data (1962–2022),Footnote 4 we reevaluate 28 studies published over decades on the causes and consequences of refugee flows.Footnote 5 Using country-specific reporting timelines, we update these articles’ results to account for possible measurement error introduced when missing values were treated as 0s. We also temporally extend these studies so that results are based on contemporary observations less affected by historical data-quality issues. Our goal is to understand how existing results change when we address these issues—an important question for a body of work with significant influence on academic and policy discourses.
In brief, we observe large inconsistencies between the newly released flow numbers and the stock-based estimates upon which decades of research is based, and we find that the inappropriate treatment of missing historical values is widespread. We produce significantly different results when we replicate the existing literature following three different approaches: first, replacing the old stock-based data with the newly introduced flow data; second, correcting the treatment of missing historical values; and third, temporally extending the studies.
Specifically, we find that in 19 articles on flow causes, $ \approx 74\% $ of findings replicate; in 9 articles focused on flow consequences, only $ \approx 50\% $ replicate. These percentages are conservative: we assess only whether previously reported results maintain statistical significance and/or whether the sign of estimated coefficients reverses. A stricter standard would also assess substantial changes in magnitude (toward 0), likely driving these percentages lower still.
A subset of our replications are “theory-based”: these consist of reanalyses of studies that focus on refugee flows but adopt some other measure of REF/ASY (e.g., stocks).Footnote 6 These studies contribute substantially to the ongoing debate about the effects of refugees on political violence in host countries; as Savun and Gineste (2019, 88) note, “security consequences associated with refugee flows are among the most widely studied aspects of forced migration.” Thus, we view these replications as complements to the original work, offering new empirical insights, and we clearly distinguish in the results which replications are theory-based. In our replications, effects of refugees on security conditions are attenuated, suggesting that the literature’s identification of refugees as sources of violent instability is likely overstated. The contrast between our findings and those based on stocks points to potentially important differences in the effects of refugee inflows versus sustained presence.
The new data also reveal that forced displacement is much more common than reported: Rubin and Moore (2007, 91) note that “[f]orced migration is a relatively rare event … around $ 82\% $ [of country-year cases] experienced no forced migration … .” When we extend this study to 2000–21, we calculate that number to be $ 22\% $.Footnote 7
ON THE INTERNAL VALIDITY OF PREVIOUS RESULTS
Refugee Measurement
The UNHCR has tracked REF/ASY flows since 1962. Flow records were used primarily for operational purposes and were not centralized until recently. In 2019, the UNHCR released a draft flow dataset. We engaged in extensive discussions with UNHCR staff about the data, including possible additions/modifications to capture new international movements and apparent inconsistencies across data versions.Footnote 8 We also compared statistical results using redacted and unredacted data versions to help validate the UNHCR’s decision to release only redacted data to protect individual asylum seekers’ identities in cases of very small dyadic flows.
The UNHCR ultimately released the final “Forced Displacement Flow Dataset” in 2022. The new flow data are depicted in Figure 1, with additional details in Appendix A.1.1 of the Supplementary Material.
Flows and Stock-Based Estimates Compared
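Actual flows diverge from stock-based estimates for several reasons.Footnote 9 First, researchers estimate flows from stocks as follows, where $ i $ denotes either the directed dyad (i.e., sending–receiving country pair) or the asylum/origin country, depending on the unit of interest, and $ t $ refers to the year:
$$ \widehat{f_{i,t}}={s}_{i,t}-{s}_{i,t-1} $$
Here $ {s}_{i,t} $ is the end-of-year refugee stock and $ \widehat{f_{i,t}} $ is the resulting flow estimate.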
Stock-based estimates calculated this way do not account for naturalizations, returns, or resettlements (hereafter “stock departures”); births and deaths; or any other variable affecting host-country stock levels. We find that a substantial number of (directed-dyad) cases involve simultaneous (same-year) stock departures and directed flows. In $ \approx 45.40\% $ of asylum country-year observations, inflows and stock departures co-occur; in $ \approx 20.81\% $ of cases, stock departures are greater than or equal to inflows (see Figure 2). By capturing new arrivals, the new flow data avoid this issue.
Second, under the stock-based estimation approach, years of “negative” flow are set to 0. We calculate that nearly one-third ( $ \approx 29.68\% $ ) of all first-differenced observations result in negative values that are converted to 0s. In just under half of these directed-dyad-year cases ( $ \approx 48.56\% $ ), the new data report positive values instead.
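The first two problems can be illustrated with a minimal sketch of the first-difference calculation. The function name and all numbers below are hypothetical and chosen only for illustration; they are not drawn from the UNHCR data.

```python
from typing import List, Optional


def stock_based_flows(stocks: List[int]) -> List[Optional[int]]:
    """Estimate yearly flows as first differences of end-of-year stocks.

    The first year has no prior stock, so its estimate is None.
    Negative differences are truncated to 0, as in the stock-based approach.
    """
    estimates: List[Optional[int]] = [None]
    for previous, current in zip(stocks, stocks[1:]):
        estimates.append(max(0, current - previous))
    return estimates


# Hypothetical dyad (illustrative values only).
stocks = [1000, 1400, 1200, 1500]   # end-of-year stocks
arrivals = [None, 600, 100, 300]    # assumed true new arrivals per year

estimated = stock_based_flows(stocks)
# Year 2: 600 arrivals and 200 departures yield an estimate of 400.
# Year 3: 100 arrivals and 300 departures yield a negative difference, set to 0.
print(estimated)
```

In this toy series the estimate never exceeds the assumed arrivals, and the year with net departures is recorded as having no inflow at all.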
Third, stock-based flow estimates suffer from major left-censoring. The UNHCR begins tracking REF/ASY arrivals for different countries in different years. Under the first-differences approach, estimates may capture preexisting populations (not inflows) for the first year in which a positive value is reported. To quantify this potential issue, we compare the sum of refugee stocks for all directed-dyad-year observations corresponding to the first year of UNHCR reporting to the sum of the new data’s flows for those same years. Results suggest that many stock-based flow estimates capture preexisting refugee populations rather than new flows—a source of significant potential error in statistical estimates (see Figure 2). Preexisting population values do not enter into the new flow data.
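A minimal sketch of the left-censoring problem follows, assuming a panel in which years before reporting began are coded as 0. The dyad, the years, and the arrival figure are hypothetical, and stock departures are ignored for simplicity.

```python
from typing import Dict


def first_difference(stocks: Dict[int, int], year: int) -> int:
    """Stock-based flow estimate for `year`, truncated at 0."""
    return max(0, stocks[year] - stocks[year - 1])


# Hypothetical dyad whose reporting begins in 1990; earlier years are coded 0.
stocks = {1988: 0, 1989: 0, 1990: 50_000, 1991: 52_000}
actual_arrivals_1990 = 3_000  # assumed true inflow in the first reporting year

estimate_1990 = first_difference(stocks, 1990)
# The whole preexisting population is counted as a single-year "flow".
preexisting = estimate_1990 - actual_arrivals_1990
print(estimate_1990, preexisting)
```

Under these assumptions, most of the first-year estimate reflects people who arrived before reporting began rather than new arrivals.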
Fourth, until 2007, stock data include population values for third country resettlements, erroneously depicting “flows” into countries where REF/ASY eventually resettled, sometimes years after displacement. The new data prioritize asylum seeker applications to reflect increases in the year of their actual arrival.
Fifth, the stock data include “non-flow increases”: adjustments to stock values due to methodological revision, legislative change, or other host-country changes to how REF/ASY are defined or calculated. These positive reestimations produce apparent flow increases that do not reflect actual new arrivals. In the new flow data, these changes have been removed.Footnote 10
Sixth, stock-based flows lag actual flows in countries that use their asylum systems to grant refugee status; asylum seekers enter into stock data only after their asylum applications have been processed and approved—sometimes years after arrival.Footnote 11 The new flow data prioritize asylum seeker applications to capture movements during the years in which they occurred.
Overall, how do the new flows compare with stock-based estimates? $ {s}_{i,t} $ is a function of stock departures. When stock departures occur simultaneously (within the same year) with inflows, measures of flows are attenuated: $ \widehat{f_{i,t}}\le {f}_{i,t} $ (i.e., the number of stock departures in a given year reduces the calculable inflows by that number). This is consistent with patterns in the data: in $ \approx 81\% $ of directed-dyad-year cases, flow values are strictly larger than stock-based estimates (and larger than or equal in $ \approx 84\% $ of cases). Overall, we calculate that the new flow data capture 14,227,372 more flows than the stock-based data from 1962 to 2022Footnote 12: for every $ \approx $ 5 flows reported under the stock-based approach, the new data report one additional flow.
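The attenuation argument can be written out in one step. The sketch below assumes a simplified accounting identity in which stocks change only through inflows and stock departures; the symbol $ {d}_{i,t} $ for stock departures is our notation for illustration, and births, deaths, and definitional revisions are ignored.

```latex
\begin{align}
s_{i,t} &= s_{i,t-1} + f_{i,t} - d_{i,t}, \qquad d_{i,t} \ge 0 \\
\widehat{f_{i,t}} &= s_{i,t} - s_{i,t-1} = f_{i,t} - d_{i,t} \le f_{i,t}
\end{align}
```

Under this simplification, the estimate equals the true flow only when no stock departures occur in the same year.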
We directly compare flow values with their stock-based estimates, estimating bias as $ \frac{\mathit{inflow}_{i,t}}{\mathit{stock}_{i,t}-\mathit{stock}_{i,t-1}} $. Figure 2 displays the distribution of resulting percentages for (a) all asylum-country year observations and (b) all origin-country year observations. Overall, these percentages fall above $ 100\% $.Footnote 13, Footnote 14 In Appendix A.1.3 of the Supplementary Material, we supplement this analysis by reporting for each asylum country the correlation between inflows to that country and stock-based estimates. Results indicate that stock-based estimates tend to significantly underestimate flows; in $ >10\% $ of cases, the two variables are either not correlated or are negatively correlated.
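The bias ratio can be sketched as a small function. The handling of non-positive stock differences (returning None) is an assumption made here for illustration, since the ratio is not interpretable as a percentage in those cases; the input values are hypothetical.

```python
from typing import Optional


def flow_to_estimate_ratio(
    inflow: float, stock_now: float, stock_prev: float
) -> Optional[float]:
    """Return the inflow as a percentage of the first-differenced stock.

    Returns None when the stock difference is not positive (an assumption
    of this sketch), because the ratio is then undefined or uninterpretable.
    """
    difference = stock_now - stock_prev
    if difference <= 0:
        return None
    return 100.0 * inflow / difference


# Hypothetical observation: 600 arrivals, stock rising from 1,000 to 1,400.
ratio = flow_to_estimate_ratio(600, 1400, 1000)
print(ratio)  # values above 100 mean the stock-based estimate understates the flow
```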
Pre-2000 Data Missingness and Quality Issues
Three major issues are associated with UNHCR data generation and reporting patterns before the year 2000, when the UNHCR standardized approaches to data collection and when many asylum states adopted information and communication technologies that significantly improved reporting.Footnote 15 The empirical problems discussed below persist beyond the year 2000, but are significantly reduced; we use the pre-/post-2000 framing for analytical parsimony.Footnote 16
The first empirical issue is the inappropriate treatment of missing data. Until recently, centralized data on when the UNHCR began tracking REF/ASY in each country were unavailable. In the absence of positive displacement values, many panel datasets set country-/dyad-year observations to 0.Footnote 17 , Footnote 18 While some missing positive refugee values for yearly country/dyadic observations may reflect true 0s, others still reflect positive values that the UNHCR did not collect. Nearly every study we replicated follows the practice of setting such observations to 0 when they precede country-specific data collection timelines. For an asylum-country-year panel dataset 1962–99, this practice results in $ \approx 49.82\% $ of observations being set to 0.Footnote 19
We supplement our analysis with UNHCR-supplied data on centralized collection efforts by country from 1970 on.Footnote 20 Patterns in data collection are depicted in Figure 3. Many countries’ data do not appear in centralized records until long after statistics began to be collected. Before 2000, UNHCR collected asylum seeker data only from several dozen industrialized countries; in 2000, when they centralized data collection, that number jumped to 137 countries, with more countries being added every year.
Using the new flow data, we produce panel datasets with observations set to NA (rather than 0) for years before data collection began.Footnote 21 As we show in Appendix A.3.1 of the Supplementary Material (and in the full set of results posted in a secondary appendix in the Dataverse; see Shaver et al. 2024), this replacement produces additional changes in several results.
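The recoding can be sketched as follows, using None to stand in for NA. The function name, the collection start year, and the reported values are hypothetical, and treating unreported years after collection began as 0 is an assumption of this sketch rather than a documented rule.

```python
from typing import Dict, List, Optional


def build_panel(
    reported: Dict[int, int], collection_start: int, years: List[int]
) -> Dict[int, Optional[int]]:
    """Build one country's yearly flow series.

    Years before `collection_start` become None (NA): the value is unknown.
    Later years with no reported value are set to 0 (assumption of this sketch).
    """
    panel: Dict[int, Optional[int]] = {}
    for year in years:
        if year < collection_start:
            panel[year] = None
        else:
            panel[year] = reported.get(year, 0)
    return panel


# Hypothetical country whose data collection begins in 1994.
panel = build_panel({1995: 120, 1997: 40}, 1994, list(range(1992, 1998)))
print(panel)
```

Estimation routines that drop missing values would then exclude 1992 and 1993 instead of treating them as years with no displacement.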
The second empirical issue emerges because UNHCR records are mostly constructed from asylum state records: studies using origin-country panel datasets are missing some unknown (potentially very substantial) number of REF/ASY outflows. These missing values were not captured by corresponding inflow data from asylum countries that were not yet reporting data to the UNHCR (see Figure 3). Approximately $ 68\% $ of the “causes” studies we replicate (and $ \approx 40\% $ of all of the studies we replicate) use origin-country panel datasets.
The third empirical issue is that until 2000, a significant amount of UNHCR data for tracked REF/ASY are missing national origin information. For research designs in which REF/ASY origin is relevant, missingness on this variable introduces noise (and potentially bias) to results. We display this pattern in Figure 3.
We cannot directly correct for these final two issues. However, by 2000, these problems are substantially eliminated. For this reason, our analyses include contemporary replications: we extend studies through the most recent date possible and analyze them from the period beginning in 2000. These are our preferred specifications, as all four issues that we raise are either resolved or substantially mitigated.Footnote 22
REPLICATIONS
We used Google Scholar to search general and social science academic journals for articles engaging in quantitative research on global refugee flows. Our search query limited cases to those that (i) reference refugee flows, (ii) include the terms “UNHCR” and “data,”Footnote 23 and (iii) were published by a major publisher,Footnote 24 returning 1,556 responsive articles. We manually inspected each, eliminating studies that (i) did not deal with causes or consequences of refugee flows, (ii) were entirely qualitative, (iii) incorporated data on refugees only as a control or in secondary (tertiary, etc.) analyses, or (iv) were single country studies. This produced 35 qualifying studies. We were unable to obtain replication materials for seven of these. We replicated the remaining 28 by (i) correcting incorrectly imputed zeros and (ii) replacing the old stock-based measures with the new flow data. We assess whether previously reported results maintain statistical significance and/or whether the sign of estimated coefficients reverses.Footnote 25 A detailed description of these articles appears in Appendix A.2.2 of the Supplementary Material; more information on replication procedures is included in Appendix A.2.4 of the Supplementary Material.
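The replication criterion can be sketched as a simple decision rule. The 0.05 significance threshold and the treatment of originally null results (counted as replicating only if they remain null) are assumptions made for illustration; they are not specified in the text above.

```python
def replicates(
    orig_coef: float,
    orig_p: float,
    new_coef: float,
    new_p: float,
    alpha: float = 0.05,
) -> bool:
    """Return True if the reestimated result matches the original finding.

    A significant original result replicates if it stays significant with
    the same sign. An originally null result replicates if it stays null
    (assumption of this sketch).
    """
    originally_significant = orig_p < alpha
    still_significant = new_p < alpha
    same_sign = (orig_coef > 0) == (new_coef > 0)
    if originally_significant:
        return still_significant and same_sign
    return not still_significant


# Hypothetical coefficient and p-value pairs (illustrative only).
print(replicates(0.8, 0.01, 0.5, 0.03))   # stays significant, same sign
print(replicates(0.8, 0.01, 0.5, 0.20))   # loses significance
print(replicates(0.8, 0.01, -0.5, 0.01))  # sign reverses
```

A stricter rule could add a check on changes in magnitude, which this sketch deliberately omits.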
By assessing how existing results change when we address the empirical issues described above, this effort falls into the class of “broad” (Dafoe 2014), “statistical” (Hamermesh 2007), or “wide” (Pesaran 2003) replications involving reestimating test results with the use of new data or related modifications (e.g., adopting alternatively constructed variables). The studies we replicate form the backbone of research on the causes and consequences of refugee flows and have influenced research agendas, curricula, and policy discourses.Footnote 26 Replication results inform causal inferences in cases in which the original authors’ testing strategies were well-identified (save for the empirical corrections we apply), but more generally, they update our understanding of the “published record … recognized [as] state of the art” (King 2006, 119), providing direction for additional scholarly inquiry and the reexamination of their policy implications.Footnote 27
RESULTS AND DISCUSSION
Results are succinctly presented in Tables A1 and A2 in Appendix A.3.1 of the Supplementary Material. Complete replication regression results (alongside original estimates) appear in a supplementary appendix posted to Dataverse (Shaver et al. 2024). Of the 62 total tests from causes articles, $ \approx 74\% $ replicate. More significantly, of 20 total tests from consequences articles, only $ \approx 50\% $ replicate.Footnote 28 We classify 14 of the 28 (50%) articles we replicate as “plausibly causally identified.”Footnote 29 Of these, $ \approx 69\% $ of results on causes replicate and $ \approx 50\% $ of results on consequences replicate. This is quantitatively and substantively consistent with the results in our full sample. We present and discuss a select set of results below.
Causes
With respect to the causes of flows, updated study results rarely overturned original findings; however, they frequently supported hypotheses discarded by the original authors as statistically unsupported.
Our findings confirm the central roles played by the “push factors” of political violence and state repression in driving international displacement. We corroborate results that link civil war/insurgency to outflows, estimating larger effects of these factors than did Echevarria and Gardeazabal (2021), Davenport, Moore, and Poe (2003), and Moore and Shellman (2004); we also uncover a larger effect of state repression on outflows than Rubin and Moore (2007). Steele (2017, 9) has observed that “current understanding tends to equate wars or violence with an increase in displacement, but we can and need to be more precise.” Our replications amplify her call. Future work might incorporate the new flow data into global analyses of potential heterogeneous effects across factors such as the timing of violence, its spatial distribution and intensity, and the technologies used to perpetrate it.
We also replicated papers focused on “pull factors” that incentivize international over internal displacement and influence the choice of international destination. In replications of Moore and Shellman (2004; 2007) and Turkoglu and Chadefaux (2019), we find limited support for the idea that refugees are motivated by economic opportunity or democratic institutions in destination countries. This finding contrasts with the framing of asylum seekers as opportunists, as echoed in prominent political discourse. Other results raise tensions warranting further study: for instance, regarding the role of alliance dynamics, we fail to substantiate Moorthy and Brathwaite’s (2019) finding that the presence of formal alliances positively influences dyadic flows. However, whereas Moore and Shellman (2007) do not find such an effect, our replication of their study does.
A more subtle theme of our replications is the under-explored role of factors discouraging or restricting individuals from seeking refuge abroad. On the one hand, some updated findings point to the role of restrictions. In our replication of Echevarria and Gardeazabal (2021), we estimate larger effects of country size, proximity to potential asylum states, and island status. We estimate a larger effect size than Moore and Shellman (2007) of potential asylum state contiguity. On the other hand, we find little connection in the Moore and Shellman (2007) replication between conflict and repression in potential asylum states and refugee inflows. These and other such findings encourage broader inquiry into the set of factors responsible for restricting international displacement, from border securitization (Simmons and Kenwick 2022) to severe weather and natural disasters along border regions under climate change.
Consequences
With respect to the consequences of inflows, we observe significant changes from previous results. In our replications of this seminal literature, the relationship between the arrival of refugees and the onset of war and political violence is attenuated: we find that refugees are only infrequently conduits of violence, and the conditions under which forced displacement poses a risk to host countries appear to be specific.
With respect to refugees’ connection to terrorism, our replications find only partial support for Choi and Salehyan’s (2013) analysis linking these variables. This is consistent with Milton, Spencer and Findley’s (2013) results and our corresponding replication (though we estimate smaller effect sizes). We corroborate Polo and Wucherpfennig’s (2022) causal finding that refugee influx is positively associated with terrorism in the specific case of refugees from communities with ties to transnational terrorist organizations; we find additional evidence that the association for refugees originating from countries without ties is negative. These findings highlight the potential heterogeneous treatment effects of inflows on terrorism, with potential implications for more tailored policy responses and programming.
When we reexamine work on refugees and governments’ respect for human rights, we fail to confirm either of Wright and Moorthy’s (2018) findings: our results indicate neither that an influx of refugees positively influences repression nor that this relationship is moderated by development. We do not recover Chu’s (2020) findings relating to refugees from rival and non-rival origin states and hosts’ respect for human rights.
Finally, when we replicate work linking refugees to inter- and intrastate conflict, our findings do not substantiate Salehyan and Gleditsch’s (2006) seminal research associating refugees with civil war diffusion.Footnote 30 Our results partially support Salehyan’s (2008) findings that flows between states can provoke militarized disputes: flows between a given dyad increase the probability that the receiving state initiates a conflict with the sender, but we do not find that they increase the probability of sender-state initiation.
Collectively, these findings speak against the new politics of fear, challenging political narratives that frame refugees as security threats and the restrictive state policies they underpin. Our findings do not indicate that there are no effects of refugee inflows on violent instability, but it seems that refugees play a role in producing or facilitating political violence, wittingly or not, only under particular circumstances. The differences between our findings and those based on refugee stocks highlight potentially important differences between the effects of refugee inflows into a country and the effects of sustained refugee presence. These differences warrant further exploration, particularly because of the “growing difficulty [of] uprooted people … in finding lasting solutions to their plight” (Crisp 2021, 3).
We conclude by noting a publication bias against null results (Esarey and Wu 2016; Gerber and Malhotra 2008). Our replications sometimes supported hypotheses that were discarded when tested with lower-quality data because they lacked statistical support. Other meaningful relationships relating to refugee flows may have therefore gone undiscovered, which scholars might now retest with these new data.
SUPPLEMENTARY MATERIAL
To view supplementary material for this article, please visit https://doi.org/10.1017/S0003055424000285.
DATA AVAILABILITY STATEMENT
Research documentation and data that support the findings of this study are openly available at the American Political Science Review Dataverse: https://doi.org/10.7910/DVN/JADOZL. Limitations on data availability are discussed in the text and/or Supplementary Material.
AUTHOR CONTRIBUTIONS
The first two authors are the primary authors and are listed in order of contribution. The remaining authors are (former) Political Violence Lab members, listed in order of their respective contributions (and alphabetically where contributions were equal).
ACKNOWLEDGEMENTS
We are grateful for the opportunity to work with the United Nations Refugee Agency (UNHCR) on its release of the data that enabled this project. We thank the University of California, Merced’s University Library Collection Services for supporting the acquisition of data for this project. The Political Violence Lab, based at the University of California, Merced, acknowledges the support it has received from the University of California, Merced and the University of California Washington Center. We further thank the American Political Science Review editorial team and the three anonymous referees, Lamis Abdelaaty, Kyle Beardsley, Mietek Boduszyński, Alex Bollfrass, Alex Braithwaite, Mateo Villamizar Chaparro, Elaine Denny, James Fearon, Guy Grossman, Biz Herman, Connor Huff, Adam Lichtenheld, Bryce Loidolt, Eric Mvukiyehe, Fouad Pervez, Ryan Powers, David Siegel, Abbey Steele, Yang-Yang Zhou, and participants of Duke University’s Political Economy Seminar Series; Harvard University’s Political Violence Workshop; the Annual Peace Science Society International Conference; and the Graduate Students in International Relations etc. (GSISE) Seminar for comments on this project. For research assistance, we gratefully acknowledge Political Violence Lab interns Mairead Allen, Mia Bartschi, Kai Walter Bauer-Seeley, Isabella Caldarelli, Dheera Dusanapudi, Jasmine Her, Zainab Khan, Qi Zhi (Aaron) Liow, Audrey Lozano, Yasmine Lunar, Colby Mathe, Brian McCarthy, Daniel Romero, and Danielle Startsev.
CONFLICT OF INTEREST
The authors declare no ethical issues or conflicts of interest in this research.
ETHICAL STANDARDS
The authors affirm this research did not involve human participants.