Published online by Cambridge University Press: 13 June 2011
This article offers the first independent scholarly evaluation of the claims, forecasts, and causal inferences of the State Failure Task Force and its efforts to forecast when states will fail. State failure refers to the collapse of the authority of the central government to impose order, as in civil wars, revolutionary wars, genocides, politicides, and adverse or disruptive regime transitions. States that sponsor terrorism or allow it to be organized within their borders are all failed states. This task force, set up at the behest of Vice President Gore in 1994, has been led by a group of distinguished academics working as consultants to the U.S. CIA. State Failure Task Force reports and publications have received attention in the media, in academia, and from public decision makers. The article identifies several methodological errors in the task force work that cause its reported forecast probabilities of conflict to be too large, its causal inferences to be biased in unpredictable directions, and its claims of forecasting performance to be exaggerated. However, the article also finds that the task force has amassed the best and most carefully collected data on state failure to date, and the required corrections provided in this article, although very large in effect, are easy to implement. The article also demonstrates how to improve forecasting performance to levels significantly greater than even corrected versions of its models. Although the matter is still a highly uncertain endeavor, the authors are nevertheless able to offer the first accurate forecasts of state failure, along with procedures and results that may be of practical use in informing foreign policy decision making. The article also describes a number of strong empirical regularities that may help in ascertaining the causes of state failure.
* The authors have no formal or informal relationship with the State Failure Task Force or the task force's sponsor, the U.S. Central Intelligence Agency. We thank Matt Baum for research assistance; Jim Alt, Aslaug Asgeirsdottir, Bob Bates, Ben Bishin, Lee Epstein, Jim Fearon, Charles Franklin, Jeff Frieden, Kristian Gleditsch, Jack Goldstone, David Laitin, Chris Murray, Kevin Quinn, Ken Scheve, Alan Stain, Ben Valentino, Jonathan Wand, and Mark Woodward for helpful discussions; the State Failure Task Force for collective written comments; Bob Bates for his suggestion that we take on this project; and the National Science Foundation (SBR-9729884, SBR-9753126, and IIS-9874747), the National Institutes of Aging, and the World Health Organization for research support. For providing us a copy of the task force data, we thank task force members Bob Bates and Monte Marshall. All data referenced in this article are available at http://gking.harvard.edu, and for making this possible we are thankful for the guidance and assistance of Dick Cooper and the efforts of attorneys Kim Budd, Bob Donin, Allan Ryan, and Bob Iulioano in the Harvard University General Counsel's office.
1 Esty, Daniel C., Goldstone, Jack, Gurr, Ted Robert, Surko, Pamela T., and Unger, Alan N., Working Papers: State Failure Task Force Report (McLean, Va.: Science Applications International Corporation, 1995); Esty, Daniel C., Goldstone, Jack, Gurr, Ted Robert, Harff, Barbara, Surko, Pamela T., Unger, Alan N., and Chen, Robert S., The State Failure Task Force Report: Phase II Findings (McLean, Va.: Science Applications International Corporation, 1998).
2 Esty, Daniel C., Goldstone, Jack, Gurr, Ted Robert, Harff, Barbara, Surko, Pamela T., Unger, Alan N., and Chen, Robert S., “The State Failure Project: Early Warning Research for U.S. Foreign Policy Planning,” in Davies, John L. and Gurr, Ted Robert, eds., Preventive Measures: Building Risk Assessment and Crisis Early Warning Systems (Lanham, Md.: Rowman and Littlefield, 1998); Esty, Daniel C., Goldstone, Jack, Gurr, Ted Robert, Harff, Barbara, Levy, Marc, Dabelko, Geoffrey D., Surko, Pamela T., and Unger, Alan N., “The State Failure Report: Phase II Findings,” Environmental Change and Security Project Report 5 (Summer 1999).
3 For example, Tim Zimmermann, “CIA Study: Why Do Countries Fall Apart? Al Gore Wanted to Know,” U.S. News and World Report, March 12, 1996.
4 Esty et al. (fn. 2, 1998), 27-38; and, e.g., John C. Gannon, The Global Infectious Disease Threat and Its Implications for the United States (U.S. National Intelligence Council, http://www.cia.gov/cia/publications/nie/report/nie99-17d.html, 2000).
5 Esty et al. (fn. 2, 1998), 27-38.
6 Fewer than 195 countries appear in the data set in any one year. For example, Germany, East Germany, and West Germany are three separate items in this count, even though for any one year in the data set, either Germany or East and West Germany appear. Countries enter the data set in 1955 or when they first came into existence if later; countries remain in the data set after an episode of failure. In addition, the task force was required by the U.S. government to omit the United States from all analyses. They also omitted countries with fewer than half a million people.
7 Breslow, Norman E., “Statistics in Epidemiology: The Case-Control Study,” Journal of the American Statistical Association 91 (March 1996); Gary King and Langche Zeng, “Explaining Rare Events in International Relations,” International Organization (forthcoming), preprint at http://gking.harvard.edu.
8 We focus only on the task force's so-called “global model.” Its data set includes 1,231 variables, although many of these are recodes of other variables or markers of problems with individual observations. Although the task force writings indicate that it used only the case-control data, its data set contains at least some information and always Yr for every country.
9 Esty et al. (fn. 2, 1998).
10 Mike West, Joseph R. Nevins, Jeffrey R. Marks, Rainer Spang, and Harry Zuzuan, “Bayesian Regression Analysis in the ‘Large p, Small n’ Paradigm with Application in DNA Microarray Studies” (Manuscript, Duke University, 2000).
11 For example, King, Gary, Keohane, Robert O., and Verba, Sidney, Designing Social Inquiry: Scientific Inference in Qualitative Research (Princeton: Princeton University Press, 1994).
12 King and Zeng (fn. 7).
13 Let X be a vector of k explanatory variables, including a constant term, and let X0 and X1 each denote 1 × k vectors of values of the explanatory variables (e.g., with one variable changing and the others remaining constant at their medians between X0 and X1). Quantities of interest usually include raw probabilities of failure, relative risks, and first differences. The first difference, Pr(Y = 1 | X1) - Pr(Y = 1 | X0), is the increase in probability, and the relative risk, Pr(Y = 1 | X1)/Pr(Y = 1 | X0), is the factor by which the probability increases, when the explanatory variables change from X0 to X1.
In one special case, the relative risk can be approximated indirectly without the constant term via an odds ratio, which in logit is a function of the slopes only. However, this approximation is accurate only as τ → 0, which is the assumption that no state is ever at risk of failure, in which case there would not be much point in forecasting state failure in the first place (although the bias can be small if τ is very small). In addition, the assumption implies implausibly that Pr(Y = 1|X) = 0 for any X and that all first differences are 0. For details, see Gary King and Langche Zeng, “Estimating Risk and Rate Levels, Ratios, and Differences in Case-Control Data,” Statistics in Medicine (forthcoming), preprint at http://gking.harvard.edu.
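The quantities defined in this note can be illustrated with a minimal Python sketch. The function names and the probabilities plugged in are hypothetical and chosen only for illustration; they are not estimates from the task force data. The sketch shows why the odds ratio tracks the relative risk only when the underlying probabilities are very small.

```python
def first_difference(p0, p1):
    """Increase in probability when moving from X0 to X1."""
    return p1 - p0


def relative_risk(p0, p1):
    """Factor by which the probability increases from X0 to X1."""
    return p1 / p0


def odds_ratio(p0, p1):
    """Ratio of the odds at X1 to the odds at X0."""
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# Hypothetical probabilities: a rare-event pair and a common-event pair.
for p0, p1 in [(0.01, 0.02), (0.20, 0.40)]:
    print(p0, p1,
          first_difference(p0, p1),
          relative_risk(p0, p1),
          odds_ratio(p0, p1))
```

In both pairs the relative risk is 2, but the odds ratio stays close to 2 only for the rare-event pair; for the second pair it is 8/3, which is the kind of bias the note describes when τ is not near 0.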
14 All country-year predictions were highly biased. For this illustration, we chose a few cases that might be familiar and a few that were less familiar. Readers can easily compute the bias in all other country-years using our methods, a hand calculator, and their tables.
15 Esty et al. (fn. 1, 1998), Table A-7.
17 We computed these via prior correction from numbers given in Esty et al. (fn. 1, 1998), by using equation 26 in Gary King and Langche Zeng, “Logistic Regression in Rare Events Data,” Political Analysis (forthcoming), preprint at http://gking.harvard.edu. Since the raw data are not needed for this calculation, we stuck to the published version, which was based on a data set that differed slightly from the updated one we used. We also reproduced this with the new data, and there were only very minor differences.
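As a rough illustration of what prior correction does to a reported probability, the sketch below applies the standard intercept adjustment for logit under case-control sampling: the logit is shifted down by ln[((1 - τ)/τ)(ȳ/(1 - ȳ))], where τ is the population fraction of failures and ȳ is the fraction of failures in the sample. Whether this matches equation 26 of the cited paper term for term is an assumption, and the values of `tau` and `ybar` used here are hypothetical, not the task force's.

```python
import math


def prior_correct(p_sample, tau, ybar):
    """Map a probability from a case-control logit to the population scale.

    p_sample: probability implied by the uncorrected logit model
    tau:      assumed population fraction of failures (hypothetical here)
    ybar:     fraction of failures in the case-control sample (hypothetical)
    """
    offset = math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))
    logit = math.log(p_sample / (1.0 - p_sample)) - offset
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical inputs: failures are 2% of the population, 1/3 of the sample.
print(prior_correct(0.25, tau=0.02, ybar=1.0 / 3.0))
```

Because only the constant term changes, the adjustment needs no raw data, which is consistent with the note's remark that the published numbers suffice.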
18 In our discussions with the task force, we learned that they sometimes estimated relative risks in Table 3 indirectly and approximately via an odds ratio (where prior correction is unnecessary; see fn. 13), rather than directly and without prior correction, as assumed here. The indirect approach is also biased except when the expected population of failures becomes 0. The indirect approximation (and even the phrase “odds ratio”) is never mentioned in the task force reports or other publications, but if the task force had used it for its written work, then its relative risk estimates computed from the logistic regression in Table 1 are more accurate than indicated in our Table 3. However, the task force estimates of relative risks, such as those computed from the probabilities in Table 2 and described in the text above, would be as biased, and their estimates of probabilities and first differences would be considerably less accurate than we indicate.
19 For example, Ripley, B. D., Pattern Recognition and Neural Networks (New York: Cambridge University Press, 1996).
20 Esty et al. (fn. 1, 1998) report using a 0.26 cutting point, and they use 0.25 in their new data (which may conceivably indicate that they intended, although failed, to assign C = 3) (p. 57). This, by applying equation 26 from King and Zeng (fn. 17), translates to 0.01634 and thus implies that C = 1/0.01634 - 1 = 60.2.
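The arithmetic in this note can be checked with a short Python sketch. It assumes, as the note's own calculation suggests, that a cutting point p and the value C are linked by C = 1/p - 1 (equivalently p = 1/(1 + C)); the function names are hypothetical.

```python
def cost_ratio_from_cutoff(cutoff):
    """C implied by a probability cutting point, using C = 1/cutoff - 1."""
    return 1.0 / cutoff - 1.0


def cutoff_from_cost_ratio(c):
    """Cutting point implied by C, the inverse of the mapping above."""
    return 1.0 / (1.0 + c)


print(cost_ratio_from_cutoff(0.25))     # uncorrected cutting point
print(cost_ratio_from_cutoff(0.01634))  # prior-corrected cutting point
```

The first value is 3 and the second rounds to 60.2, matching the two figures given in the note.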
21 Of course, from the perspective of the people in countries at high risk, C = 60 might even be too small. A very useful future project would be to survey policymakers to measure their values for C. In all probability, C varies to some extent over people, countries, and time, but there surely are some patterns that would be helpful in evaluating future forecasting efforts.
24 Esty et al. (fn. 2, 1998), 27-38.
25 King, Gary, Honaker, James, Joseph, Anne, and Scheve, Kenneth, “Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation,” American Political Science Review 95 (March 2001).
26 Esty et al. (fn. 1, 1998), 29.
27 Rubin, Donald, Multiple Imputations for Nonresponse in Surveys (New York: Wiley Press, 1996); King et al. (fn. 25).
28 James Honaker, Anne Joseph, Gary King, and Kenneth Scheve, “Amelia: A Program for Missing Data” (http://gking.harvard.edu, 2000).
29 Gurr, Ted Robert, “Why Minorities Rebel: A Global Analysis of Communal Mobilization and Conflict since 1945,” International Political Science Review 14, no. 2 (1993); Rule, James B., Theories of Civil Violence (Berkeley: University of California Press, 1988), 178; Lichbach, Mark, The Rebels' Dilemma (Ann Arbor: University of Michigan Press, 1995), 4-6.
31 Lichbach (fn. 29), 158-65.
32 Collier, Paul, “Economic Causes of Civil Conflict and Their Implications for Policy,” in Crocker, Chester A., Hampson, Fen Osler, and Aall, Pamela, eds., Managing Global Chaos (Washington, D.C.: U.S. Institute of Peace, 2000), 6.
33 James Fearon and David Laitin, “Weak States, Rough Terrain, and Large Scale Ethnic Violence since 1945” (Paper presented at the annual meeting of the American Political Science Association, Atlanta, 1999).
34 Przeworski, Adam, Alvarez, Michael, Cheibub, J. A., and Limongi, F., “What Makes Democracies Endure,” Journal of Democracy 7 (January 1996).
36 To understand the curse of dimensionality in this context, consider a regression with one continuous dependent variable and a single ten-category explanatory variable. To estimate this regression without assumptions, we need to estimate ten quantities, the mean of Y within each of the ten categories of X (e.g., the mean starting salary for people with each of ten levels of education). We could easily do this if we had, say, a sample of one hundred observations within each of the ten categories. By contrast, linear regression would summarize these ten numbers with only two, a slope and a constant term, by making the assumption that nothing is being lost. Now suppose we added one more ten-category explanatory variable. The curse of dimensionality is that we need to multiply, not add: we must estimate one hundred quantities, not merely twenty (graphically, we move from a bar chart to a checkerboard where the height of each square represents dollars of starting salary). An analysis with, say, ten ten-category explanatory variables requires the estimation of ten billion quantities, and summarizing that with a linear regression that has only eleven parameters and maybe even a few (linear) interaction terms is a stunningly strong assumption.
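The counts in this note follow from simple arithmetic, sketched below in Python. The function names are hypothetical; the only inputs are the number of explanatory variables and the number of categories per variable.

```python
def cells_without_assumptions(n_vars, n_categories=10):
    """Number of conditional means to estimate with no functional form."""
    return n_categories ** n_vars


def linear_parameters(n_vars):
    """Slopes plus one constant term in a linear regression."""
    return n_vars + 1


for k in (1, 2, 10):
    print(k, cells_without_assumptions(k), linear_parameters(k))
```

With one variable there are 10 cells against 2 linear parameters; with two variables, 100 cells; with ten variables, 10,000,000,000 cells against 11 linear parameters.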
37 Beck, King, and Zeng (fn. 35).
38 Bishop, Christopher M., Neural Networks for Pattern Recognition (Oxford: Oxford University Press, 1995), 366.
39 All members of the committee that constituted our model were based on the same input variables and three numbers: a random number seed for the starting values (which we include here to make it easier to replicate our results), the number of hidden neurons, and the prior standard deviation for the weights. The triples for the members of our committee are 45,3,1; 8,3,2; 908,3,3; 85,3,5; 908,4,2; 35,5,1; 12345,5,5; 768,5,6; 134,5,10; 8,7,3; 9,8,5; 45,8,6; 923,10,1. In general these are all fairly smooth neural network models. We chose this set based on our experience in fitting analyses to similar data and through some preliminary analyses. We expect models that can forecast even better could be developed.
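For readers who want to replicate the committee, the triples listed in this note can be loaded into a structured form with a few lines of Python. This is only a convenience sketch: the field names `seed`, `hidden`, and `prior_sd` are hypothetical labels for the three numbers described above, and the code does not fit any network.

```python
from collections import namedtuple

# Hypothetical field names for (random seed, hidden neurons, prior std. dev.).
Member = namedtuple("Member", ["seed", "hidden", "prior_sd"])

TRIPLES = ("45,3,1; 8,3,2; 908,3,3; 85,3,5; 908,4,2; 35,5,1; 12345,5,5; "
           "768,5,6; 134,5,10; 8,7,3; 9,8,5; 45,8,6; 923,10,1")


def parse_committee(text):
    """Turn the semicolon-separated triples into a list of Member records."""
    members = []
    for chunk in text.split(";"):
        seed, hidden, prior_sd = (int(v) for v in chunk.strip().split(","))
        members.append(Member(seed, hidden, prior_sd))
    return members


committee = parse_committee(TRIPLES)
print(len(committee), sorted({m.hidden for m in committee}))
```

The list contains thirteen members, with between three and ten hidden neurons each.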
41 Each of the methodological improvements we made to the task force model improved results over the same model without that feature, and all were necessary to generate a model that dominated the (prior-corrected) task force model for any value of C. Of course, prior correction alone was sufficient to improve a great deal on the original task force analysis. A rough ranking from most to least important in changing the results is prior correction, neural networks, committee methods, the additional covariates, and multiple imputation for missing data.
42 We summarize the results of these tests here, rather than presenting detailed accompanying figures, since this would involve including numerous figures for each one presented in this paper.
43 Przeworski, Adam and Teune, Henry, The Logic of Comparative Social Inquiry (Malabar, Fla.: Krieger, 1982).
44 Huntington, Samuel P., Political Order in Changing Societies (New Haven: Yale University Press, 1968).
46 See King, Keohane, and Verba (fn. 11).
47 Hibbs, Douglas A. Jr., Mass Political Violence: A Cross-National Causal Analysis (New York: Wiley, 1973); Gurr, Ted Robert, “Persistence and Change in Political Systems, 1800-1971,” American Political Science Review 68 (December 1974); Muller, Edward N. and Weede, Erich, “Cross-National Variation in Political Violence: A Rational Action Approach,” Journal of Conflict Resolution 34 (December 1990).
48 See, for example, Huntington (fn. 44); Gurr, Ted Robert, “Why Minorities Rebel: A Global Analysis of Communal Mobilization and Conflict since 1945,” International Political Science Review 14, no. 2 (1993); Rummell, R. J., “Democracy, Power, Genocide, and Mass Murder,” Journal of Conflict Resolution 39 (March 1995).
50 For example, Chen, Lincoln C., “Human Security: Concepts and Approaches,” in Matsumae, T. and Chen, Lincoln C., eds., Common Security in Asia: New Concepts of Human Security (Tokyo: Tokyo University Press, 1995); Gary King and Christopher Murray, “Rethinking Human Security,” Political Science Quarterly (forthcoming), preprint available at http://gking.harvard.edu.
51 For example, infant mortality in Cuba, where the government is so involved that the minister of health chairs a separate meeting to investigate the case of each infant who dies in the country, is lower even than in the U.S. Some exceptions would include cases where natural disasters overwhelm the resources of even the most conscientious governments. Other exceptions include states where targeted interventions, such as by the World Health Organization, reduce infant mortality without as much help from the government involved (Ghana may be an example of this).
53 For a contrary view, see Collier (fn. 32).
54 It is also worth noting that this result unfortunately blurs the distinction between variable-based explanations, which we prefer, and country-based “proper noun” stories. More generally, this result also suggests, as is the case, that logistic regression results are more sensitive to outliers than are neural networks. Whereas an outlier can throw off the entire logistic curve, neural networks, by contrast, will usually map outliers separately and localize their effects to small pieces of the functional form.
55 See Melson, Robert, Revolution and Genocide: The Origins of the Armenian Genocide and the Holocaust (Chicago: University of Chicago Press, 1992); Krain, Matt, “State Sponsored Mass Murder: The Onset and Severity of Genocides and Politicides,” Journal of Conflict Resolution 41 (June 1997); Valentino, Benjamin, “Final Solutions: The Causes of Genocide and Mass Killing,” Security Studies 9 (Spring 2000).
56 Esty et al. (fn. 1, 1998).
57 King, Gary, Unifying Political Methodology (Ann Arbor: University of Michigan Press, 1989).
58 Beck, King, and Zeng (fn. 35).
59 Bishop (fn. 38).