
What Can We Learn from Predictive Modeling?

  • Skyler J. Cranmer and Bruce A. Desmarais


The large majority of inferences drawn in empirical political research follow from model-based associations (e.g., regression). Here, we articulate the benefits of predictive modeling as a complement to this approach. Predictive models aim to specify a probabilistic model that provides a good fit to testing data that were not used to estimate the model’s parameters. Our goals are threefold. First, we review the central benefits of this under-utilized approach from a perspective uncommon in the existing literature: we focus on how predictive modeling can be used to complement and augment standard associational analyses. Second, we advance the state of the literature by laying out a simple set of benchmark predictive criteria. Third, we illustrate our approach through a detailed application to the prediction of interstate conflict.
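The idea of judging a model by its fit to testing data that were not used in estimation can be sketched in a few lines. The example below is only an illustration on simulated data, not the authors' application or their benchmark criteria: the data-generating process, the 300/100 split, and the helper names `fit_logistic` and `mean_log_lik` are all assumptions made for this sketch. It fits a one-covariate logistic model on a training set and compares its held-out mean log-likelihood with that of a constant-probability baseline.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, steps=2000, lr=0.1):
    """Fit y ~ Bernoulli(sigmoid(a + b * x)) by gradient ascent
    on the mean log-likelihood of the training data."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        resid = [y - sigmoid(a + b * x) for x, y in zip(xs, ys)]
        a += lr * sum(resid) / n
        b += lr * sum(r * x for r, x in zip(resid, xs)) / n
    return a, b


def mean_log_lik(probs, ys):
    """Average Bernoulli log-likelihood of outcomes ys under predicted probs."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, ys):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(ys)


# Simulated data (an assumption of this sketch): one covariate, binary outcome.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(400)]
ys = [1 if rng.random() < sigmoid(-1.0 + 2.0 * x) else 0 for x in xs]

# Split into training data (used for estimation) and testing data (held out).
train_x, test_x = xs[:300], xs[300:]
train_y, test_y = ys[:300], ys[300:]

# Estimate parameters on the training data only.
a, b = fit_logistic(train_x, train_y)
base_rate = sum(train_y) / len(train_y)

# Evaluate both models on data that played no role in estimation.
model_ll = mean_log_lik([sigmoid(a + b * x) for x in test_x], test_y)
baseline_ll = mean_log_lik([base_rate] * len(test_y), test_y)

print("held-out mean log-likelihood, covariate model:", round(model_ll, 3))
print("held-out mean log-likelihood, baseline:", round(baseline_ll, 3))
```

A model whose held-out log-likelihood does not exceed the baseline's adds no predictive information, whatever its in-sample coefficient estimates suggest.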


Authors’ note: Many thanks to Alison Craig for research assistance. Sincere thanks also to Matt Blackwell and Michael Neblo for helpful comments on an earlier draft. The authors are grateful for the support of the National Science Foundation (SES-1558661, SES-1619644, SES-1637089, CISE-1320219, SES-1357622, SES-1514750, and SES-1461493) and the Alexander von Humboldt Foundation. Replication data are posted to the Political Analysis Dataverse (Cranmer and Desmarais 2016a).

Contributing Editor: Jonathan Katz



Achen, Christopher H. 2002. Toward a new political methodology: Microfoundations and ART. Annual Review of Political Science 5(1):423–450.
Adamic, Lada A., and Adar, Eytan. 2003. Friends and neighbors on the web. Social Networks 25(3):211–230.
Airoldi, Edoardo M., Blei, David M., Fienberg, Stephen E., and Xing, Eric P. 2008. Mixed membership stochastic blockmodels. Journal of Machine Learning Research 9:1981–2014.
Attewell, Paul, Monaghan, David B., and Kwong, Darren. 2015. Preparing training and test datasets. 1 edn. University of California Press, pp. 63–71.
Beck, Nathaniel, Katz, Jonathan N., and Tucker, Richard. 1998. Taking time seriously: Time-series-cross-section analysis with a binary dependent variable. American Journal of Political Science 42(4):1260–1288.
Beck, Nathaniel, King, Gary, and Zeng, Langche. 2000. Improving quantitative studies of international conflict: A conjecture. American Political Science Review 94(1):21–35.
Brandt, Patrick T., Freeman, John R., and Schrodt, Philip A. 2011. Real time, time series forecasting of political conflict. Conflict Management and Peace Science 28(1):41–64.
Cawley, Gavin C., and Talbot, Nicola L. C. 2010. On over-fitting in model selection and subsequent selection bias in performance evaluation. The Journal of Machine Learning Research 11:2079–2107.
Clarke, Kevin A., and Primo, David M. 2007. Modernizing political science: A model-based approach. Perspectives on Politics 5(4):741–753.
Collopy, Fred, Adya, Monica, and Armstrong, J. Scott. 1994. Principles for examining predictive validity – The case of information systems spending forecasts. Information Systems Research 5(2):170–179.
Cranmer, Skyler J., and Desmarais, Bruce A. 2011. Inferential network analysis with exponential random graph models. Political Analysis 19(1):66–86.
Cranmer, Skyler, and Desmarais, Bruce. 2016a. Replication data for: What can we learn from predictive modeling?, Harvard Dataverse.
Cranmer, Skyler J., and Desmarais, Bruce A. 2016b. A critique of dyadic design. International Studies Quarterly 60(2):355–362.
Cranmer, Skyler J., Desmarais, Bruce A., and Kirkland, Justin H. 2012. Towards a network theory of alliance formation. International Interactions 38(3):295–324.
Cranmer, Skyler J., Desmarais, Bruce A., and Menninga, Elizabeth J. 2012. Complex dependencies in the alliance network. Conflict Management and Peace Science 29(3):279–313.
Cranmer, Skyler J., Menninga, Elizabeth J., and Mucha, Peter J. 2015. Kantian fractionalization predicts the conflict propensity of the international system. Proceedings of the National Academy of Sciences 112(38):11812–11816.
Cranmer, Skyler J., Rice, Douglas, and Siverson, Randolph M. 2015. What to do about atheoretic lags. Political Science Research Methods, doi:10.1017/psrm.2015.36.
Cybenko, George. 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 2(4):303–314.
Dafoe, Allan. 2011. Statistical critiques of the democratic peace: Caveat emptor. American Journal of Political Science 55(2):247–262.
Davis, Jesse, and Goadrich, Mark. 2006. The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, ICML ’06. New York, NY: ACM, pp. 233–240.
Desmarais, Bruce A., and Cranmer, Skyler J. 2010. Consistent confidence intervals for maximum pseudolikelihood estimators. In Proceedings of the Neural Information Processing Systems 2010 Workshop on Computational Social Science and the Wisdom of Crowds.
Desmarais, Bruce A., and Cranmer, Skyler J. 2011. Forecasting the locational dynamics of transnational terrorism: A network analytic approach. In Proceedings of the European Intelligence and Security Informatics Conference (EISIC) 2011, Athens, Greece: IEEE Computer Society.
Desmarais, Bruce A., and Cranmer, Skyler J. 2012. Statistical mechanics of networks: Estimation and uncertainty. Physica A 391(4):1865–1876.
Dettling, Marcel, and Bühlmann, Peter. 2003. Boosting for tumor classification with gene expression data. Bioinformatics 19(9):1061–1069.
Droge, Bernd. 1999. Asymptotic optimality of full cross-validation for selecting linear regression models. Statistics and Probability Letters 44(4):351–357.
Druckman, James N., Green, Donald P., Kuklinski, James H., and Lupia, Arthur. 2006. The growth and development of experimental research in political science. American Political Science Review 100(04):627–635.
Esteban, Cristóbal, Schmidt, Danilo, Krompaß, Denis, and Tresp, Volker. 2015. Predicting sequences of clinical events by using a personalized temporal latent embedding model. In Healthcare Informatics (ICHI), 2015 International Conference on IEEE, pp. 130–139.
Faraway, Julian James. 2006. Extending the linear model with R: Generalized linear, mixed effects and nonparametric regression models, vol. 66. New York: CRC press.
Fawcett, Tom. 2006. An introduction to ROC analysis. Pattern Recognition Letters 27(8):861–874.
Gartzke, Erik. 2007. The capitalist peace. American Journal of Political Science 51(1):166–191.
Gill, Jeff. 2014. Bayesian methods: A social and behavioral sciences approach. Boca Raton: Chapman and Hall/CRC.
Gleditsch, Kristian Skrede. 2002. Expanded trade and GDP data. Journal of Conflict Resolution 46(5):712–724.
Gleditsch, Kristian S., and Ward, Michael D. 2001. Measuring space: A minimum-distance database and applications to international studies. Journal of Peace Research 38(6):739–758.
Goeman, J. J., Meijer, R. J., and Chaturvedi, N. 2016. Penalized: L1 (lasso and fused lasso) and L2 (ridge) penalized estimation in GLMs and in the Cox model. R package version 0.9-46.
Goldstone, Jack A., Bates, Robert H., Epstein, David L., Gurr, Ted Robert, Lustik, Michael B., Marshall, Monty G., Ulfelder, Jay, and Woodward, Mark. 2010. A global model for forecasting political instability. American Journal of Political Science 54(1):190–208.
Gurbaxani, Vijay, and Mendelson, Haim. 1990. An integrative model of information systems spending growth. Information Systems Research 1(1):23–46.
Gurbaxani, Vijay, and Mendelson, Haim. 1994. Modeling vs. forecasting – The case of information systems spending. Information Systems Research 5(2):180–190.
Hall, Peter. 1983. Large sample optimality of least squares cross-validation in density estimation. The Annals of Statistics 11(4):1156–1174.
Hanneke, Steve, Fu, Wenjie, and Xing, Eric P. 2010. Discrete temporal models of social networks. The Electronic Journal of Statistics 4:585–605.
Hastie, Trevor, Tibshirani, Robert, and Friedman, Jerome. 2009. The elements of statistical learning: Data mining, inference, and prediction. 2nd edn. New York: Springer.
Hoff, Peter D., Raftery, Adrian E., and Handcock, Mark S. 2002. Latent space approaches to social network analysis. Journal of the American Statistical Association 97(460):1090–1098.
Hua, Wang, Cuiqin, Ma, and Lijuan, Zhou. 2009. A brief review of machine learning and its application. In Information engineering and computer science, 2009. ICIECS 2009. International Conference on IEEE, pp. 1–4.
Jensen, David D., and Cohen, Paul R. 2000. Multiple comparisons in induction algorithms. Machine Learning 38(3):309–338.
Keele, Luke. 2015. The statistics of causal inference: A view from political methodology. Political Analysis 23:313–335.
Kuhn, Max, and Johnson, Kjell. 2013. Applied predictive modeling. New York: Springer.
van der Laan, Mark J., Dudoit, Sandrine, and Keles, Sunduz. 2004. Asymptotic optimality of likelihood-based cross-validation. Statistical Applications in Genetics and Molecular Biology 3(1):1–23.
Leicht, Elizabeth A., Holme, Petter, and Newman, Mark E. J. 2006. Vertex similarity in networks. Physical Review E 73(2):026120.
Lopes, Miguel, and Bontempi, Gianluca. 2014. On the null distribution of the precision and recall curve. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Berlin: Springer, pp. 322–337.
Marshall, Monty G., and Jaggers, Keith. 2002. Polity IV project: Political regime characteristics and transitions, 1800–2002.
Meyer, Patrick E., Lafitte, Frederic, and Bontempi, Gianluca. 2008. MINET: An open source R/Bioconductor package for mutual information based network inference. BMC Bioinformatics 9.
Muchlinski, David, Siroky, David, He, Jingrui, and Kocher, Matthew. 2016. Comparing random forest with logistic regression for predicting class-imbalanced civil war onset data. Political Analysis 24(1):87–103.
Nadeau, Claude, and Bengio, Yoshua. 2003. Inference for the generalization error. Machine Learning 52(3):239–281.
Nowak, Robert D. 1997. Optimal signal estimation using cross-validation. IEEE Signal Processing Letters 4(1):23–25.
Oneal, John, and Russett, Bruce M. 1999. The Kantian peace: The Pacific benefits of democracy, interdependence, and international organization. World Politics 52(1):1–37.
Oneal, John R., and Russett, Bruce. 2005. Rule of three, let it be? When more really is better. Conflict Management and Peace Science 22(4):293–310.
Ozenne, Brice, Subtil, Fabien, and Maucort-Boulch, Delphine. 2015. The precision–recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases. Journal of Clinical Epidemiology 68(8):855–859.
Pevehouse, Jon C., and Goldstein, Joshua S. 1999. Serbian compliance or defiance in Kosovo? Statistical analysis and real-time predictions. Journal of Conflict Resolution 43(4):538–546.
Pevehouse, Jon, Nordstrom, Timothy, and Warnke, Kevin. 2004. The correlates of war 2 international governmental organizations data version 2.0. Conflict Management and Peace Science 21(2):101–119.
Pons, Pascal, and Latapy, Matthieu. 2005. Computing communities in large networks using random walks. Journal of Graph Algorithms and Applications 10(2):191–218.
Rakotomalala, Ricco, Chauchat, Jean-Hughes, and Pellegrino, Francois. 2006. Accuracy estimation with clustered dataset. In Proceedings of the Fifth Australasian Conference on Data Mining and Analytics 61:17–22. Sydney, Australia: Australian Computer Society, Inc.
Ripley, Brian, and Venables, William. 2016. Nnet: Feed-forward neural networks and multinomial log-linear models. R package version 7.3-12.
Rost, Nicholas, Schneider, Gerald, and Kleibl, Johannes. 2009. A global risk assessment model for civil wars. Social Science Research 38(4):921–933.
Schneider, Gerald. 2012. Banking on the Broker: Forecasting conflict in the Levant with financial data. In Illuminating the shadow of the future: scientific prediction and the human condition, ed. Wayman, Frank, Williamson, Paul, and Bueno de Mesquita, Bruce. Ann Arbor: University of Michigan Press.
Schneider, Gerald, Gleditsch, Nils Petter, and Carey, Sabine. 2010. Exploring the past, anticipating the future: A symposium. International Studies Review 12(1):1–7.
Schneider, Gerald, Gleditsch, Nils Petter, and Carey, Sabine. 2011. Forecasting in international relations: One quest, three approaches. Conflict Management and Peace Science 28(5):5–14.
Schrodt, Philip A., and Gerner, Deborah J. 2000. Using cluster analysis to derive early warning indicators for political change in the Middle East, 1979–1996. American Political Science Review 94(4):803–818.
Shmueli, Galit. 2010. To explain or to predict? Statistical Science 25(3):289–310.
Sing, T., Sander, O., Beerenwinkel, N., and Lengauer, T. 2005. ROCR: Visualizing classifier performance in R. Bioinformatics 21(20):7881.
Singer, J. David, Bremer, Stuart, and Stuckey, John. 1972. Capability distribution, uncertainty, and major power war, 1820–1965. Peace, war, and numbers 19:48.
Stinnett, Douglas M., Tir, Jaroslav, Diehl, Paul F., Schafer, Philip, and Gochman, Charles. 2002. The Correlates of War (COW) project direct contiguity data, version 3.0. Conflict Management and Peace Science 19(2):59–67.
Stone, M. 1974. Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, Series B 36(2):111–147.
Stone, M. 1977. Asymptotics for and against cross-validation. Biometrika 64(1):29–35.
Tuszynski, Jarek. 2014. caTools: Tools: Moving window statistics, GIF, Base64, ROC AUC, etc. R package version 1.17.
Van Maanen, John, Sørensen, Jesper B., and Mitchell, Terence R. 2007. The interplay between theory and method. Academy of Management Review 32(4):1145–1154.
Ward, Michael D., Greenhill, Brian D., and Bakke, Kristin M. 2010. The perils of policy by p-value: Predicting civil conflicts. Journal of Peace Research 47(4):363–375.
Ward, Michael D., Siverson, Randolph M., and Cao, Xun. 2007. Disputes, democracies, and dependencies: A reexamination of the Kantian peace. American Journal of Political Science 51(3):583–601.
Zou, Hui, and Hastie, Trevor. 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67(2):301–320.
Political Analysis
  • ISSN: 1047-1987
  • EISSN: 1476-4989
  • URL: /core/journals/political-analysis