
Muddling-Through and Deep Learning for Managing Large-Scale Uncertain Risks

  • Tony Cox


Managing large-scale, geographically distributed, and long-term risks arising from diverse underlying causes – ranging from poverty to underinvestment in protecting against natural hazards or failures of sociotechnical, economic, and financial systems – poses formidable challenges for any theory of effective social decision-making. Participants may have different and rapidly evolving local information and goals, perceive different opportunities and urgencies for actions, and be differently aware of how their actions affect each other through side effects and externalities. Six decades ago, political economist Charles Lindblom viewed “rational-comprehensive decision-making” as utterly impracticable for such realistically complex situations. Instead, he advocated incremental learning and improvement, or “muddling through,” as both a positive and a normative theory of bureaucratic decision-making when costs and benefits are highly uncertain. But sparse, delayed, uncertain, and incomplete feedback undermines the effectiveness of collective learning while muddling through, even if all participant incentives are aligned; it is no panacea. We consider how recent insights from machine learning – especially, deep multiagent reinforcement learning – formalize aspects of muddling through and suggest principles for improving human organizational decision-making. Deep learning principles adapted for human use can not only help participants in different levels of government or control hierarchies manage some large-scale distributed risks, but also show how rational-comprehensive decision analysis and incremental learning and improvement can be reconciled and synthesized.


This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence, which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

References
Andrychowicz, Marcin, Wolski, Filip, Ray, Alex, Schneider, Jonas, Fong, Rachel, Welinder, Peter, McGrew, Bob, Tobin, Josh, Abbeel, Pieter, and Zaremba, Wojciech. 2018. Hindsight Experience Replay. Available at (accessed May 27, 2019).
Arulkumaran, Kai, Deisenroth, Marc P., Brundage, Miles, and Bharath, Anil A. 2017. A Brief Survey of Deep Reinforcement Learning. IEEE Signal Processing Magazine, Special Issue on Deep Learning for Image Understanding (arXiv extended version). Available at (accessed May 27, 2019).
Barlow, Jane, Sembi, Sukhdev, Parsons, Helen, Kim, Sungwook, Petrou, Stavros, Harnett, Paul, and Dawe, Sharon. 2019. “A Randomized Controlled Trial and Economic Evaluation of the Parents Under Pressure Program for Parents in Substance Abuse Treatment.” Drug and Alcohol Dependence, 194: 184–194.
Barrett, Scott. 2013. “Climate Treaties and Approaching Catastrophes.” Journal of Environmental Economics and Management, 66: 235–250.
Bloembergen, D, Tuyls, K, Hennes, D, and Kaisers, M. 2015. “Evolutionary Dynamics of Multi-Agent Learning: A Survey.” Journal of Artificial Intelligence Research, 53: 659–697.
Berkenkamp, Felix, Turchetta, Matteo, Schoellig, Angela P., and Krause, Andreas. 2017. “Safe Model-based Reinforcement Learning with Stability Guarantees.” In 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, California. Available at
Bernstein, Daniel S., Zilberstein, Shlomo, and Immerman, Neil. 2000. “The Complexity of Decentralized Control of Markov Decision Processes.” Uncertainty in Artificial Intelligence Proceedings, Stanford, California. Available at
Bowling, M, and Veloso, M. 2001. “Rational and Convergent Learning in Stochastic Games.” In IJCAI’01 Proceedings of the 17th International Joint Conference on Artificial Intelligence, Seattle, WA, USA, August 04-10, 2001, Vol. 2: 1021-1026. San Francisco, CA: Morgan Kaufmann Publishers Inc.
Boogaard, Hanna, van Erp, Annemoon M., Walker, Katherine D., and Shaikh, Rashid. 2017. “Accountability Studies on Air Pollution and Health: The HEI Experience.” Current Environmental Health Reports, 4(4): 514–522.
Cai, Yifan, Yang, Simon X., and Xu, Xin. 2013. “A Combined Hierarchical Reinforcement Learning Based Approach for Multi-robot Cooperative Target Searching in Complex Unknown Environments.” In IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), Singapore, IEEE.
Clavera, I., Liu, Simin, Fearing, Ronald S., Abbeel, Pieter, Levine, Sergey, and Finn, Chelsea. 2018. Learning to Adapt: Meta-Learning for Model-Based Control. Available at (accessed May 27, 2019).
Clemen, Robert T., and Reilly, Terence. 2014. Making Hard Decisions, with the Decision Tools Suite. 3rd ed. Pacific Grove, CA: Duxbury Press.
Cox, L.A. Jr. 2015. “Overcoming Learning Aversion in Evaluating and Managing Uncertain Risks.” Risk Analysis, 35(10): 1892–1910.
de Vries, S.L.A., Hoeve, M., Asscher, J.J., and Stams, G.J.J.M. 2018. “The Long-Term Effects of the Youth Crime Prevention Program "New Perspectives" on Delinquency and Recidivism.” International Journal of Offender Therapy and Comparative Criminology, 62(12): 3639–3661.
Devlin, S., Yliniemi, L., Kudenko, K., and Tumer, K. 2014. Potential-Based Difference Rewards for Multiagent Reinforcement Learning. In Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), edited by Lomuscio, Scerri, Bazzan, Huhns, May 5–9, Paris, France. Available at
Fuji, T., Ito, K., Matsumoto, K., and Yano, K. 2018. Deep Multi-Agent Reinforcement Learning using DNN-Weight Evolution to Optimize Supply Chain Performance. In Proceedings of the 51st Hawaii International Conference on System Sciences, Hawaii.
Fudenberg, D., Levine, D., and Maskin, E. 1994. “The Folk Theorem with Imperfect Public Information.” Econometrica, 62(5): 997–1040.
Fudenberg, D., and Maskin, E. 1986. “The Folk Theorem in Repeated Games with Discounting or with Incomplete Information.” Econometrica, 54: 533–554.
Gabel, T., and Riedmiller, M. 2007. On a Successful Application of Multi-Agent Reinforcement Learning to Operations Research Benchmarks. In Proceedings of the IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL), Honolulu. Available at
Garcia, J., and Fernandez, F. 2015. “A Comprehensive Survey on Safe Reinforcement Learning.” Journal of Machine Learning Research, 16: 1437–1480 (accessed May 27, 2019).
Gilboa, I., Samet, D., and Schmeidler, D. 2004. “Utilitarian Aggregation of Beliefs and Tastes.” Journal of Political Economy, 112(4): 932–938.
Grondman, I., Busoniu, L., Lopes, G.A.D., and Babuska, R. 2012. “A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients.” IEEE Transactions on Systems, Man and Cybernetics Part C, 42(6): 1291–1307. Available at
Gupta, J.K., Egorov, M., and Kochenderfer, M. 2017. Cooperative Multi-Agent Control Using Deep Reinforcement Learning. In International Conference on Autonomous Agents and Multi-agent Systems, São Paulo, Brazil. Available at
Heitzig, J., Lessmann, K., and Zou, Y. 2011. “Self-Enforcing Strategies to Deter Free-Riding in the Climate Change Mitigation Game and Other Repeated Public Good Games.” Proceedings of the National Academy of Sciences of the United States of America, 108(38): 15739–15744.
Henneman, L.R., Liu, C., Mulholland, J.A., and Russell, A.G. 2017. “Evaluating the Effectiveness of Air Quality Regulations: A Review of Accountability Studies and Frameworks.” Journal of the Air & Waste Management Association, 67(2): 144–172.
Hörner, J., and Olszewski, W. 2006. “The Folk Theorem for Games with Private Almost‐Perfect Monitoring.” Econometrica, 74: 1499–1544. doi:10.1111/j.1468-0262.2006.00717.x
Howard, R., and Abbas, A. 2016. Foundations of Decision Analysis. New York, NY: Pearson.
Hu, J., and Wellman, M.P. 1998. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pp. 242–250.
Hu, J., and Wellman, M.P. 2003. “Nash Q-learning for general-sum stochastic games.” The Journal of Machine Learning Research, 4: 1039–1069.
Hu, Y., Gao, Y., and An, B. 2015. Multiagent reinforcement learning with unshared value functions. IEEE Transactions on Cybernetics, 45(4): 647–662.
Hunt, S., Meng, Q., Hinde, C., and Huang, T. 2014. “A Consensus-Based Grouping Algorithm for Multi-agent Cooperative Task Allocation with Complex Requirements.” Cognitive Computation, 6(3): 338–350.
Krishnamurthy, V. 2015. Reinforcement Learning: Stochastic Approximation Algorithms for Markov Decision Processes. Available at:
Keeney, R., and Raiffa, H. 1976. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Hoboken, NJ: John Wiley & Sons.
Lindblom, C.E. 1959. The science of muddling through. Public Administration Review, 19(2): 79–88.
Lemke, C, Budka, M, and Gabrys, B. 2015. Metalearning: A Survey of Trends and Technologies. Artificial Intelligence Review, 44(1): 117–130.
Luce, D.R., and Raiffa, H. 1957. Games and Decisions. New York: John Wiley & Sons.
Lütjens, B., Everett, M., and How, J.P. 2018. Safe Reinforcement Learning with Model Uncertainty Estimates. Available at
Mannion, P., Duggan, J., and Howley, E. 2017. “Analysing the Effects of Reward Shaping in Multi-Objective Stochastic Games.” In Adaptive and Learning Agents workshop, Sao Paulo. Available at
March, J.G. 1991. “Exploration and Exploitation in Organizational Learning.” Organization Science, 2(1): 71–87. Special Issue: Organizational Learning: Papers in Honor of (and by) James G. March. Available at
Marschak, J., and Radner, R. 1972. Economic Theory of Teams. New Haven: Yale University Press.
Miceli, T.J. 2017. The Economic Approach to Law. 3rd ed. Stanford University Press.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., et al. 2015. “Human-Level Control Through Deep Reinforcement Learning.” Nature, 518(7540): 529–533.
Molden, D.C., and Hui, C.M. 2011. “Promoting De-Escalation of Commitment: A Regulatory-Focus Perspective on Sunk Costs.” Psychological Science, 22(1): 8–12.
Monios, J. 2016. “Policy Transfer or Policy Churn? Institutional Isomorphism and Neoliberal Convergence in the Transport Sector.” Environment and Planning A: Economy and Space, 49(2).
Mookherjee, D. 2006. “Decentralization, Hierarchies, and Incentives: A Mechanism Design Perspective.” Journal of Economic Literature, 44(2): 367–390.
Munos, R. 2014. “From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning.” Foundations and Trends in Machine Learning, 7(1): 1–129.
Nisan, N. 2007. “Introduction to Mechanism Design (For Computer Scientists).” In Algorithmic Game Theory, edited by Nisan, N., Roughgarden, T., Tardos, E., and Vazirani, V. New York, NY: Cambridge University Press.
Omidshafiei, S., Pazis, J., Amato, C., How, J.P., and Vian, J. 2017. Deep Decentralized Multi-Task Multi-Agent Reinforcement Learning Under Partial Observability. Available at (accessed May 27, 2019).
Papadimitriou, C., and Tsitsiklis, J.N. 1985. The complexity of Markov Decision Processes. Available at (accessed May 27, 2019).
Petrik, M., Chow, Y., and Ghavamzadeh, M. 2016. Safe Policy Improvement by Minimizing Robust Baseline Regret. Available at (accessed May 27, 2019).
Pham, H.X., La, H.M., Feil-Seifer, D., and Nefian, A. 2018. Cooperative and Distributed Reinforcement Learning of Drones for Field Coverage. Available at (accessed May 27, 2019).
Potter, M., Meeden, L., and Schultz, A. 2001. Heterogeneity in the Coevolved Behaviors of Mobile Robots: The Emergence of Specialists. In Proceedings of the Seventeenth International Conference on Artificial Intelligence (IJCAI), Seattle.
Qadri, G., Alkilzy, M., Franze, M., Hoffmann, W., and Splieth, C. 2018. “School-Based Oral Health Education Increases Caries Inequalities.” Community Dental Health, 35(3): 153–159.
Raiffa, H. 1968. Decision Analysis: Introductory Lectures on Choices Under Uncertainty. Reading, MA: Addison Wesley.
Rittel, H.W.J., and Webber, M.W. 1973. “Dilemmas in a General Theory of Planning.” Policy Sciences, 4(2): 155–169. Available at
Russo, J.E., and Schoemaker, P.J.H. 1989. Decision Traps: Ten Barriers to Brilliant Decision-Making and How to Overcome Them. New York: Doubleday.
Schultze, T., Pfeiffer, F., and Schulz-Hardt, S. 2012. “Biased Information Processing in the Escalation Paradigm: Information Search and Information Evaluation as Potential Mediators of Escalating Commitment.” Journal of Applied Psychology, 97(1): 16–32.
Schulze, S., and Evans, O. 2018. Active Reinforcement Learning with Monte-Carlo Tree Search. Available at (accessed May 27, 2019).
Shalev-Shwartz, S., Shammah, S., and Shashua, A. 2016. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving. Available at (accessed May 27, 2019).
Shiarlis, K., Messias, J., and Whiteson, S. 2016. Inverse Reinforcement Learning from Failure. In Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Richland, SC, 1060–1068.
Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., et al. 2016. “Mastering the game of Go with deep neural networks and tree search.” Nature, 529(7587): 484–9.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., et al. 2018. “A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go Through Self-Play.” Science, 362(6419): 1140–1144.
Sutton, R.S., and Barto, A.G. 2018. Reinforcement Learning: An Introduction. 2nd ed. Cambridge, MA: MIT Press.
Sutton, R.S., and Barto, A.G. 1998. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.
Tang, H., Hao, J., Lv, T., Chen, Y., Zhang, Z., Jia, H., Ren, C., Zheng, Y., Fan, C., and Wang, L. 2018. Hierarchical Deep Multiagent Reinforcement Learning. Available at (accessed May 27, 2019).
Tetlock, P.E., and Gardner, D. 2015. Superforecasting: The Art and Science of Prediction. New York, NY: Penguin Random House LLC.
Thomas, P.S., Theocharous, G., and Ghavamzadeh, M. 2015. High Confidence Policy Improvement. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France. Available at
Tollefson, J. 2015. “Can randomized trials eliminate global poverty?” Nature, 524(7564): 150–153.
Usunier, N., Synnaeve, G., Lin, Z., and Chintala, S. 2016. Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks. Available at (accessed May 27, 2019).
van Hasselt, H., Guez, A., and Silver, D. 2015. Deep Reinforcement Learning with Double Q-learning. Available at (accessed May 27, 2019).
Villar, S, Bowden, J, and Wason, J. 2015. Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges. Statistical Science, 30(2): 199–215.
Vodopivec, T., Samothrakis, S., and Ster, B. 2017. “On Monte Carlo Tree Search and Reinforcement Learning.” Journal of Artificial Intelligence Research, 60: 881–936.
Wood, P.J. 2011. “Climate Change and Game Theory.” Annals of the New York Academy of Sciences, 1219: 153–70.
Xu, C., Qin, T., Wang, G., and Liu, T-Y. 2017. Machine Learning Reinforcement Learning for Learning Rate Control. Available at (accessed May 27, 2019).
Zhang, K., Yang, Z., Liu, H., Zhang, T., and Basar, T. 2018. Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80. Available at
Zhang, C., Abdallah, S., and Lesser, V. 2008. MASPA: Multi-Agent Automated Supervisory Policy Adaptation. In UMass Computer Science Technical Report #08-03. Amherst: Computer Science Department University of Massachusetts. Available at
Zhao, W., Meng, Q., and Chung, P.W. 2016. “A Heuristic Distributed Task Allocation Method for Multivehicle Multitask Problems and Its Application to Search and Rescue Scenario.” IEEE Transactions on Cybernetics, 46(4): 902–915.

