## References

Andrychowicz, Marcin, Wolski, Filip, Ray, Alex, Schneider, Jonas, Fong, Rachel, Welinder, Peter, McGrew, Bob, Tobin, Josh, Abbeel, Pieter, and Zaremba, Wojciech. 2018. Hindsight Experience Replay. Available at https://arxiv.org/pdf/1707.01495.pdf (accessed May 27, 2019).
Arulkumaran, Kai, Deisenroth, Marc P., Brundage, Miles, and Bharath, Anil A. 2017. “A Brief Survey of Deep Reinforcement Learning.” IEEE Signal Processing Magazine, Special Issue on Deep Learning for Image Understanding (Arxiv Extended Version). Available at https://arxiv.org/pdf/1708.05866.pdf (accessed May 27, 2019).
Barlow, Jane, Sembi, Sukhdev, Parsons, Helen, Kim, Sungwook, Petrou, Stavros, Harnett, Paul, and Dawe, Sharon. 2019. “A Randomized Controlled Trial and Economic Evaluation of the Parents Under Pressure Program for Parents in Substance Abuse Treatment.” Drug and Alcohol Dependence, 194: 184–194. https://doi.org/10.1016/j.drugalcdep.2018.08.044.
Bloembergen, D., Tuyls, K., Hennes, D., and Kaisers, M. 2015. “Evolutionary Dynamics of Multi-Agent Learning: A Survey.” Journal of Artificial Intelligence Research, 53: 659–697.

Bernstein, Daniel S., Zilberstein, Shlomo, and Immerman, Neil. 2000. “The Complexity of Decentralized Control of Markov Decision Processes.” In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), Stanford, CA. Available at https://arxiv.org/ftp/arxiv/papers/1301/1301.3836.pdf.
Bowling, M., and Veloso, M. 2001. “Rational and Convergent Learning in Stochastic Games.” In IJCAI’01 Proceedings of the 17th International Joint Conference on Artificial Intelligence, Seattle, WA, USA, August 4–10, 2001, Vol. 2: 1021–1026. San Francisco, CA: Morgan Kaufmann Publishers Inc.

Boogaard, Hanna, van Erp, Annemoon M., Walker, Katherine D., and Shaikh, Rashid. 2017. “Accountability Studies on Air Pollution and Health: The HEI Experience.” Current Environmental Health Reports, 4(4): 514–522. https://doi.org/10.1007/s40572-017-0161-0.
Cai, Yifan, Yang, Simon X., and Xu, Xin. 2013. “A Combined Hierarchical Reinforcement Learning Based Approach for Multi-robot Cooperative Target Searching in Complex Unknown Environments.” In IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), Singapore: IEEE.

Clavera, I., Liu, Simin, Fearing, Ronald S., Abbeel, Pieter, Levine, Sergey, and Finn, Chelsea. 2018. Learning to Adapt: Meta-Learning for Model-Based Control. Available at https://arxiv.org/abs/1803.11347 (accessed May 27, 2019).
Clemen, Robert T., and Reilly, Terence. 2014. Making Hard Decisions, with the Decision Tools Suite. 3rd ed. Pacific Grove, CA: Duxbury Press.

de Vries, S.L.A., Hoeve, M., Asscher, J.J., and Stams, G.J.J.M. 2018. “The Long-Term Effects of the Youth Crime Prevention Program ‘New Perspectives’ on Delinquency and Recidivism.” International Journal of Offender Therapy and Comparative Criminology, 62(12): 3639–3661. https://doi.org/10.1177/0306624X17751161.
Devlin, S., Yliniemi, L., Kudenko, D., and Tumer, K. 2014. Potential-Based Difference Rewards for Multiagent Reinforcement Learning. In Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), edited by Lomuscio, Scerri, Bazzan, and Huhns, May 5–9, Paris, France. Available at http://web.engr.oregonstate.edu/~ktumer/publications/files/tumer-devlin_aamas14.pdf.
Fuji, T., Ito, K., Matsumoto, K., and Yano, K. 2018. Deep Multi-Agent Reinforcement Learning using DNN-Weight Evolution to Optimize Supply Chain Performance. In Proceedings of the 51st Hawaii International Conference on System Sciences, Hawaii.

Fudenberg, D., Levine, D., and Maskin, E. 1994. “The Folk Theorem with Imperfect Public Information.” Econometrica, 62(5): 997–1040.

Fudenberg, D., and Maskin, E. 1986. “The Folk Theorem in Repeated Games with Discounting or with Incomplete Information.” Econometrica, 54: 533–554.

Gilboa, I., Samet, D., and Schmeidler, D. 2004. “Utilitarian Aggregation of Beliefs and Tastes.” Journal of Political Economy, 112(4): 932–938. https://doi.org/10.1086/421173
Grondman, I., Busoniu, L., Lopes, G.A.D., and Babuska, R. 2012. “A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients.” IEEE Transactions on Systems, Man, and Cybernetics, Part C, 42(6): 1291–1307. Available at http://busoniu.net/files/papers/ivo_smcc12_survey.pdf.
Gupta, J.K., Egorov, M., and Kochenderfer, M. 2017. Cooperative Multi-Agent Control Using Deep Reinforcement Learning. In International Conference on Autonomous Agents and Multi-agent Systems, São Paulo, Brazil. Available at http://ala2017.it.nuigalway.ie/papers/ALA2017_Gupta.pdf
Heitzig, J., Lessmann, K., and Zou, Y. 2011. “Self-Enforcing Strategies to Deter Free-Riding in the Climate Change Mitigation Game and Other Repeated Public Good Games.” Proceedings of the National Academy of Sciences of the United States of America, 108(38): 15739–15744. https://doi.org/10.1073/pnas.1106265108.
Henneman, L.R., Liu, C., Mulholland, J.A., and Russell, A.G. 2017. “Evaluating the Effectiveness of Air Quality Regulations: A Review of Accountability Studies and Frameworks.” Journal of the Air & Waste Management Association, 67(2): 144–172. https://doi.org/10.1080/10962247.2016.1242518.
Hörner, J., and Olszewski, W. 2006. “The Folk Theorem for Games with Private Almost-Perfect Monitoring.” Econometrica, 74: 1499–1544. https://doi.org/10.1111/j.1468-0262.2006.00717.x.

Howard, R., and Abbas, A. 2016. Foundations of Decision Analysis. New York, NY: Pearson.

Hu, J., and Wellman, M.P. 1998. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML): 242–250.

Hu, J., and Wellman, M.P. 2003. “Nash Q-Learning for General-Sum Stochastic Games.” Journal of Machine Learning Research, 4: 1039–1069.

Hu, Y., Gao, Y., and An, B. 2015. “Multiagent Reinforcement Learning with Unshared Value Functions.” IEEE Transactions on Cybernetics, 45(4): 647–662.

Hunt, S., Meng, Q., Hinde, C., and Huang, T. 2014. “A Consensus-Based Grouping Algorithm for Multi-agent Cooperative Task Allocation with Complex Requirements.” Cognitive Computation, 6(3): 338–350.

Keeney, R., and Raiffa, H. 1976. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Hoboken, NJ: John Wiley & Sons.

Luce, R.D., and Raiffa, H. 1957. Games and Decisions. New York: John Wiley & Sons.

March, J.G. 1991. “Exploration and Exploitation in Organizational Learning.” Organization Science, 2(1): 71–87. Special Issue: Organizational Learning: Papers in Honor of (and by) James G. March. Available at http://www.jstor.org/stable/2634940.
Marschak, J., and Radner, R. 1972. Economic Theory of Teams. New Haven: Yale University Press.

Miceli, T.J. 2017. The Economic Approach to Law. 3rd ed. Stanford, CA: Stanford University Press.

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., et al. 2015. “Human-Level Control Through Deep Reinforcement Learning.” Nature, 518(7540): 529–533.

Monios, J. 2016. “Policy Transfer or Policy Churn? Institutional Isomorphism and Neoliberal Convergence in the Transport Sector.” Environment and Planning A: Economy and Space, 49(2). https://doi.org/10.1177/0308518X16673367
Mookherjee, D. 2006. “Decentralization, Hierarchies, and Incentives: A Mechanism Design Perspective.” Journal of Economic Literature, 44(2): 367–390.

Munos, R. 2014. “From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning.” Foundations and Trends in Machine Learning, 7(1): 1–129. https://doi.org/10.1561/2200000038.
Nisan, N. 2007. “Introduction to Mechanism Design (For Computer Scientists).” In Algorithmic Game Theory, edited by Nisan, N., Roughgarden, T., Tardos, E., and Vazirani, V. New York, NY: Cambridge University Press.

Omidshafiei, S., Pazis, J., Amato, C., How, J.P., and Vian, J. 2017. Deep Decentralized Multi-Task Multi-Agent Reinforcement Learning Under Partial Observability. Available at https://arxiv.org/abs/1703.06182 (accessed May 27, 2019).
Petrik, M., Chow, Y., and Ghavamzadeh, M. 2016. Safe Policy Improvement by Minimizing Robust Baseline Regret. Available at https://arxiv.org/abs/1607.03842 (accessed May 27, 2019).
Pham, H.X., La, H.M., Feil-Seifer, D., and Nefian, A. 2018. Cooperative and Distributed Reinforcement Learning of Drones for Field Coverage. Available at https://arxiv.org/pdf/1803.07250.pdf (accessed May 27, 2019).
Potter, M., Meeden, L., and Schultz, A. 2001. Heterogeneity in the Coevolved Behaviors of Mobile Robots: The Emergence of Specialists. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), Seattle.

Qadri, G., Alkilzy, M., Franze, M., Hoffmann, W., and Splieth, C. 2018. “School-Based Oral Health Education Increases Caries Inequalities.” Community Dental Health, 35(3): 153–159. https://doi.org/10.1922/CDH_4145Qadri07.
Raiffa, H. 1968. Decision Analysis: Introductory Lectures on Choices Under Uncertainty. Reading, MA: Addison-Wesley.

Russo, J.E., and Schoemaker, P.J.H. 1989. Decision Traps: Ten Barriers to Brilliant Decision-Making and How to Overcome Them. New York: Doubleday.

Schultze, T., Pfeiffer, F., and Schulz-Hardt, S. 2012. “Biased Information Processing in the Escalation Paradigm: Information Search and Information Evaluation as Potential Mediators of Escalating Commitment.” Journal of Applied Psychology, 97(1): 16–32. https://doi.org/10.1037/a0024739.
Schulze, S., and Evans, O. 2018. Active Reinforcement Learning with Monte-Carlo Tree Search. Available at https://arxiv.org/abs/1803.04926 (accessed May 27, 2019).
Shalev-Shwartz, S., Shammah, S., and Shashua, A. 2016. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving. Available at https://arxiv.org/pdf/1610.03295.pdf (accessed May 27, 2019).
Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., et al. 2016. “Mastering the Game of Go with Deep Neural Networks and Tree Search.” Nature, 529(7587): 484–489. https://doi.org/10.1038/nature16961.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., et al. 2018. “A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go Through Self-Play.” Science, 362(6419): 1140–1144. https://doi.org/10.1126/science.aar6404.
Sutton, R.S., and Barto, A.G. 2018. Reinforcement Learning: An Introduction. 2nd ed. Cambridge, MA: MIT Press.

Sutton, R.S., and Barto, A.G. 1998. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

Tang, H., Hao, J., Lv, T., Chen, Y., Zhang, Z., Jia, H., Ren, C., Zheng, Y., Fan, C., and Wang, L. 2018. Hierarchical Deep Multiagent Reinforcement Learning. Available at https://arxiv.org/pdf/1809.09332.pdf (accessed May 27, 2019).
Tetlock, P.E., and Gardner, D. 2015. Superforecasting: The Art and Science of Prediction. New York, NY: Penguin Random House LLC.

Usunier, N., Synnaeve, G., Lin, Z., and Chintala, S. 2016. Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks. Available at https://arxiv.org/pdf/1609.02993.pdf (accessed May 27, 2019).
Villar, S., Bowden, J., and Wason, J. 2015. “Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges.” Statistical Science, 30(2): 199–215. https://doi.org/10.1214/14-STS504.
Vodopivec, T., Samothrakis, S., and Ster, B. 2017. “On Monte Carlo Tree Search and Reinforcement Learning.” Journal of Artificial Intelligence Research, 60: 881–936. https://doi.org/10.1613/jair.5507
Xu, C., Qin, T., Wang, G., and Liu, T-Y. 2017. Machine Learning Reinforcement Learning for Learning Rate Control. Available at https://arxiv.org/abs/1705.11159 (accessed May 27, 2019).
Zhang, K., Yang, Z., Liu, H., Zhang, T., and Basar, T. 2018. Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80. Available at http://proceedings.mlr.press/v80/zhang18n/zhang18n.pdf.
Zhao, W., Meng, Q., and Chung, P.W. 2016. “A Heuristic Distributed Task Allocation Method for Multivehicle Multitask Problems and Its Application to Search and Rescue Scenario.” IEEE Transactions on Cybernetics, 46(4): 902–915. https://doi.org/10.1109/TCYB.2015.2418052.