Expanding horizons in reinforcement learning for curious exploration and creative planning

Published online by Cambridge University Press: 21 May 2024

Dale Zhou*
Affiliation:
Neurobiology and Behavior, 519 Biological Sciences Quad, University of California, Irvine, CA, USA dale.zhou@uci.edu https://dalezhou.com Center for the Neurobiology of Learning and Memory, Qureshey Research Laboratory, University of California, Irvine, CA, USA
Aaron M. Bornstein
Affiliation:
Center for the Neurobiology of Learning and Memory, Qureshey Research Laboratory, University of California, Irvine, CA, USA aaron.bornstein@uci.edu https://aaron.bornstein.org/ Department of Cognitive Sciences, 2318 Social & Behavioral Sciences Gateway, University of California, Irvine, CA, USA
*Corresponding author.

Abstract

Curiosity and creativity are expressions of the trade-off between leveraging what is familiar and seeking out novelty. Through the computational lens of reinforcement learning, we describe how formulating the value of information seeking and information generation in terms of their complementary effects on planning horizons formally captures a range of solutions for striking this balance.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2024. Published by Cambridge University Press
