Experimental Study of Reinforcement Learning in Mobile Robots Through Spiking Architecture of Thalamo-Cortico-Thalamic Circuitry of Mammalian Brain

Vahid Azimirad; Mohammad Fattahi Sani

doi:10.1017/S0263574719001632

Published online by Cambridge University Press: 18 November 2019

Vahid Azimirad*: Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran. E-mail: m.fattahi93@ms.tabrizu.ac.ir
Mohammad Fattahi Sani: Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran. E-mail: m.fattahi93@ms.tabrizu.ac.ir
*Corresponding author. E-mail: Azimirad@tabrizu.ac.ir

Summary

This paper studies behavioral learning in robots using a spiking neural network whose architecture is based on the thalamo-cortico-thalamic circuitry of the mammalian brain. The Izhikevich single-neuron model is used to represent the behavior of the different neuron types. The network contains 1090 spiking neurons. The spiking model of the proposed architecture is derived and adapted to the robot learning problem. The reinforcement learning algorithm is based on spike-timing-dependent plasticity (STDP), with dopamine release acting as the reward; it strengthens the synaptic weights of the neurons involved in the robot's proper performance. Sensory neurons are placed in the thalamus and motor neurons in the cortical module. The inputs of the thalamo-cortico-thalamic circuitry are signals related to the distance of the target from the robot, and the outputs are the actuator velocities. A target attraction task, in which dopamine is released when the robot catches the target, is used to validate the proposed method. Simulation studies and an experimental implementation are carried out on a mobile robot named Tabrizbot. The experiments show that, after successful learning, the mean time to catch the target decreases by about 36%. These results indicate that, with the proposed method, the thalamo-cortical structure can be trained to perform various robotic tasks.
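
The two mechanisms named in the summary, the Izhikevich neuron and dopamine-gated STDP, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. The neuron parameters are the regular-spiking values from Izhikevich's 2003 model. The eligibility-trace form of the learning rule follows Izhikevich's 2007 dopamine-STDP formulation. The STDP amplitudes and time constants are placeholder values. The summary does not give the paper's own parameters, network wiring, or sensor and motor encoding, so none of these are reproduced here.

```python
import math


def simulate_izhikevich(current, a=0.02, b=0.2, c=-65.0, d=8.0,
                        t_ms=1000, dt=0.5):
    """Euler-integrate one Izhikevich neuron under a constant input current.

    Defaults are regular-spiking parameters (assumed, not from the paper).
    Returns the list of spike times in milliseconds.
    """
    v, u = c, b * c  # membrane potential (mV) and recovery variable
    spikes = []
    for k in range(int(t_ms / dt)):
        if v >= 30.0:  # spike: record it, reset v, increment u
            spikes.append(k * dt)
            v, u = c, u + d
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
    return spikes


def stdp_window(delta_ms, a_plus=1.0, a_minus=1.5, tau_ms=20.0):
    """STDP kernel with delta_ms = t_post - t_pre (amplitudes are assumed).

    Pre-before-post (delta_ms > 0) potentiates; the reverse order depresses.
    """
    if delta_ms > 0:
        return a_plus * math.exp(-delta_ms / tau_ms)
    return -a_minus * math.exp(delta_ms / tau_ms)


def dopamine_stdp(pairings, rewards, t_end_ms, tau_c=1000.0, tau_d=200.0):
    """Weight change of one synapse, stepped in 1 ms increments.

    pairings: {time_ms: t_post - t_pre} spike pairings seen by the synapse.
    rewards:  {time_ms: amount} dopamine pulses (e.g. when the target is caught).
    """
    weight_change = trace = dopamine = 0.0
    for t in range(t_end_ms):
        if t in pairings:
            trace += stdp_window(pairings[t])  # tag the synapse as eligible
        dopamine += rewards.get(t, 0.0)        # reward may arrive much later
        weight_change += trace * dopamine      # change only while both overlap
        trace -= trace / tau_c                 # eligibility trace decays
        dopamine -= dopamine / tau_d           # dopamine level decays
    return weight_change


if __name__ == "__main__":
    print(len(simulate_izhikevich(10.0)), "spikes in 1 s at input current 10")
    # Pairing at 100 ms, reward at 600 ms: the synapse is strengthened.
    print(dopamine_stdp({100: 10}, {600: 0.5}, 2000))
```

In this sketch a spike pairing alone leaves the weight unchanged. The weight moves only if dopamine arrives while the eligibility trace is still nonzero, which is how a delayed reward such as catching the target can reinforce the synapses that were active shortly before it.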

Keywords

Reinforcement learning; Spiking neural networks; Mobile robot; Thalamo-cortico-thalamic circuitry; Dopamine modulator

Type: Articles
Information: Robotica, Volume 38, Issue 9, September 2020, pp. 1558–1575

DOI: https://doi.org/10.1017/S0263574719001632
Copyright: © Cambridge University Press 2019
