
Reinforcement learning with modified exploration strategy for mobile robot path planning

Published online by Cambridge University Press:  11 May 2023

Nesrine Khlif*
Affiliation:
Laboratory of Robotics, Informatics and Complex Systems (RISC lab - LR16ES07), ENIT, University of Tunis EL Manar, Le BELVEDERE, Tunis, Tunisia
Khraief Nahla
Affiliation:
Laboratory of Robotics, Informatics and Complex Systems (RISC lab - LR16ES07), ENIT, University of Tunis EL Manar, Le BELVEDERE, Tunis, Tunisia
Belghith Safya
Affiliation:
Laboratory of Robotics, Informatics and Complex Systems (RISC lab - LR16ES07), ENIT, University of Tunis EL Manar, Le BELVEDERE, Tunis, Tunisia
Corresponding author: Nesrine Khlif; Email: nesrine.khlif@etudiant-enit.utm.tn

Abstract

Despite the remarkable developments observed in recent years, path planning remains a difficult part of mobile robot navigation. Applying artificial intelligence to mobile robotics is a further challenge, and reinforcement learning (RL) is one of the algorithms most widely used in robotics. The exploration-exploitation dilemma is a key challenge for the performance of RL algorithms: too much exploration decreases the cumulative reward, while too much exploitation locks the agent into a local optimum. This paper proposes a new path planning method for mobile robots based on Q-learning with an improved exploration strategy. In addition, a comparative study of the Boltzmann distribution and $\epsilon$-greedy policies is presented. Simulations confirm the better performance of the proposed method in terms of execution time, path length, and cost function.
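For context, the display below recalls the standard Q-learning update together with the two exploration policies compared in the paper, the $\epsilon$-greedy rule and the Boltzmann (softmax) distribution. This is a generic sketch of the textbook definitions, with learning rate $\alpha$, discount factor $\gamma$, exploration rate $\epsilon$, temperature $\tau$, and action set $\mathcal{A}$ as assumed notation; it is not the authors' specific modified strategy, which is detailed only in the full text.

$$Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]$$

$$\pi_{\epsilon\text{-greedy}}(a \mid s)=\begin{cases}1-\epsilon+\dfrac{\epsilon}{|\mathcal{A}|}, & a=\arg\max_{b} Q(s,b)\\[4pt] \dfrac{\epsilon}{|\mathcal{A}|}, & \text{otherwise}\end{cases} \qquad \pi_{\text{Boltzmann}}(a \mid s)=\frac{e^{Q(s,a)/\tau}}{\sum_{b\in\mathcal{A}} e^{Q(s,b)/\tau}}$$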

Type
Research Article
Copyright
© The Author(s), 2023. Published by Cambridge University Press

