
Reinforcement learning with modified exploration strategy for mobile robot path planning

Published online by Cambridge University Press:  11 May 2023

Nesrine Khlif*
Affiliation:
Laboratory of Robotics, Informatics and Complex Systems (RISC lab - LR16ES07), ENIT, University of Tunis EL Manar, Le BELVEDERE, Tunis, Tunisia
Khraief Nahla
Affiliation:
Laboratory of Robotics, Informatics and Complex Systems (RISC lab - LR16ES07), ENIT, University of Tunis EL Manar, Le BELVEDERE, Tunis, Tunisia
Belghith Safya
Affiliation:
Laboratory of Robotics, Informatics and Complex Systems (RISC lab - LR16ES07), ENIT, University of Tunis EL Manar, Le BELVEDERE, Tunis, Tunisia
Corresponding author: Nesrine Khlif; Email: nesrine.khlif@etudiant-enit.utm.tn

Abstract

Despite the remarkable developments observed in recent years, path planning remains a difficult part of mobile robot navigation. Applying artificial intelligence to mobile robotics is a further challenge, and reinforcement learning (RL) is one of the algorithms most widely used in robotics. The exploration-exploitation dilemma is a key challenge for the performance of RL algorithms: too much exploration decreases the cumulative reward, while too much exploitation locks the agent into a local optimum. This paper proposes a new path planning method for mobile robots based on Q-learning with an improved exploration strategy. In addition, a comparative study of the Boltzmann distribution and $\epsilon$-greedy policies is presented. Simulations confirm the better performance of the proposed method in terms of execution time, path length, and cost function.
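For context, the display below recalls the standard Q-learning update together with the two exploration policies compared in the paper, the $\epsilon$-greedy rule and the Boltzmann (softmax) distribution. This is a generic sketch of the textbook definitions, with learning rate $\alpha$, discount factor $\gamma$, exploration rate $\epsilon$, temperature $\tau$, and action set $\mathcal{A}$ as assumed notation; it is not the authors' specific modified strategy, which is detailed only in the full text.

$$Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]$$

$$\pi_{\epsilon\text{-greedy}}(a \mid s)=\begin{cases}1-\epsilon+\dfrac{\epsilon}{|\mathcal{A}|}, & a=\arg\max_{b} Q(s,b)\\[4pt] \dfrac{\epsilon}{|\mathcal{A}|}, & \text{otherwise}\end{cases} \qquad \pi_{\text{Boltzmann}}(a \mid s)=\frac{e^{Q(s,a)/\tau}}{\sum_{b\in\mathcal{A}} e^{Q(s,b)/\tau}}$$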

Type
Research Article
Copyright
© The Author(s), 2023. Published by Cambridge University Press

