
A Q-learning approach based on human reasoning for navigation in a dynamic environment

Published online by Cambridge University Press:  30 October 2018

Rupeng Yuan
Affiliation:
State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China. E-mails: yuanrupeng1991@163.com, yuwang_hit@163.com, meylfu_hit@163.com, wangxy_hit@163.com
Fuhai Zhang*
Affiliation:
State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China. E-mails: yuanrupeng1991@163.com, yuwang_hit@163.com, meylfu_hit@163.com, wangxy_hit@163.com
Yu Wang
Affiliation:
State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China. E-mails: yuanrupeng1991@163.com, yuwang_hit@163.com, meylfu_hit@163.com, wangxy_hit@163.com
Yili Fu
Affiliation:
State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China. E-mails: yuanrupeng1991@163.com, yuwang_hit@163.com, meylfu_hit@163.com, wangxy_hit@163.com
Shuguo Wang
Affiliation:
State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China. E-mails: yuanrupeng1991@163.com, yuwang_hit@163.com, meylfu_hit@163.com, wangxy_hit@163.com
*Corresponding author. E-mail: zfhhit@hit.edu.cn, meylfu_hit@163.com

Summary

A Q-learning approach is often used for navigation in static environments where the state space is easy to define. In this paper, a new Q-learning approach is proposed for navigation in dynamic environments by imitating human reasoning. As a model-free method, Q-learning does not require an environmental model in advance. The state space and the reward function in the proposed approach are defined according to human perception and evaluation, respectively. Specifically, approximate regions, instead of accurate measurements, are used to define states. Moreover, because of the limitation of robot dynamics, the actions available in each state are generated by introducing a dynamic window that takes robot dynamics into account. The conducted tests show that the obstacle avoidance rate of the proposed approach reaches 90.5% after training, and that the robot always operates below its dynamics limitation.
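The abstract describes the approach only at a high level. The following minimal Python sketch illustrates how a tabular Q-learning update can be combined with a dynamic-window-style restriction on admissible actions; the state encoding, action set, reward handling, and robot limits used here are illustrative assumptions and are not the authors' actual definitions.

# Minimal sketch: tabular Q-learning with a dynamic-window action filter.
# All constants and the coarse state encoding are assumed, not taken from the paper.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2      # learning rate, discount, exploration rate
ACTIONS = [-0.4, -0.2, 0.0, 0.2, 0.4]       # candidate velocity changes (m/s), assumed
V_MAX, A_MAX, DT = 1.0, 2.0, 0.2            # assumed velocity/acceleration limits and time step

Q = defaultdict(float)                      # Q[(state, action)] -> estimated value

def dynamic_window(v_current):
    """Keep only velocity changes reachable within one time step under the
    acceleration limit and that keep the speed within [0, V_MAX]."""
    return [dv for dv in ACTIONS
            if abs(dv) <= A_MAX * DT and 0.0 <= v_current + dv <= V_MAX]

def choose_action(state, v_current):
    """Epsilon-greedy selection over the dynamically admissible actions."""
    admissible = dynamic_window(v_current)
    if random.random() < EPSILON:
        return random.choice(admissible)
    return max(admissible, key=lambda dv: Q[(state, dv)])

def update(state, action, reward, next_state, v_next):
    """Standard Q-learning backup, restricted to admissible next actions."""
    best_next = max(Q[(next_state, dv)] for dv in dynamic_window(v_next))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Illustrative single learning step (hypothetical region-based state labels):
# s = ("obstacle_near_front", "goal_left")
# a = choose_action(s, v_current=0.4)
# ... execute a on the robot, observe reward r and next state s2 ...
# update(s, a, r, s2, v_next=0.4 + a)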

Type
Articles
Copyright
Copyright © Cambridge University Press 2018 

