A Q-learning approach based on human reasoning for navigation in a dynamic environment

Published online by Cambridge University Press: 30 October 2018

Rupeng Yuan, Fuhai Zhang, Yu Wang, Yili Fu and Shuguo Wang

Affiliation: State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China. E-mails: yuanrupeng1991@163.com, yuwang_hit@163.com, meylfu_hit@163.com, wangxy_hit@163.com

Summary

Q-learning is often used for navigation in static environments, where the state space is easy to define. In this paper, a new Q-learning approach is proposed for navigation in dynamic environments by imitating human reasoning. As a model-free method, Q-learning does not require a model of the environment in advance. The state space and the reward function of the proposed approach are defined according to human perception and human evaluation, respectively; specifically, approximate regions rather than accurate measurements are used to define states. Moreover, because the robot's dynamics are limited, the actions available in each state are computed with a dynamic window that takes those dynamics into account. Tests show that the obstacle avoidance rate of the proposed approach reaches 90.5% after training and that the robot always operates within its dynamic limits.
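To make the mechanics concrete, below is a minimal Python sketch of tabular Q-learning with a region-based state (standing in for the human-perception state definition) and a dynamic-window action set. All names, thresholds, and parameters here (region_state, dynamic_window_actions, the 1 m/3 m distance bands, the acceleration limits) are illustrative assumptions, not the paper's actual definitions.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def region_state(obstacle_dist, obstacle_bearing, goal_bearing):
    """Map continuous perception onto coarse regions rather than exact
    measurements, mimicking a human-perception-style state definition."""
    dist = "near" if obstacle_dist < 1.0 else "mid" if obstacle_dist < 3.0 else "far"
    obs = "left" if obstacle_bearing < -0.3 else "right" if obstacle_bearing > 0.3 else "front"
    goal = "left" if goal_bearing < -0.3 else "right" if goal_bearing > 0.3 else "ahead"
    return (dist, obs, goal)

def dynamic_window_actions(v, w, dt=0.25, a_max=0.5, alpha_max=1.0):
    """Enumerate (v, w) commands reachable within one control step given
    acceleration limits -- the dynamic-window idea that keeps every
    candidate action dynamically feasible."""
    dv, dw = a_max * dt, alpha_max * dt
    return [(max(0.0, v + i * dv), w + j * dw)
            for i in (-1, 0, 1) for j in (-1, 0, 1)]

def choose_action(state, actions):
    """Epsilon-greedy selection restricted to the feasible action set."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_actions):
    """One-step Q-learning backup, with the max taken only over the
    actions that are dynamically feasible in the next state."""
    best_next = max(Q[(next_state, a)] for a in next_actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

In the approach described above, the reward would come from a human-evaluation-style scoring of the resulting motion (for instance, penalizing close approaches to obstacles and rewarding progress toward the goal); the sketch leaves it as an external input to q_update.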

Type: Articles
Copyright: © Cambridge University Press 2018

