
Long-term object search using incremental scene graph updating

Published online by Cambridge University Press: 22 August 2022

Fangbo Zhou
Affiliation:
School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
Huaping Liu*
Affiliation:
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Huailin Zhao
Affiliation:
School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
Lanjun Liang
Affiliation:
School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
*Corresponding author. E-mail: hpliu@tsinghua.edu.cn.

Abstract

Effective searching for target objects in indoor scenes is essential for household robots performing daily tasks. Once a precise map has been built, a robot can navigate to a fixed static target; however, it is difficult for mobile robots to find movable objects such as cups. To address this problem, we establish an object search framework that combines a navigation map, a semantic map, and a scene graph. The robot incrementally updates the scene graph to achieve long-term target search. Considering the different start positions of the robot, we weigh the distance the robot travels against the probability of finding the object to achieve global path planning. By continuously updating the scene graph in a dynamic environment, the robot memorizes the positional relations of objects in the scene. The method has been realized in both simulated and real-world environments, and the experimental results show its feasibility and effectiveness.
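To make the idea of trading off travel distance against the chance of finding a movable object more concrete, the following minimal Python sketch illustrates one way such a scene graph and ranking could look. The paper does not publish code, so all names here (SceneGraph, co_occurrence, find_probability, rank_search_goals, the 1.5 m proximity threshold, and the alpha weighting) are hypothetical placeholders for the ideas described in the abstract, not the authors' implementation.

import math
from collections import defaultdict

class SceneGraph:
    """Memorizes pairwise spatial relations between observed objects."""
    def __init__(self):
        # co_occurrence[a][b] counts how often object a was observed near object b
        self.co_occurrence = defaultdict(lambda: defaultdict(int))
        self.last_seen_at = {}  # object label -> last observed map position (x, y)

    def update(self, detections):
        """Incrementally update relations from one frame of detections.

        detections: list of (label, (x, y)) pairs in map coordinates.
        """
        for label, pos in detections:
            self.last_seen_at[label] = pos
        for label_a, pos_a in detections:
            for label_b, pos_b in detections:
                # assumed proximity threshold of 1.5 m for "near" relations
                if label_a != label_b and math.dist(pos_a, pos_b) < 1.5:
                    self.co_occurrence[label_a][label_b] += 1

    def find_probability(self, target, anchor):
        """Estimated probability that `target` is found near `anchor`."""
        total = sum(self.co_occurrence[target].values())
        return self.co_occurrence[target][anchor] / total if total else 0.0

def rank_search_goals(graph, target, robot_pos, alpha=1.0):
    """Score candidate anchor objects by trading off find probability
    against travel distance from the robot's start position.
    The actual weighting used in the paper may differ."""
    scored = []
    for anchor, pos in graph.last_seen_at.items():
        if anchor == target:
            continue
        p = graph.find_probability(target, anchor)
        d = math.dist(robot_pos, pos)
        scored.append((p / (1.0 + alpha * d), anchor, pos))
    return sorted(scored, reverse=True)

A call such as rank_search_goals(graph, "cup", robot_pos) would then return candidate anchor objects (for example, a table the cup was previously seen near), ordered so that nearby, high-probability locations are visited first.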

Type
Research Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press


Footnotes

This work was completed while Fangbo Zhou was visiting Tsinghua University, Beijing, China.
