
Condition-invariant and compact visual place description by convolutional autoencoder

Published online by Cambridge University Press: 15 March 2023

Hanjing Ye
Affiliation:
Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology, Shenzhen, China; Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China
Weinan Chen
Affiliation:
School of Mechanical and Electrical Engineering, Guangdong University of Technology, Guangzhou, China
Jingwen Yu
Affiliation:
Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology, Shenzhen, China; Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China
Li He
Affiliation:
Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology, Shenzhen, China; Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China
Yisheng Guan
Affiliation:
School of Mechanical and Electrical Engineering, Guangdong University of Technology, Guangzhou, China
Hong Zhang*
Affiliation:
Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology, Shenzhen, China; Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China
*Corresponding author. E-mail: hzhang@sustech.edu.cn

Abstract

Visual place recognition (VPR) in condition-varying environments is still an open problem. Popular solutions are convolutional neural network (CNN)-based image descriptors, which have been shown to outperform traditional image descriptors based on hand-crafted visual features. However, current CNN-based descriptors have two drawbacks: (a) their high dimensionality and (b) their lack of generalization, which lead to low efficiency and poor performance in real robotic applications. In this paper, we propose to use a convolutional autoencoder (CAE) to tackle this problem. We employ a high-level layer of a pre-trained CNN to generate features and train a CAE to map the features to a low-dimensional space, improving the condition invariance of the descriptor while reducing its dimensionality at the same time. We verify our method on four challenging real-world datasets involving significant illumination changes, and our method is shown to be superior to the state of the art. The code for our work is publicly available at https://github.com/MedlarTea/CAE-VPR.
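The pipeline the abstract describes can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration only: the backbone choice (VGG-16 convolutional features), the layer sizes, and the plain reconstruction loss are assumptions made for demonstration, not the authors' exact architecture, which is available in the linked repository.

```python
# Minimal sketch of the described pipeline: features from a frozen,
# pre-trained CNN are compressed by a convolutional autoencoder (CAE)
# into a compact place descriptor. Backbone and layer sizes here are
# illustrative assumptions, not the authors' exact architecture.
import torch
import torch.nn as nn
import torchvision.models as models

class CAE(nn.Module):
    def __init__(self, in_channels=512, code_channels=16):
        super().__init__()
        # Encoder: shrink the CNN feature map to a low-dimensional code.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, code_channels, kernel_size=3, stride=2, padding=1),
        )
        # Decoder: reconstruct the feature map; used only during training.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(code_channels, 128, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, in_channels, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, feat):
        code = self.encoder(feat)
        return code, self.decoder(code)

backbone = models.vgg16(pretrained=True).features.eval()  # frozen backbone
cae = CAE()

image = torch.randn(1, 3, 256, 256)          # placeholder for a camera image
with torch.no_grad():
    feat = backbone(image)                   # 1 x 512 x 8 x 8 feature map
code, recon = cae(feat)
loss = nn.functional.mse_loss(recon, feat)   # reconstruction training loss
descriptor = code.flatten(1)                 # compact descriptor (dim 64 here)
```

At query time only the encoder runs; place matching then reduces to a nearest-neighbor search over the low-dimensional descriptors, for example by cosine similarity, which is where the dimensionality reduction pays off in a real robotic application.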

Type
Research Article
Copyright
© The Author(s), 2023. Published by Cambridge University Press


Footnotes

Hanjing Ye and Weinan Chen contributed equally to this paper and are co-first authors.
