PLOT: a 3D point cloud object detection network for autonomous driving

Yihuan Zhang; Liang Wang; Yifan Dai

doi:10.1017/S0263574722001837

Published online by Cambridge University Press: 16 January 2023

Affiliations:
Yihuan Zhang: Intelligent Connected Vehicle Center, Tsinghua Automotive Research Institute, Suzhou, China
Liang Wang: Intelligent Connected Vehicle Center, Tsinghua Automotive Research Institute, Suzhou, China
Yifan Dai*: Intelligent Connected Vehicle Center, Tsinghua Automotive Research Institute, Suzhou, China
*Corresponding author. E-mail: daiyifan@tsari.tsinghua.edu.cn

Abstract

3D object detection from point clouds is an essential task for autonomous driving. As roadside infrastructure develops, roadside perception can extend the view range of autonomous vehicles through communication technology. Computation time and power consumption are the two main concerns when deploying object detection, and a lightweight detection model running on an embedded system is a convenient solution for both the roadside and the vehicle side. In this study, a 3D Point cLoud Object deTection (PLOT) network is proposed to reduce the computational load and ensure real-time object detection performance on an embedded system. First, a bird's-eye-view representation of the point cloud is computed using a pillar-based encoding method. Then, a cross-stage partial network-based backbone and a feature pyramid network-based neck are applied to generate a high-dimensional feature map. Finally, a multi-output head using a shared convolutional layer is attached to predict the classes, bounding boxes, and orientations of the objects simultaneously. Extensive experiments on the Waymo Open Dataset and our own dataset demonstrate the accuracy and efficiency of the proposed method.
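To make the first step of the pipeline concrete, the sketch below illustrates the general idea of pillar-based bird's-eye-view encoding: points are binned into vertical columns ("pillars") on a 2D grid, and each pillar is summarized by a small feature vector. This is a minimal illustration under stated assumptions, not the encoder used in PLOT. The function names `pillarize` and `to_bev_grid`, the grid ranges, the pillar size, and the hand-crafted features (point count, maximum height, mean intensity) are all hypothetical choices; pillar-based detectors typically learn per-pillar features with a small network instead.

```python
from collections import defaultdict


def pillarize(points, x_range=(0.0, 4.0), y_range=(0.0, 4.0), pillar_size=1.0):
    """Group (x, y, z, intensity) points into bird's-eye-view pillars.

    Returns (features, shape). `features` maps (row, col) grid indices to
    (point_count, max_z, mean_intensity). The ranges, pillar size, and
    feature choice are illustrative assumptions, not values from the paper.
    """
    n_cols = int((x_range[1] - x_range[0]) / pillar_size)
    n_rows = int((y_range[1] - y_range[0]) / pillar_size)

    buckets = defaultdict(list)
    for x, y, z, intensity in points:
        # Discard points that fall outside the detection range.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Clamp indices to guard against floating-point rounding at the edge.
        col = min(int((x - x_range[0]) / pillar_size), n_cols - 1)
        row = min(int((y - y_range[0]) / pillar_size), n_rows - 1)
        buckets[(row, col)].append((z, intensity))

    features = {}
    for cell, pts in buckets.items():
        heights = [p[0] for p in pts]
        intensities = [p[1] for p in pts]
        features[cell] = (len(pts), max(heights), sum(intensities) / len(pts))
    return features, (n_rows, n_cols)


def to_bev_grid(features, shape, channel=0):
    """Scatter one feature channel into a dense 2D grid (empty pillars = 0)."""
    n_rows, n_cols = shape
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for (row, col), feat in features.items():
        grid[row][col] = float(feat[channel])
    return grid


if __name__ == "__main__":
    cloud = [
        (0.5, 0.5, 1.0, 0.2),
        (0.6, 0.4, 2.0, 0.4),
        (3.2, 1.1, 0.5, 0.9),
        (9.0, 9.0, 0.0, 0.0),  # outside the range, so it is dropped
    ]
    feats, grid_shape = pillarize(cloud)
    for bev_row in to_bev_grid(feats, grid_shape, channel=0):
        print(bev_row)
```

The dense grid produced by `to_bev_grid` plays the role of the pseudo-image that a 2D convolutional backbone and neck can then process, which is why this kind of encoding suits embedded deployment: the expensive 3D structure is collapsed into a 2D feature map before any convolution is applied.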

Keywords

object detection; deep neural network; 3D point cloud; real-time inference

Type: Research Article
Information: Robotica, Volume 41, Issue 5, May 2023, pp. 1483–1499

DOI: https://doi.org/10.1017/S0263574722001837
Copyright: © The Author(s), 2023. Published by Cambridge University Press
