Research Article | Open Access

Scale variant vehicle object recognition by CNN module of multi-pooling-PCA process

Yuxiang Guo¹, Itsuo Kumazawa¹, Chuyo Kaku²

¹ Department of Information and Communications Engineering, Tokyo Institute of Technology, Tokyo 152-8550, Japan

² Research and Development Center, Jiangsu Chaoli Electric Manufacture Co., Ltd., Shanghai 212321, China

Abstract

Moving vehicles appear at different scales in an image because of the perspective effect at different viewing distances. Early identification of vehicle targets in front of the ego vehicle is a prerequisite for advanced driver assistance systems (ADAS) used in safety surveillance and safe driving. Recognizing the same vehicle at different scales requires feature learning with scale invariance. Unlike existing feature-vector methods, the proposed approach uses normalized principal component analysis (PCA) eigenvalues calculated from feature maps to extract scale-invariant features. This study proposes a convolutional neural network (CNN) structure with an embedded multi-pooling-PCA module for scale-variant object recognition. The proposed network structure is validated on a scale-variant vehicle image dataset. Compared with the scale-invariant algorithms scale-invariant feature transform (SIFT) and FSAF, as well as miscellaneous other networks, the proposed network achieves the best recognition accuracy on the scale-variant vehicle dataset. To test the practicality of the modified network, it is also evaluated on the public ImageNet dataset, and the comparable results demonstrate its effectiveness in general-purpose applications.
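The idea of using normalized PCA eigenvalues of feature maps as features can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the pooling type, the pooling sizes, or how the covariance is formed, so average pooling, the sizes `(1, 2, 4)`, the channel-wise covariance, and the function names `avg_pool`, `normalized_pca_eigenvalues`, and `multi_pooling_pca` are all assumptions made for illustration.

```python
import numpy as np


def avg_pool(fmap, k):
    """Non-overlapping k x k average pooling over a (C, H, W) feature map (assumed pooling type)."""
    c, h, w = fmap.shape
    h2, w2 = h // k, w // k
    cropped = fmap[:, :h2 * k, :w2 * k]  # drop rows/columns that do not fill a window
    return cropped.reshape(c, h2, k, w2, k).mean(axis=(2, 4))


def normalized_pca_eigenvalues(fmap):
    """Eigenvalues of the channel covariance, sorted descending and normalized to sum to 1."""
    c = fmap.shape[0]
    x = fmap.reshape(c, -1)                # channels are variables, positions are samples
    x = x - x.mean(axis=1, keepdims=True)  # center each channel
    cov = x @ x.T / max(x.shape[1] - 1, 1)
    eig = np.linalg.eigvalsh(cov)[::-1]    # eigvalsh returns ascending order
    eig = np.clip(eig, 0.0, None)          # remove tiny negative round-off values
    total = eig.sum()
    return eig / total if total > 0 else np.full(c, 1.0 / c)


def multi_pooling_pca(fmap, pool_sizes=(1, 2, 4)):
    """Concatenate normalized eigenvalue spectra computed at several pooling sizes (hypothetical layout)."""
    return np.concatenate(
        [normalized_pca_eigenvalues(avg_pool(fmap, k)) for k in pool_sizes]
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    small = multi_pooling_pca(rng.normal(size=(8, 16, 16)))
    large = multi_pooling_pca(rng.normal(size=(8, 32, 32)))
    # The descriptor length depends on channels and pooling sizes, not on spatial size.
    print(small.shape, large.shape)
```

Under these assumptions the descriptor has a fixed length (number of channels times number of pooling sizes) regardless of the spatial size of the feature map, and normalizing each spectrum to sum to 1 removes the dependence on overall activation magnitude. Whether this matches the module described in the paper would need to be checked against the full text.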

Keywords

object detection; scale invariance; spatial pyramid pooling; multi-pooling; convolutional neural network (CNN)

Journal of Intelligent and Connected Vehicles

Volume 6, Issue 4, December 2023

Pages 227-236

DOI: 10.26599/JICV.2023.9210017

Cite this article:

Guo Y, Kumazawa I, Kaku C. Scale variant vehicle object recognition by CNN module of multi-pooling-PCA process. Journal of Intelligent and Connected Vehicles, 2023, 6(4): 227-236. https://doi.org/10.26599/JICV.2023.9210017