Document Type: Original/Review Paper

Authors

Electrical Engineering Department, Faculty of Marine Engineering, Chabahar Maritime University, Chabahar, Iran.

Abstract

This paper explores the performance of object detection techniques for autonomous vehicle perception by comparing classical machine learning methods with recent deep learning models. We evaluate three classical methods (PCA, HOG, and HOG combined with different versions of the SVM classifier) and five deep learning models (Faster R-CNN, SSD, YOLOv3, YOLOv5, and YOLOv9) on the benchmark INRIA dataset. The experimental results show that although HOG + Gaussian SVM outperforms the other classical approaches, it is in turn outperformed by the deep learning techniques. Furthermore, classical methods have difficulty detecting partially occluded and distant objects and handling complex clothing, while recent deep learning models are more efficient and perform better on these challenges, YOLOv9 in particular.
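As a rough illustration of the HOG + Gaussian SVM combination named above, the following minimal sketch computes an unsigned gradient-orientation histogram for a single image cell and compares two such descriptors with a Gaussian (RBF) kernel. It is not the implementation evaluated in this paper: the function names, the 9-bin setting, the toy 5x5 cells, and the value of `gamma` are all assumptions made for illustration, and block normalization, bin interpolation, and SVM training are omitted.

```python
import math


def cell_orientation_histogram(cell, n_bins=9):
    """Unsigned gradient-orientation histogram for one grayscale cell.

    `cell` is a list of equal-length rows of pixel intensities.
    Gradients use central differences on interior pixels only.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Fold the angle into [0, 180) so opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


def l2_normalize(vec, eps=1e-6):
    """Scale a descriptor to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
    return [v / norm for v in vec]


def gaussian_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-gamma * sq_dist)


# Hypothetical toy cells: one vertical edge and one horizontal edge.
vertical_edge = [[0, 0, 0, 255, 255] for _ in range(5)]
horizontal_edge = [[0] * 5, [0] * 5, [0] * 5, [255] * 5, [255] * 5]

f_vertical = l2_normalize(cell_orientation_histogram(vertical_edge))
f_horizontal = l2_normalize(cell_orientation_histogram(horizontal_edge))

# Identical descriptors give a kernel value of 1; dissimilar ones give less.
same = gaussian_kernel(f_vertical, f_vertical)
different = gaussian_kernel(f_vertical, f_horizontal)
```

In a full detector, descriptors from many cells would be grouped into blocks, normalized, and concatenated for each detection window, and a kernel of this form would be used inside an SVM trained on labelled windows.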
