Regular Paper

Knowledge Distillation via Hierarchical Matching for Small Object Detection

School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China

Abstract

Knowledge distillation is often used for model compression and has achieved great success in image classification, but there remains room for improvement in object detection, especially in knowledge extraction for small objects. The main problem is that the features of small objects are often polluted by background noise and are not prominent due to down-sampling in convolutional neural networks (CNNs), so small object features are insufficiently refined during distillation. In this paper, we propose the Hierarchical Matching Knowledge Distillation Network (HMKD), which operates on pyramid levels P2 to P4 of the feature pyramid network (FPN), aiming to intervene on small object features before they are degraded. We employ an encoder-decoder network to encapsulate low-resolution, highly semantic information, akin to eliciting insights from the deep layers of the teacher network, and then match this encapsulated information, as queries, against the high-resolution feature values of small objects from shallow layers, which serve as keys. During this matching, an attention mechanism measures the relevance of each query to the feature values, and knowledge is distilled to the student during decoding. In addition, we introduce a supplementary distillation module to mitigate the effects of background noise. Experiments show that our method achieves excellent improvements for both one-stage and two-stage object detectors. Specifically, applying the proposed method to Faster R-CNN achieves 41.7% mAP on COCO2017 (with ResNet50 as the backbone), which is 3.8% higher than that of the baseline.
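The matching step described in the abstract can be sketched as follows. This is an illustrative NumPy reconstruction, not the authors' implementation: the function names, tensor shapes, and the attention-weighted mean-squared-error form are assumptions; the paper's actual HMKD module uses an encoder-decoder and a supplementary background-noise module not shown here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_matching_loss(teacher_deep, teacher_shallow, student_shallow):
    """Attention-weighted feature distillation loss (illustrative sketch).

    teacher_deep:    (Nq, C) flattened low-resolution, highly semantic teacher
                     features from a deep FPN level, used as queries.
    teacher_shallow: (Nk, C) flattened high-resolution teacher features from a
                     shallow level such as P2, used as keys.
    student_shallow: (Nk, C) the student's features at the same shallow level.
    """
    c = teacher_deep.shape[-1]
    # Scaled dot-product attention: relevance of each semantic query to each
    # high-resolution shallow location.
    attn = softmax(teacher_deep @ teacher_shallow.T / np.sqrt(c), axis=-1)
    # Per-location importance: the average attention a shallow position
    # receives. Positions matched by the semantic queries (plausibly
    # small-object regions) get larger weight in the distillation loss.
    weights = attn.mean(axis=0)                       # (Nk,), sums to 1
    per_loc_err = ((teacher_shallow - student_shallow) ** 2).mean(axis=-1)
    return float((weights * per_loc_err).sum())
```

Under this formulation the loss is zero exactly when the student reproduces the teacher's shallow features, and mismatches at locations the semantic queries attend to are penalized most heavily.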

Electronic Supplementary Material

Download File(s)
JCST-2401-14158-Highlights.pdf (149.3 KB)

References

[1]
Cao C, Wang B, Zhang W, Zeng X, Yan X, Feng Z, Liu Y, Wu Z. An improved Faster R-CNN for small object detection. IEEE Access, 2019, 7: 106838–106846. DOI: 10.1109/ACCESS.2019.2932731.
[2]
Yang C, Huang Z, Wang N. QueryDet: Cascaded sparse query for accelerating high-resolution small object detection. In Proc. the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2022, pp.13658–13667. DOI: 10.1109/CVPR52688.2022.01330.
[3]
Hubara I, Courbariaux M, Soudry D, El-Yaniv R, Bengio Y. Binarized neural networks. In Proc. the 30th International Conference on Neural Information Processing Systems, Dec. 2016, pp.4114–4122.
[4]
Rastegari M, Ordonez V, Redmon J, Farhadi A. XNOR-Net: ImageNet classification using binary convolutional neural networks. In Proc. the 14th European Conference on Computer Vision, Oct. 2016, pp.525–542. DOI: 10.1007/978-3-319-46493-0_32.
[5]
Han S, Pool J, Tran J, Dally W J. Learning both weights and connections for efficient neural network. In Proc. the 28th International Conference on Neural Information Processing Systems, Dec. 2015, pp.1135–1143.
[6]
He Y, Zhang X, Sun J. Channel pruning for accelerating very deep neural networks. In Proc. the 2017 IEEE International Conference on Computer Vision, Oct. 2017, pp.1398–1406. DOI: 10.1109/ICCV.2017.155.
[7]
Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. arXiv: 1503.02531, 2015. https://arxiv.org/abs/1503.02531, Jul. 2024.
[8]
Ji M, Heo B, Park S. Show, attend and distill: Knowledge distillation via attention-based feature matching. In Proc. the 35th AAAI Conference on Artificial Intelligence, Feb. 2021, pp.7945–7952. DOI: 10.1609/aaai.v35i9.16969.
[9]
Wang T, Yuan L, Zhang X, Feng J. Distilling object detectors with fine-grained feature imitation. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2019, pp.4928–4937. DOI: 10.1109/CVPR.2019.00507.
[10]
Zhang L, Ma K. Improve object detection with feature-based knowledge distillation: Towards accurate and efficient detectors. In Proc. the 9th International Conference on Learning Representations, May 2021.
[11]
Heo B, Kim J, Yun S, Park H, Kwak N, Choi J Y. A comprehensive overhaul of feature distillation. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision, Oct. 27-Nov. 2, 2019, pp.1921–1930. DOI: 10.1109/ICCV.2019.00201.
[12]
Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proc. the 28th International Conference on Neural Information Processing Systems, Dec. 2015, pp.91–99.
[13]
Lin T Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid networks for object detection. In Proc. the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Jul. 2017, pp.936–944. DOI: 10.1109/CVPR.2017.106.
[14]
Kang Z, Zhang P, Zhang X, Sun J, Zheng N. Instance-conditional knowledge distillation for object detection. In Proc. the 35th International Conference on Neural Information Processing Systems, Dec. 2021, Article No. 1259.
[15]
Selvaraju R R, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proc. the 2017 IEEE International Conference on Computer Vision, Oct. 2017, pp.618–626. DOI: 10.1109/ICCV.2017.74.
[16]
Cao Y, Xu J, Lin S, Wei F, Hu H. GCNet: Non-local networks meet squeeze-excitation networks and beyond. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision Workshops, Oct. 2019, pp.1971–1980. DOI: 10.1109/ICCVW.2019.00246.
[17]
Yang Z, Li Z, Jiang X, Gong Y, Yuan Z, Zhao D, Yuan C. Focal and global knowledge distillation for detectors. In Proc. the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2022, pp.4633–4642. DOI: 10.1109/CVPR52688.2022.00460.
[18]
Chen G, Choi W, Yu X, Han T, Chandraker M. Learning efficient object detection models with knowledge distillation. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.742–751.
[19]
Guo J, Han K, Wang Y, Wu H, Chen X, Xu C, Xu C. Distilling object detectors via decoupled features. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2021, pp.2154–2164. DOI: 10.1109/CVPR46437.2021.00219.
[20]
Chang J, Wang S, Xu H M, Chen Z, Yang C, Zhao F. DETRDistill: A universal knowledge distillation framework for DETR-families. In Proc. the 2023 IEEE/CVF International Conference on Computer Vision, Oct. 2023, pp.6875–6885. DOI: 10.1109/ICCV51070.2023.00635.
[21]
Lin T Y, Goyal P, Girshick R, He K, Dollár P. Focal loss for dense object detection. IEEE Trans. Pattern Analysis and Machine Intelligence, 2020, 42(2): 318–327. DOI: 10.1109/TPAMI.2018.2858826.
[22]
Tian Z, Shen C, Chen H, He T. FCOS: Fully convolutional one-stage object detection. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision, Oct. 27-Nov. 2, 2019, pp.9627–9636. DOI: 10.1109/ICCV.2019.00972.
[23]
Ge Z, Liu S, Wang F, Li Z, Sun J. YOLOX: Exceeding YOLO series in 2021. arXiv: 2107.08430, 2021. https://arxiv.org/abs/2107.08430, Jul. 2024.
[24]
Huang H, Zhou X, Cao J, He R, Tan T. Vision transformer with super token sampling. arXiv: 2211.11167, 2024. https://arxiv.org/abs/2211.11167, Jul. 2024.
[25]
Zhu L, Wang X, Ke Z, Zhang W, Lau R. BiFormer: Vision transformer with bi-level routing attention. In Proc. the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2023, pp.10323–10333. DOI: 10.1109/CVPR52729.2023.00995.
[26]
Tian R, Wu Z, Dai Q, Hu H, Qiao Y, Jiang Y G. ResFormer: Scaling ViTs with multi-resolution training. In Proc. the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2023, pp.22721–22731. DOI: 10.1109/CVPR52729.2023.02176.
[27]
Kisantal M, Wojna Z, Murawski J, Naruniec J, Cho K. Augmentation for small object detection. arXiv: 1902.07296, 2019. https://arxiv.org/abs/1902.07296, Jul. 2024.
[28]
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y, Berg A C. SSD: Single shot MultiBox detector. In Proc. the 14th European Conference on Computer Vision, Oct. 2016, pp.21–37. DOI: 10.1007/978-3-319-46448-0_2.
[29]
Cai Z, Fan Q, Feris R S, Vasconcelos N. A unified multi-scale deep convolutional neural network for fast object detection. In Proc. the 14th European Conference on Computer Vision, Oct. 2016, pp.354–370. DOI: 10.1007/978-3-319-46493-0_22.
[30]
Kong T, Yao A, Chen Y, Sun F. HyperNet: Towards accurate region proposal generation and joint object detection. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2016, pp.845–853. DOI: 10.1109/CVPR.2016.98.
[31]
Li Y, Chen Y, Wang N, Zhang Z X. Scale-aware trident networks for object detection. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision, Oct. 27-Nov. 2, 2019, pp.6054–6063. DOI: 10.1109/ICCV.2019.00615.
[32]
Singh B, Davis L S. An analysis of scale invariance in object detection—SNIP. In Proc. the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp.3578–3587. DOI: 10.1109/CVPR.2018.00377.
[33]
Singh B, Najibi M, Davis L S. SNIPER: Efficient multi-scale training. In Proc. the 32nd International Conference on Neural Information Processing Systems, Dec. 2018, pp.9333–9343.
[34]
Chen Y, Zhang P, Li Z, Li Y, Zhang X, Qi L, Sun J, Jia J. Dynamic scale training for object detection. arXiv: 2004.12432, 2021. https://arxiv.org/abs/2004.12432, Jul. 2024.
[35]
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S. End-to-end object detection with transformers. In Proc. the 16th European Conference on Computer Vision, Aug. 2020, pp.213–229. DOI: 10.1007/978-3-030-58452-8_13.
[36]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł, Polosukhin I. Attention is all you need. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.6000–6010.
[37]
Romero A, Ballas N, Kahou S E, Chassang A, Gatta C, Bengio Y. FitNets: Hints for thin deep nets. In Proc. the 3rd International Conference on Learning Representations, May 2015. DOI: 10.48550/arXiv.1412.6550.
[38]
Loshchilov I, Hutter F. Decoupled weight decay regularization. In Proc. the 7th International Conference on Learning Representations, May 2019.
[39]
Liu H, Liu Q, Liu Y, Liang Y, Zhao G. Exploring effective knowledge distillation for tiny object detection. In Proc. the 2023 IEEE International Conference on Image Processing, Oct. 2023, pp.770–774. DOI: 10.1109/ICIP49359.2023.10222589.
[40]
Ni Z L, Yang F, Wen S, Zhang G. Dual relation knowledge distillation for object detection. In Proc. the 32nd International Joint Conference on Artificial Intelligence, Aug. 2023, pp.1276–1284. DOI: 10.24963/ijcai.2023/142.
[41]
He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. In Proc. the 2017 IEEE International Conference on Computer Vision, Oct. 2017, pp.2980–2988. DOI: 10.1109/ICCV.2017.322.
[42]
Lee Y, Hwang J W, Lee S, Bae Y, Park J. An energy and GPU-computation efficient backbone network for real-time object detection. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Jun. 2019, pp.752–760. DOI: 10.1109/CVPRW.2019.00103.
[43]
Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proc. the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp.4510–4520. DOI: 10.1109/CVPR.2018.00474.
Journal of Computer Science and Technology
Pages 798-810
Cite this article:
Ma Y-C, Ma X, Hao T-R, et al. Knowledge Distillation via Hierarchical Matching for Small Object Detection. Journal of Computer Science and Technology, 2024, 39(4): 798-810. https://doi.org/10.1007/s11390-024-4158-5


Received: 29 January 2024
Accepted: 20 June 2024
Published: 20 September 2024
© Institute of Computing Technology, Chinese Academy of Sciences 2024