Regular Paper

Knowledge Distillation via Hierarchical Matching for Small Object Detection

School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China

Abstract

Knowledge distillation is often used for model compression and has achieved great success in image classification, but there remains room for improvement in object detection, especially in knowledge extraction for small objects. The main problem is that the features of small objects are often polluted by background noise and are not prominent due to down-sampling in convolutional neural networks (CNNs), so small object features are insufficiently refined during distillation. In this paper, we propose the Hierarchical Matching Knowledge Distillation Network (HMKD), which operates on pyramid levels P2 to P4 of the feature pyramid network (FPN), aiming to intervene on small object features before they are degraded. We employ an encoder-decoder network to encapsulate low-resolution, highly semantic information, akin to eliciting insights from the deep layers of the teacher network, and then match this encapsulated information, as queries, against the high-resolution feature values of small objects from shallow layers, which serve as keys. During this matching, an attention mechanism measures the relevance of each query to the feature values, and knowledge is distilled to the student during decoding. In addition, we introduce a supplementary distillation module to mitigate the effects of background noise. Experiments show that our method achieves excellent improvements for both one-stage and two-stage object detectors. Specifically, applying the proposed method to Faster R-CNN achieves 41.7% mAP on COCO2017 (with ResNet50 as the backbone), which is 3.8% higher than that of the baseline.
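The matching step described in the abstract can be sketched as follows. This is an illustrative NumPy reconstruction, not the authors' implementation: the function names, tensor shapes, and the attention-weighted mean-squared-error form are assumptions; the paper's actual HMKD module uses an encoder-decoder and a supplementary background-noise module not shown here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_matching_loss(teacher_deep, teacher_shallow, student_shallow):
    """Attention-weighted feature distillation loss (illustrative sketch).

    teacher_deep:    (Nq, C) flattened low-resolution, highly semantic teacher
                     features from a deep FPN level, used as queries.
    teacher_shallow: (Nk, C) flattened high-resolution teacher features from a
                     shallow level such as P2, used as keys.
    student_shallow: (Nk, C) the student's features at the same shallow level.
    """
    c = teacher_deep.shape[-1]
    # Scaled dot-product attention: relevance of each semantic query to each
    # high-resolution shallow location.
    attn = softmax(teacher_deep @ teacher_shallow.T / np.sqrt(c), axis=-1)
    # Per-location importance: the average attention a shallow position
    # receives. Positions matched by the semantic queries (plausibly
    # small-object regions) get larger weight in the distillation loss.
    weights = attn.mean(axis=0)                       # (Nk,), sums to 1
    per_loc_err = ((teacher_shallow - student_shallow) ** 2).mean(axis=-1)
    return float((weights * per_loc_err).sum())
```

Under this formulation the loss is zero exactly when the student reproduces the teacher's shallow features, and mismatches at locations the semantic queries attend to are penalized most heavily.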

Electronic Supplementary Material

Download File(s)
JCST-2401-14158-Highlights.pdf (149.3 KB)

References

[1]
Cao C, Wang B, Zhang W, Zeng X, Yan X, Feng Z, Liu Y, Wu Z. An improved Faster R-CNN for small object detection. IEEE Access, 2019, 7: 106838–106846. DOI: 10.1109/ACCESS.2019.2932731.
[2]
Yang C, Huang Z, Wang N. QueryDet: Cascaded sparse query for accelerating high-resolution small object detection. In Proc. the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2022, pp.13658–13667. DOI: 10.1109/CVPR52688.2022.01330.
[3]
Hubara I, Courbariaux M, Soudry D, El-Yaniv R, Bengio Y. Binarized neural networks. In Proc. the 30th International Conference on Neural Information Processing Systems, Dec. 2016, pp.4114–4122.
[4]
Rastegari M, Ordonez V, Redmon J, Farhadi A. XNOR-Net: ImageNet classification using binary convolutional neural networks. In Proc. the 14th European Conference on Computer Vision, Oct. 2016, pp.525–542. DOI: 10.1007/978-3-319-46493-0_32.
[5]
Han S, Pool J, Tran J, Dally W J. Learning both weights and connections for efficient neural network. In Proc. the 28th International Conference on Neural Information Processing Systems, Dec. 2015, pp.1135–1143.
[6]
He Y, Zhang X, Sun J. Channel pruning for accelerating very deep neural networks. In Proc. the 2017 IEEE International Conference on Computer Vision, Oct. 2017, pp.1398–1406. DOI: 10.1109/ICCV.2017.155.
[7]
Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. arXiv: 1503.02531, 2015. https://arxiv.org/abs/1503.02531, Jul. 2024.
[8]
Ji M, Heo B, Park S. Show, attend and distill: Knowledge distillation via attention-based feature matching. In Proc. the 35th AAAI Conference on Artificial Intelligence, Feb. 2021, pp.7945–7952. DOI: 10.1609/aaai.v35i9.16969.
[9]
Wang T, Yuan L, Zhang X, Feng J. Distilling object detectors with fine-grained feature imitation. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2019, pp.4928–4937. DOI: 10.1109/CVPR.2019.00507.
[10]
Zhang L, Ma K. Improve object detection with feature-based knowledge distillation: Towards accurate and efficient detectors. In Proc. the 9th International Conference on Learning Representations, May 2021.
[11]
Heo B, Kim J, Yun S, Park H, Kwak N, Choi J Y. A comprehensive overhaul of feature distillation. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision, Oct. 27-Nov. 2, 2019, pp.1921–1930. DOI: 10.1109/ICCV.2019.00201.
[12]
Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proc. the 28th International Conference on Neural Information Processing Systems, Dec. 2015, pp.91–99.
[13]
Lin T Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid networks for object detection. In Proc. the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Jul. 2017, pp.936–944. DOI: 10.1109/CVPR.2017.106.
[14]
Kang Z, Zhang P, Zhang X, Sun J, Zheng N. Instance-conditional knowledge distillation for object detection. In Proc. the 35th International Conference on Neural Information Processing Systems, Dec. 2021, Article No. 1259.
[15]
Selvaraju R R, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proc. the 2017 IEEE International Conference on Computer Vision, Oct. 2017, pp.618–626. DOI: 10.1109/ICCV.2017.74.
[16]
Cao Y, Xu J, Lin S, Wei F, Hu H. GCNet: Non-local networks meet squeeze-excitation networks and beyond. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision Workshops, Oct. 2019, pp.1971–1980. DOI: 10.1109/ICCVW.2019.00246.
[17]
Yang Z, Li Z, Jiang X, Gong Y, Yuan Z, Zhao D, Yuan C. Focal and global knowledge distillation for detectors. In Proc. the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2022, pp.4633–4642. DOI: 10.1109/CVPR52688.2022.00460.
[18]
Chen G, Choi W, Yu X, Han T, Chandraker M. Learning efficient object detection models with knowledge distillation. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.742–751.
[19]
Guo J, Han K, Wang Y, Wu H, Chen X, Xu C, Xu C. Distilling object detectors via decoupled features. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2021, pp.2154–2164. DOI: 10.1109/CVPR46437.2021.00219.
[20]
Chang J, Wang S, Xu H M, Chen Z, Yang C, Zhao F. DETRDistill: A universal knowledge distillation framework for DETR-families. In Proc. the 2023 IEEE/CVF International Conference on Computer Vision, Oct. 2023, pp.6875–6885. DOI: 10.1109/ICCV51070.2023.00635.
[21]
Lin T Y, Goyal P, Girshick R, He K, Dollár P. Focal loss for dense object detection. IEEE Trans. Pattern Analysis and Machine Intelligence, 2020, 42(2): 318–327. DOI: 10.1109/TPAMI.2018.2858826.
[22]
Tian Z, Shen C, Chen H, He T. FCOS: Fully convolutional one-stage object detection. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision, Oct. 27-Nov. 2, 2019, pp.9627–9636. DOI: 10.1109/ICCV.2019.00972.
[23]
Ge Z, Liu S, Wang F, Li Z, Sun J. YOLOX: Exceeding YOLO series in 2021. arXiv: 2107.08430, 2021. https://arxiv.org/abs/2107.08430, Jul. 2024.
[24]
Huang H, Zhou X, Cao J, He R, Tan T. Vision transformer with super token sampling. arXiv: 2211.11167, 2024. https://arxiv.org/abs/2211.11167, Jul. 2024.
[25]
Zhu L, Wang X, Ke Z, Zhang W, Lau R. BiFormer: Vision transformer with bi-level routing attention. In Proc. the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2023, pp.10323–10333. DOI: 10.1109/CVPR52729.2023.00995.
[26]
Tian R, Wu Z, Dai Q, Hu H, Qiao Y, Jiang Y G. ResFormer: Scaling ViTs with multi-resolution training. In Proc. the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2023, pp.22721–22731. DOI: 10.1109/CVPR52729.2023.02176.
[27]
Kisantal M, Wojna Z, Murawski J, Naruniec J, Cho K. Augmentation for small object detection. arXiv: 1902.07296, 2019. https://arxiv.org/abs/1902.07296, Jul. 2024.
[28]
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y, Berg A C. SSD: Single shot MultiBox detector. In Proc. the 14th European Conference on Computer Vision, Oct. 2016, pp.21–37. DOI: 10.1007/978-3-319-46448-0_2.
[29]
Cai Z, Fan Q, Feris R S, Vasconcelos N. A unified multi-scale deep convolutional neural network for fast object detection. In Proc. the 14th European Conference on Computer Vision, Oct. 2016, pp.354–370. DOI: 10.1007/978-3-319-46493-0_22.
[30]
Kong T, Yao A, Chen Y, Sun F. HyperNet: Towards accurate region proposal generation and joint object detection. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2016, pp.845–853. DOI: 10.1109/CVPR.2016.98.
[31]
Li Y, Chen Y, Wang N, Zhang Z X. Scale-aware trident networks for object detection. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision, Oct. 27-Nov. 2, 2019, pp.6054–6063. DOI: 10.1109/ICCV.2019.00615.
[32]
Singh B, Davis L S. An analysis of scale invariance in object detection—SNIP. In Proc. the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp.3578–3587. DOI: 10.1109/CVPR.2018.00377.
[33]
Singh B, Najibi M, Davis L S. SNIPER: Efficient multi-scale training. In Proc. the 32nd International Conference on Neural Information Processing Systems, Dec. 2018, pp.9333–9343.
[34]
Chen Y, Zhang P, Li Z, Li Y, Zhang X, Qi L, Sun J, Jia J. Dynamic scale training for object detection. arXiv: 2004.12432, 2021. https://arxiv.org/abs/2004.12432, Jul. 2024.
[35]
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S. End-to-end object detection with transformers. In Proc. the 16th European Conference on Computer Vision, Aug. 2020, pp.213–229. DOI: 10.1007/978-3-030-58452-8_13.
[36]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł, Polosukhin I. Attention is all you need. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.6000–6010.
[37]
Romero A, Ballas N, Kahou S E, Chassang A, Gatta C, Bengio Y. FitNets: Hints for thin deep nets. In Proc. the 3rd International Conference on Learning Representations, May 2015. DOI: 10.48550/arXiv.1412.6550.
[38]
Loshchilov I, Hutter F. Decoupled weight decay regularization. In Proc. the 7th International Conference on Learning Representations, May 2019.
[39]
Liu H, Liu Q, Liu Y, Liang Y, Zhao G. Exploring effective knowledge distillation for tiny object detection. In Proc. the 2023 IEEE International Conference on Image Processing, Oct. 2023, pp.770–774. DOI: 10.1109/ICIP49359.2023.10222589.
[40]
Ni Z L, Yang F, Wen S, Zhang G. Dual relation knowledge distillation for object detection. In Proc. the 32nd International Joint Conference on Artificial Intelligence, Aug. 2023, pp.1276–1284. DOI: 10.24963/ijcai.2023/142.
[41]
He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. In Proc. the 2017 IEEE International Conference on Computer Vision, Oct. 2017, pp.2980–2988. DOI: 10.1109/ICCV.2017.322.
[42]
Lee Y, Hwang J W, Lee S, Bae Y, Park J. An energy and GPU-computation efficient backbone network for real-time object detection. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Jun. 2019, pp.752–760. DOI: 10.1109/CVPRW.2019.00103.
[43]
Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proc. the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp.4510–4520. DOI: 10.1109/CVPR.2018.00474.
Journal of Computer Science and Technology
Pages 798-810
Cite this article:
Ma Y-C, Ma X, Hao T-R, et al. Knowledge Distillation via Hierarchical Matching for Small Object Detection. Journal of Computer Science and Technology, 2024, 39(4): 798-810. https://doi.org/10.1007/s11390-024-4158-5


Received: 29 January 2024
Accepted: 20 June 2024
Published: 20 September 2024
© Institute of Computing Technology, Chinese Academy of Sciences 2024