Regular Paper

Sequential Cooperative Distillation for Imbalanced Multi-Task Learning

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Co-First Authors (Quan Feng wrote the methodological part of the article, Jia-Yu Yao wrote the related work, and Ming-Kun Xie wrote the learning algorithm framework section; these authors contributed equally to the paper). Sheng-Jun Huang revised the introduction.


Abstract

Multi-task learning (MTL) can boost the performance of individual tasks through mutual learning among multiple related tasks. However, when these tasks differ in complexity, their corresponding losses in the MTL objective inevitably compete with each other, ultimately biasing the learning towards simple tasks rather than complex ones. To address this imbalanced learning problem, we propose a novel MTL method that equips multiple existing deep MTL architectures with a sequential cooperative distillation (SCD) module. Specifically, we first introduce an efficient mechanism to measure the similarity between tasks, and group similar tasks into the same block so that they can learn cooperatively from each other. The grouped task blocks are then sorted into a queue that determines the learning sequence of the tasks according to their complexities, estimated with a defined performance indicator. Finally, distillation between the individual task-specific models and the MTL model is performed block by block, from complex to simple, achieving a balance between competition and cooperation in learning multiple tasks. Extensive experiments demonstrate that our method is significantly more competitive than state-of-the-art methods, ranking first in average performance across multiple datasets with improvements of 12.95% and 3.72% over OMTL and MTLKD, respectively.
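The scheduling stage described above (group similar tasks into blocks, then order blocks from complex to simple) can be illustrated with a minimal toy sketch. All names, the greedy grouping rule, and the similarity threshold below are illustrative assumptions for exposition only; the paper's actual similarity mechanism, performance indicator, and distillation procedure are not reproduced here.

```python
import numpy as np


def group_tasks(similarity, threshold=0.5):
    """Greedily group tasks whose similarity to a seed task exceeds a threshold.

    `similarity` is an (n, n) symmetric matrix; the greedy rule here is a
    hypothetical stand-in for the paper's grouping mechanism.
    """
    n = similarity.shape[0]
    unassigned = set(range(n))
    blocks = []
    while unassigned:
        seed = min(unassigned)  # pick the lowest-indexed remaining task as seed
        block = [t for t in sorted(unassigned)
                 if t == seed or similarity[seed, t] >= threshold]
        blocks.append(block)
        unassigned -= set(block)
    return blocks


def order_blocks(blocks, complexity):
    """Sort blocks from most to least complex, using mean per-task complexity."""
    return sorted(blocks, key=lambda b: -np.mean([complexity[t] for t in b]))


def scd_schedule(similarity, complexity, threshold=0.5):
    """Produce the block queue: similar tasks grouped, complex blocks first."""
    return order_blocks(group_tasks(similarity, threshold), complexity)


# Toy example: 4 tasks; tasks 0/1 are mutually similar, as are tasks 2/3.
sim = np.array([[1.0, 0.9, 0.1, 0.2],
                [0.9, 1.0, 0.2, 0.1],
                [0.1, 0.2, 1.0, 0.8],
                [0.2, 0.1, 0.8, 1.0]])
complexity = [0.3, 0.4, 0.9, 0.8]  # higher = harder task

print(scd_schedule(sim, complexity))  # → [[2, 3], [0, 1]]: complex block first
```

Distillation would then visit the queue in order, letting each block's task-specific models and the shared MTL model exchange knowledge before moving to the next (simpler) block.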

Electronic Supplementary Material

JCST-2202-12264-Highlights.pdf (942 KB)

Journal of Computer Science and Technology
Pages 1094-1106
Cite this article:
Feng Q, Yao J-Y, Xie M-K, et al. Sequential Cooperative Distillation for Imbalanced Multi-Task Learning. Journal of Computer Science and Technology, 2024, 39(5): 1094-1106. https://doi.org/10.1007/s11390-024-2264-z