Regular Paper

Sequential Cooperative Distillation for Imbalanced Multi-Task Learning

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Co-First Authors (Quan Feng wrote the methodological part of the article, Jia-Yu Yao wrote the related work, and Ming-Kun Xie wrote the learning algorithm framework section; these authors contributed equally to the paper). Sheng-Jun Huang revised the introduction.


Abstract

Multi-task learning (MTL) can boost the performance of individual tasks through mutual learning among multiple related tasks. However, when these tasks differ in complexity, their corresponding losses in the MTL objective inevitably compete with each other, ultimately biasing the learning towards simple tasks rather than complex ones. To address this imbalanced learning problem, we propose a novel MTL method that equips multiple existing deep MTL architectures with a sequential cooperative distillation (SCD) module. Specifically, we first introduce an efficient mechanism to measure the similarity between tasks, and group similar tasks into the same block so that they can learn cooperatively from each other. The grouped task blocks are then sorted into a queue that determines the learning sequence of the tasks according to their complexities, estimated with a defined performance indicator. Finally, distillation between the individual task-specific models and the MTL model is performed block by block, from complex to simple, achieving a balance between competition and cooperation in learning multiple tasks. Extensive experiments demonstrate that our method is significantly more competitive than state-of-the-art methods, ranking first in average performance across multiple datasets with improvements of 12.95% and 3.72% over OMTL and MTLKD, respectively.
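The scheduling stage described above (group similar tasks into blocks, then order blocks from complex to simple) can be illustrated with a minimal toy sketch. All names, the greedy grouping rule, and the similarity threshold below are illustrative assumptions for exposition only; the paper's actual similarity mechanism, performance indicator, and distillation procedure are not reproduced here.

```python
import numpy as np


def group_tasks(similarity, threshold=0.5):
    """Greedily group tasks whose similarity to a seed task exceeds a threshold.

    `similarity` is an (n, n) symmetric matrix; the greedy rule here is a
    hypothetical stand-in for the paper's grouping mechanism.
    """
    n = similarity.shape[0]
    unassigned = set(range(n))
    blocks = []
    while unassigned:
        seed = min(unassigned)  # pick the lowest-indexed remaining task as seed
        block = [t for t in sorted(unassigned)
                 if t == seed or similarity[seed, t] >= threshold]
        blocks.append(block)
        unassigned -= set(block)
    return blocks


def order_blocks(blocks, complexity):
    """Sort blocks from most to least complex, using mean per-task complexity."""
    return sorted(blocks, key=lambda b: -np.mean([complexity[t] for t in b]))


def scd_schedule(similarity, complexity, threshold=0.5):
    """Produce the block queue: similar tasks grouped, complex blocks first."""
    return order_blocks(group_tasks(similarity, threshold), complexity)


# Toy example: 4 tasks; tasks 0/1 are mutually similar, as are tasks 2/3.
sim = np.array([[1.0, 0.9, 0.1, 0.2],
                [0.9, 1.0, 0.2, 0.1],
                [0.1, 0.2, 1.0, 0.8],
                [0.2, 0.1, 0.8, 1.0]])
complexity = [0.3, 0.4, 0.9, 0.8]  # higher = harder task

print(scd_schedule(sim, complexity))  # → [[2, 3], [0, 1]]: complex block first
```

Distillation would then visit the queue in order, letting each block's task-specific models and the shared MTL model exchange knowledge before moving to the next (simpler) block.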

Electronic Supplementary Material

JCST-2202-12264-Highlights.pdf (942 KB)

Journal of Computer Science and Technology
Pages 1094-1106
Cite this article:
Feng Q, Yao J-Y, Xie M-K, et al. Sequential Cooperative Distillation for Imbalanced Multi-Task Learning. Journal of Computer Science and Technology, 2024, 39(5): 1094-1106. https://doi.org/10.1007/s11390-024-2264-z