Parameter Disentanglement for Diverse Representations

Affiliations

College of Information Science and Technology & Artificial Intelligence, State Key Laboratory of Tree Genetics and Breeding, and Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
Research Institute of New Technology, Hillstone Networks, Santa Clara, CA 95054, USA
School of Mathematical and Computational Sciences, Massey University, Auckland 102-904, New Zealand
Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education, Hefei University of Technology, Hefei 230009, China
College of Forestry, Hebei Agricultural University, Baoding 071000, China, and Institute of Forest Resource Information Techniques, Chinese Academy of Forestry, Beijing 100091, China
Abstract

Recent advances in neural network architectures reveal the importance of diverse representations. However, simply integrating more branches or widening the network for diversity inevitably increases model complexity, leading to prohibitive inference costs. In this paper, we revisit the learnable parameters in neural networks and show that it is feasible to disentangle learnable parameters into latent sub-parameters that focus on different patterns and representations. This finding leads us to further study how diverse representations can be aggregated within a network structure. To this end, we propose Parameter Disentanglement for Diverse Representations (PDDR), which considers diverse patterns in parallel during training and aggregates them into one for efficient inference. To further enhance the diverse representations, we develop a lightweight refinement module in PDDR, which adaptively refines the combination of diverse representations according to the input. PDDR can be seamlessly integrated into modern networks, significantly improving the learning capacity of a network while keeping the inference complexity unchanged. Experimental results show clear improvements on various tasks: PDDR improves Residual Network 50 (ResNet50) by 1.47% on ImageNet and improves the detection results of Retina Residual Network 50 (Retina-ResNet50) by 1.7% mean Average Precision (mAP). When PDDR is integrated into recent lightweight vision transformer models, the resulting models outperform related works by a clear margin.
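
The abstract describes the mechanism only at a high level. The sketch below, written in PyTorch-style Python, illustrates one way such a layer could look: a convolution whose weight is disentangled into several latent sub-kernels that are mixed by an input-adaptive refinement module during training, and folded back into a single kernel for inference. The class and member names (PDDRConv2d, refine, fuse), the number of branches, and the batch-averaged mixing are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PDDRConv2d(nn.Module):
    """Convolution whose weight is disentangled into latent sub-weights.

    During training, the sub-weights are combined with input-adaptive
    coefficients predicted by a lightweight refinement module; for inference,
    a fixed combination is folded into a single kernel, so the deployed layer
    costs the same as a plain convolution.
    """

    def __init__(self, in_ch, out_ch, k=3, num_branches=4):
        super().__init__()
        # Latent sub-parameters: num_branches candidate kernels for one layer.
        self.weight = nn.Parameter(
            torch.randn(num_branches, out_ch, in_ch, k, k) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        # Lightweight refinement: per-branch mixing coefficients from the input.
        self.refine = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, num_branches))
        self.padding = k // 2

    def forward(self, x):
        # Input-adaptive combination of the sub-weights; averaged over the
        # batch here so the sketch needs only one convolution call.
        coef = torch.softmax(self.refine(x), dim=1).mean(dim=0)   # (num_branches,)
        w = (coef.view(-1, 1, 1, 1, 1) * self.weight).sum(dim=0)  # (out_ch, in_ch, k, k)
        return F.conv2d(x, w, self.bias, padding=self.padding)

    @torch.no_grad()
    def fuse(self, fixed_coef=None):
        """Fold the sub-weights into one kernel for efficient inference."""
        if fixed_coef is None:
            # Default: uniform combination of the latent sub-weights.
            fixed_coef = torch.full((self.weight.size(0),), 1.0 / self.weight.size(0))
        w = (fixed_coef.view(-1, 1, 1, 1, 1) * self.weight).sum(dim=0)
        conv = nn.Conv2d(w.size(1), w.size(0), w.size(2), padding=self.padding)
        conv.weight.copy_(w)
        conv.bias.copy_(self.bias)
        return conv
```

Because the fused kernel has the same shape as a standard convolution weight, the inference-time network retains exactly the complexity of its plain counterpart, which is the property the abstract emphasizes.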

Big Data Mining and Analytics
Pages 606-623
Cite this article:
Wang J, Guo J, Wang R, et al. Parameter Disentanglement for Diverse Representations. Big Data Mining and Analytics, 2025, 8(3): 606-623. https://doi.org/10.26599/BDMA.2024.9020087