Parameter Disentanglement for Diverse Representations

Affiliations

College of Information Science and Technology & Artificial Intelligence, State Key Laboratory of Tree Genetics and Breeding, and Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
Research Institute of New Technology, Hillstone Networks, Santa Clara, CA 95054, USA
School of Mathematical and Computational Sciences, Massey University, Auckland 102-904, New Zealand
Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education, Hefei University of Technology, Hefei 230009, China
College of Forestry, Hebei Agricultural University, Baoding 071000, China, and Institute of Forest Resource Information Techniques, Chinese Academy of Forestry, Beijing 100091, China
Abstract

Recent advances in neural network architectures reveal the importance of diverse representations. However, simply integrating more branches or widening the network for diversity inevitably increases model complexity, leading to prohibitive inference costs. In this paper, we revisit the learnable parameters in neural networks and show that it is feasible to disentangle learnable parameters into latent sub-parameters that focus on different patterns and representations. This finding leads us to further study how diverse representations can be aggregated within a network structure. To this end, we propose Parameter Disentanglement for Diverse Representations (PDDR), which considers diverse patterns in parallel during training and aggregates them into one for efficient inference. To further enhance the diverse representations, we develop a lightweight refinement module in PDDR, which adaptively refines the combination of diverse representations according to the input. PDDR can be seamlessly integrated into modern networks, significantly improving the learning capacity of a network while keeping the inference complexity unchanged. Experimental results show clear improvements on various tasks: PDDR improves Residual Network 50 (ResNet50) by 1.47% on ImageNet and improves the detection results of Retina Residual Network 50 (Retina-ResNet50) by 1.7% mean Average Precision (mAP). When PDDR is integrated into recent lightweight vision transformer models, the resulting models outperform related works by a clear margin.
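
The abstract describes the mechanism only at a high level. The sketch below, written in PyTorch-style Python, illustrates one way such a layer could look: a convolution whose weight is disentangled into several latent sub-kernels that are mixed by an input-adaptive refinement module during training, and folded back into a single kernel for inference. The class and member names (PDDRConv2d, refine, fuse), the number of branches, and the batch-averaged mixing are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PDDRConv2d(nn.Module):
    """Convolution whose weight is disentangled into latent sub-weights.

    During training, the sub-weights are combined with input-adaptive
    coefficients predicted by a lightweight refinement module; for inference,
    a fixed combination is folded into a single kernel, so the deployed layer
    costs the same as a plain convolution.
    """

    def __init__(self, in_ch, out_ch, k=3, num_branches=4):
        super().__init__()
        # Latent sub-parameters: num_branches candidate kernels for one layer.
        self.weight = nn.Parameter(
            torch.randn(num_branches, out_ch, in_ch, k, k) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        # Lightweight refinement: per-branch mixing coefficients from the input.
        self.refine = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, num_branches))
        self.padding = k // 2

    def forward(self, x):
        # Input-adaptive combination of the sub-weights; averaged over the
        # batch here so the sketch needs only one convolution call.
        coef = torch.softmax(self.refine(x), dim=1).mean(dim=0)   # (num_branches,)
        w = (coef.view(-1, 1, 1, 1, 1) * self.weight).sum(dim=0)  # (out_ch, in_ch, k, k)
        return F.conv2d(x, w, self.bias, padding=self.padding)

    @torch.no_grad()
    def fuse(self, fixed_coef=None):
        """Fold the sub-weights into one kernel for efficient inference."""
        if fixed_coef is None:
            # Default: uniform combination of the latent sub-weights.
            fixed_coef = torch.full((self.weight.size(0),), 1.0 / self.weight.size(0))
        w = (fixed_coef.view(-1, 1, 1, 1, 1) * self.weight).sum(dim=0)
        conv = nn.Conv2d(w.size(1), w.size(0), w.size(2), padding=self.padding)
        conv.weight.copy_(w)
        conv.bias.copy_(self.bias)
        return conv
```

Because the fused kernel has the same shape as a standard convolution weight, the inference-time network retains exactly the complexity of its plain counterpart, which is the property the abstract emphasizes.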

Big Data Mining and Analytics
Pages 606-623
Cite this article:
Wang J, Guo J, Wang R, et al. Parameter Disentanglement for Diverse Representations. Big Data Mining and Analytics, 2025, 8(3): 606-623. https://doi.org/10.26599/BDMA.2024.9020087