Abstract
Recent advances in neural network architectures reveal the importance of diverse representations. However, simply adding more branches or increasing the width to obtain such diversity inevitably increases model complexity, leading to prohibitive inference costs. In this paper, we revisit the learnable parameters in neural networks and show that it is feasible to disentangle them into latent sub-parameters, each focusing on different patterns and representations. This finding leads us to further study the aggregation of diverse representations within a network structure. To this end, we propose Parameter Disentanglement for Diverse Representations (PDDR), which considers diverse patterns in parallel during training and aggregates them into a single representation for efficient inference. To further enhance the diverse representations, we develop a lightweight refinement module in PDDR, which adaptively refines the combination of diverse representations according to the input. PDDR can be seamlessly integrated into modern networks, significantly improving the learning capacity of a network while keeping the inference complexity unchanged. Experimental results show clear improvements on various tasks: PDDR improves Residual Network 50 (ResNet50) by 1.47% on ImageNet and improves the detection results of RetinaNet with a ResNet50 backbone (Retina ResNet50) by 1.7% mean Average Precision (mAP). Integrated into recent lightweight vision transformer models, the resulting models outperform related works by a clear margin. The code is available at: PDDR