We introduce a novel bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS). It comprises two essential components: the localization module (LM) and the reconstruction module (RM), the latter equipped with our proposed bilateral reference (BiRef). The LM aids object localization using global semantic information. Within the RM, BiRef guides the reconstruction process: hierarchical patches of the image provide the source reference, while gradient maps serve as the target reference. These components collaborate to generate the final predicted maps. We also introduce auxiliary gradient supervision to enhance the focus on regions with finer details. In addition, we outline practical training strategies tailored for DIS that improve both map quality and the training process. To validate the general applicability of our approach, we conduct extensive experiments on four tasks, demonstrating that BiRefNet achieves remarkable performance and outperforms task-specific cutting-edge methods across all benchmarks. Our code is publicly available at https://github.com/ZhengPeng7/BiRefNet.
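To make the role of the target reference concrete, the sketch below computes a normalized gradient-magnitude map from a grayscale image with first-order finite differences. This is only an illustrative assumption: the abstract does not specify the gradient operator, and the function name `gradient_map` and the finite-difference choice are ours, not the paper's.

```python
import numpy as np

def gradient_map(img: np.ndarray) -> np.ndarray:
    """Illustrative gradient map: finite-difference gradient magnitude,
    normalized to [0, 1] so it can serve as a soft supervision target
    (a hypothetical stand-in for the paper's target reference)."""
    # np.gradient returns per-axis central differences (rows, cols).
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.sqrt(gx ** 2 + gy ** 2)
    peak = mag.max()
    return mag / peak if peak > 0 else mag

# Toy 4x4 "image" with a sharp vertical edge between columns 1 and 2.
img = np.zeros((4, 4))
img[:, 2:] = 1.0
gmap = gradient_map(img)
```

On this toy input, the map peaks along the edge columns and vanishes in the flat regions, which is exactly the fine-detail emphasis that auxiliary gradient supervision targets.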