Article | Open Access

Bilateral Reference for High-Resolution Dichotomous Image Segmentation

Peng Zheng 1,2, Dehong Gao 3, Deng-Ping Fan 1 (corresponding author), Li Liu 4, Jorma Laaksonen 5, Wanli Ouyang 2, Nicu Sebe 6

1 College of Computer Science, Nankai University, Tianjin 300350, China
2 Shanghai AI Laboratory, Shanghai 200232, China
3 School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China
4 College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China
5 Department of Computer Science, Aalto University, Espoo FI-02150, Finland
6 Department of Information Engineering and Computer Science, University of Trento, Trento I-38122, Italy

Abstract

We introduce a novel bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS). It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef). The LM aids object localization using global semantic information. Within the RM, we use BiRef for the reconstruction process, where hierarchical patches of the image provide the source reference and gradient maps serve as the target reference. These components collaborate to generate the final predicted maps. We also introduce auxiliary gradient supervision to enhance the focus on regions with finer details. In addition, we outline practical training strategies tailored for DIS that improve both map quality and the training process. To validate the general applicability of our approach, we conduct extensive experiments on four tasks, demonstrating that BiRefNet achieves remarkable performance and outperforms task-specific cutting-edge methods across all benchmarks. Our code is publicly available at https://github.com/ZhengPeng7/BiRefNet.
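To make the bilateral reference concrete, the following is a minimal PyTorch sketch of one reconstruction stage that fuses decoder features with a resized view of the input image (source reference) and an image gradient map (target reference). The module names, channel counts, and the Sobel-based gradient here are illustrative assumptions, not the authors' released implementation; see the repository above for the actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_map(image: torch.Tensor) -> torch.Tensor:
    """Sobel-style gradient magnitude used as the target reference (assumed)."""
    gray = image.mean(dim=1, keepdim=True)  # (B, 1, H, W)
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]], device=image.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

class BiRefBlock(nn.Module):
    """One reconstruction stage fusing decoder features with both references."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # +3 channels for the resized image (source reference),
        # +1 channel for its gradient map (target reference).
        self.fuse = nn.Sequential(
            nn.Conv2d(in_ch + 3 + 1, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        h, w = feat.shape[-2:]
        src_ref = F.interpolate(image, size=(h, w), mode="bilinear",
                                align_corners=False)  # source reference
        tgt_ref = gradient_map(src_ref)               # target reference
        return self.fuse(torch.cat([feat, src_ref, tgt_ref], dim=1))

# Usage: fuse 64-channel decoder features with references from the input image.
block = BiRefBlock(in_ch=64, out_ch=64)
image = torch.randn(1, 3, 1024, 1024)
feat = torch.randn(1, 64, 256, 256)
out = block(feat, image)  # -> (1, 64, 256, 256)
```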

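The auxiliary gradient supervision mentioned in the abstract can be sketched in the same spirit: alongside the usual mask loss, the spatial gradients of the prediction are matched to those of the ground truth, which puts extra weight on fine-detail regions. The finite-difference gradients and the weight lam below are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def spatial_gradients(x: torch.Tensor):
    """First-order finite differences along height and width."""
    dh = x[..., 1:, :] - x[..., :-1, :]
    dw = x[..., :, 1:] - x[..., :, :-1]
    return dh, dw

def segmentation_loss(logits: torch.Tensor, gt: torch.Tensor,
                      lam: float = 1.0) -> torch.Tensor:
    pred = torch.sigmoid(logits)
    mask_loss = F.binary_cross_entropy_with_logits(logits, gt)
    # Auxiliary gradient supervision: penalize mismatched edges.
    (pdh, pdw), (gdh, gdw) = spatial_gradients(pred), spatial_gradients(gt)
    grad_loss = F.l1_loss(pdh, gdh) + F.l1_loss(pdw, gdw)
    return mask_loss + lam * grad_loss

# Example: a 1-channel prediction and a binary mask at 1024x1024.
logits = torch.randn(1, 1, 1024, 1024)
gt = (torch.rand(1, 1, 1024, 1024) > 0.5).float()
loss = segmentation_loss(logits, gt)
```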
CAAI Artificial Intelligence Research
Article number: 9150038
Cite this article:
Zheng P, Gao D, Fan D-P, et al. Bilateral Reference for High-Resolution Dichotomous Image Segmentation. CAAI Artificial Intelligence Research, 2024, 3: 9150038. https://doi.org/10.26599/AIR.2024.9150038

Received: 19 April 2024
Revised: 09 July 2024
Accepted: 23 July 2024
Published: 22 August 2024
© The author(s) 2024.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
