Article | Open Access

Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers

Bo Dong¹, Wenhai Wang², Deng-Ping Fan¹ (✉), Jinpeng Li³, Huazhu Fu⁴, and Ling Shao⁵
¹ College of Computer Science, Nankai University, Tianjin 300350, China
² Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
³ Computer Vision Lab, Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates
⁴ Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore 138632, Singapore
⁵ UCAS-Terminus AI Lab, Terminus Group, Chongqing 400042, China

Abstract

Most polyp segmentation methods use convolutional neural networks (CNNs) as their backbone, which raises two key issues when exchanging information between the encoder and decoder: (1) how to account for the differing contributions of features from different levels, and (2) how to design an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the influence of image acquisition and the elusive properties of polyps, we introduce three modules: a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). Among these, the CFM collects the semantic and location information of polyps from high-level features; the CIM captures polyp information disguised in low-level features; and the SAM extends the pixel features of the polyp area, together with high-level semantic position information, to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT, effectively suppresses noise in the features and significantly improves their expressive capability. Extensive experiments on five widely adopted datasets show that the proposed model is more robust to various challenging situations (e.g., appearance changes, small objects, and rotation) than existing representative methods. The proposed model is available at https://github.com/DengPingFan/Polyp-PVT.
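
To make the abstract's decoder design concrete, the following is a minimal, illustrative PyTorch sketch of how the three modules might compose around a pyramid encoder: the CFM fuses the three high-level scales top-down, the CIM gates low-level features with channel and spatial attention, and the SAM spreads high-level semantics across the low-level map via pixel-wise similarity. All module bodies here are simplified stand-ins (plain convolutions and dot-product attention), and the class and parameter names are this sketch's own, not the authors' implementation; see the repository linked above for the official code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvBNReLU(nn.Sequential):
    """3x3 convolution + batch norm + ReLU, the basic block of this sketch."""
    def __init__(self, c_in, c_out):
        super().__init__(
            nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )


class CFM(nn.Module):
    """Cascaded fusion (sketch): merge the three high-level scales top-down
    to collect semantic and location information."""
    def __init__(self, c=64):
        super().__init__()
        self.fuse43 = ConvBNReLU(2 * c, c)
        self.fuse32 = ConvBNReLU(2 * c, c)

    def forward(self, f2, f3, f4):
        def up(x, ref):  # upsample x to the spatial size of ref
            return F.interpolate(x, size=ref.shape[-2:],
                                 mode="bilinear", align_corners=False)
        f3 = self.fuse43(torch.cat([f3, up(f4, f3)], dim=1))
        return self.fuse32(torch.cat([f2, up(f3, f2)], dim=1))


class CIM(nn.Module):
    """Camouflage identification (sketch): channel and spatial gating to
    recover polyp cues hidden in low-level features."""
    def __init__(self, c=64):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(c, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, f1):
        f1 = f1 * self.channel_gate(f1)
        return f1 * self.spatial_gate(f1)


class SAM(nn.Module):
    """Similarity aggregation (sketch): propagate high-level semantics over
    the whole low-level map via non-local dot-product attention."""
    def __init__(self, c=64):
        super().__init__()
        self.q, self.k, self.v = (nn.Conv2d(c, c, 1) for _ in range(3))
        self.out = ConvBNReLU(c, c)

    def forward(self, low, high):
        high = F.interpolate(high, size=low.shape[-2:],
                             mode="bilinear", align_corners=False)
        b, c, h, w = low.shape
        q = self.q(low).flatten(2).transpose(1, 2)    # (B, HW, C) queries
        k = self.k(high).flatten(2)                   # (B, C, HW) keys
        v = self.v(high).flatten(2).transpose(1, 2)   # (B, HW, C) values
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)
        agg = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return self.out(agg + low)


class PolypPVTSketch(nn.Module):
    """Compose the three modules over four pyramid feature maps."""
    def __init__(self, channels=(64, 128, 320, 512), c=64):
        super().__init__()
        self.reduce = nn.ModuleList(nn.Conv2d(ci, c, 1) for ci in channels)
        self.cfm, self.cim, self.sam = CFM(c), CIM(c), SAM(c)
        self.head = nn.Conv2d(c, 1, 1)  # per-pixel polyp logits

    def forward(self, feats):
        f1, f2, f3, f4 = (r(f) for r, f in zip(self.reduce, feats))
        high = self.cfm(f2, f3, f4)   # semantics + location from high levels
        low = self.cim(f1)            # camouflaged detail from the low level
        return self.head(self.sam(low, high))


# Small dummy pyramid features stand in for a PVTv2 encoder (whose stages
# have strides 4/8/16/32 and the channel widths used above):
feats = [torch.randn(1, ci, s, s)
         for ci, s in zip((64, 128, 320, 512), (32, 16, 8, 4))]
print(PolypPVTSketch()(feats).shape)  # torch.Size([1, 1, 32, 32])
```

The sketch stops at the stride-4 prediction; in practice the logits would be upsampled to the input resolution before computing the segmentation loss, and the backbone features would come from a pretrained pyramid vision transformer rather than random tensors.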

CAAI Artificial Intelligence Research
Article number: 9150015
Cite this article:
Dong B, Wang W, Fan D-P, et al. Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers. CAAI Artificial Intelligence Research, 2023, 2: 9150015. https://doi.org/10.26599/AIR.2023.9150015


Received: 06 December 2022
Revised: 10 February 2023
Accepted: 22 March 2023
Published: 30 June 2023
© The author(s) 2023.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
