College of Computer Science, Nankai University, Tianjin 300350, China
Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
Computer Vision Lab, Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore 138632, Singapore
UCAS-Terminus AI Lab, Terminus Group, Chongqing 400042, China
Abstract
Most polyp segmentation methods use convolutional neural networks (CNNs) as their backbone, leading to two key issues when exchanging information between the encoder and decoder: (1) how to account for the differing contributions of features at different levels, and (2) how to design an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the image acquisition influence and elusive properties of polyps, we introduce three standard modules, including a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). Among these, the CFM is used to collect the semantic and location information of polyps from high-level features; the CIM is applied to capture polyp information disguised in low-level features; and the SAM extends the pixel features of the polyp area with high-level semantic position information to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT, effectively suppresses noise in the features and significantly improves their expressive capabilities. Extensive experiments on five widely adopted datasets show that the proposed model is more robust to various challenging situations (e.g., appearance changes, small objects, and rotation) than existing representative methods. The proposed model is available at https://github.com/DengPingFan/Polyp-PVT.
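The encoder–decoder pipeline described in the abstract can be sketched as follows. This is a minimal NumPy illustration of the data flow only: the strided "encoder", the sigmoid gate, and the resize-and-multiply fusion are simplifying assumptions standing in for the PVT stages, CIM, CFM, and SAM, and the function name `polyp_pvt_sketch` is ours, not from the released repository.

```python
import numpy as np

def polyp_pvt_sketch(image):
    """Illustrative Polyp-PVT data flow (shapes only; not the real code).

    A pyramid encoder yields four feature maps at 1/4, 1/8, 1/16, and
    1/32 resolution. The CIM refines the low-level map, the CFM fuses
    the three high-level maps, and the SAM combines both branches.
    """
    # Stand-in "encoder": strided subsampling in place of PVT stages.
    feats = [image[::s, ::s] for s in (4, 8, 16, 32)]
    low, highs = feats[0], feats[1:]

    # CIM (sketch): a sigmoid gate suppresses background in low-level cues.
    gate = 1.0 / (1.0 + np.exp(-low))
    low = low * gate

    # CFM (sketch): bring the high-level maps to one size and accumulate.
    target = highs[0].shape
    cfm = sum(np.resize(f, target) for f in highs)

    # SAM (sketch): spread the high-level semantic cue over the
    # low-level map to cover the entire polyp region.
    return low * np.resize(cfm, low.shape)

out = polyp_pvt_sketch(np.zeros((352, 352)))  # coarse map at 1/4 scale
```

For the actual module definitions (attention weights, graph convolution, learned upsampling), see the linked repository.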
References
M. Fiori, P. Musé, and G. Sapiro, A complete system for candidate polyps detection in virtual colonoscopy, Int. J. Patt. Recogn. Artif. Intell., vol. 28, no. 7, p. 1460014, 2014.
A. V. Mamonov, I. N. Figueiredo, P. N. Figueiredo, and Y. H. Richard Tsai, Automated polyp detection in colon capsule endoscopy, IEEE Trans. Med. Imag., vol. 33, no. 7, pp. 1488–1502, 2014.
O. H. Maghsoudi, Superpixel based segmentation and classification of polyps in wireless capsule endoscopy, in Proc. 2017 IEEE Signal Processing in Medicine and Biology Symposium (SPMB), Philadelphia, PA, USA, 2017, pp. 1–4.
O. Ronneberger, P. Fischer, and T. Brox, U-net: Convolutional networks for biomedical image segmentation, in Proc. 18th Int. Conf. Medical Image Computing and Computer Assisted Intervention, Munich, Germany, 2015, pp. 234–241.
D. P. Fan, G. P. Ji, T. Zhou, G. Chen, H. Fu, J. Shen, and L. Shao, PraNet: Parallel reverse attention network for polyp segmentation, in Proc. 23rd Int. Conf. Medical Image Computing and Computer Assisted Intervention, Lima, Peru, 2020, pp. 263–273.
X. Guo, C. Yang, Y. Liu, and Y. Yuan, Learn to threshold: ThresholdNet with confidence-guided manifold mixup for polyp segmentation, IEEE Trans. Med. Imag., vol. 40, no. 4, pp. 1134–1146, 2021.
J. Wei, Y. Hu, R. Zhang, Z. Li, S. K. Zhou, and S. Cui, Shallow attention network for polyp segmentation, arXiv preprint arXiv: 2108.00882, 2021.
J. Bernal, F. J. Sánchez, G. Fernández-Esparrach, D. Gil, C. Rodríguez, and F. Vilariño, WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians, Comput. Med. Imag. Graph., vol. 43, pp. 99–111, 2015.
J. Silva, A. Histace, O. Romain, X. Dray, and B. Granado, Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer, Int. J. Comput. Assist. Radiol. Surg., vol. 9, no. 2, pp. 283–293, 2014.
N. Tajbakhsh, S. R. Gurudu, and J. Liang, Automated polyp detection in colonoscopy videos using shape and context information, IEEE Trans. Med. Imag., vol. 35, no. 2, pp. 630–644, 2016.
D. P. Fan, G. P. Ji, M. M. Cheng, and L. Shao, Concealed object detection, in Proc. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 2774–2784.
D. Jha, P. H. Smedsrud, M. A. Riegler, P. Halvorsen, T. de Lange, D. Johansen, and H. D. Johansen, Kvasir-SEG: A segmented polyp dataset, in Proc. 26th Int. Conf. Multimedia Modeling, Daejeon, Korea, 2020, pp. 451–462.
D. Vázquez, J. Bernal, F. J. Sánchez, G. Fernández-Esparrach, A. M. López, A. Romero, M. Drozdzal, and A. Courville, A benchmark for endoluminal scene segmentation of colonoscopy images, J. Healthc. Eng., vol. 2017, pp. 1–9, 2017.
T. Rahim, M. A. Usman, and S. Y. Shin, A survey on contemporary computer-aided tumor, polyp, and ulcer detection methods in wireless capsule endoscopy imaging, Comput. Med. Imag. Graph., vol. 85, p. 101767, 2020.
K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, in Proc. 3rd International Conference on Learning Representations, San Diego, CA, USA, 2015.
X. Li, W. Wang, X. Hu, and J. Yang, Selective kernel networks, in Proc. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 510–519.
W. Wang, X. Li, J. Yang, and T. Lu, Mixed link networks, in Proc. 27th Int. Joint Conf. Artificial Intelligence, Stockholm, Sweden, 2018, pp. 2819–2825.
J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, in Proc. 2015 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015, pp. 3431–3440.
L. Cai, M. Wu, L. Chen, W. Bai, M. Yang, S. Lyu, and Q. Zhao, Using guided self-attention with local information for polyp segmentation, in Proc. 25th Int. Conf. Medical Image Computing and Computer Assisted Intervention, Singapore, 2022, pp. 629–638.
N. K. Tomar, D. Jha, U. Bagci, and S. Ali, TGANet: Text-guided attention for improved polyp segmentation, in Proc. 25th Int. Conf. Medical Image Computing and Computer Assisted Intervention, Singapore, 2022, pp. 151–160.
R. Zhang, P. Lai, X. Wan, D. J. Fan, F. Gao, X. J. Wu, and G. Li, Lesion-aware dynamic kernel for polyp segmentation, in Proc. 25th Int. Conf. Medical Image Computing and Computer Assisted Intervention, Singapore, 2022, pp. 99–109.
J. H. Shi, Q. Zhang, Y. H. Tang, and Z. Q. Zhang, Polyp-mixer: An efficient context-aware MLP-based paradigm for polyp segmentation, IEEE Trans. Circuits Syst. Video Technol., vol. 33, no. 1, pp. 30–42, 2023.
X. Zhao, Z. Wu, S. Tan, D. J. Fan, Z. Li, X. Wan, and G. Li, Semi-supervised spatial temporal attention network for video polyp segmentation, in Proc. 25th Int. Conf. Medical Image Computing and Computer Assisted Intervention, Singapore, 2022, pp. 456–466.
M. Akbari, M. Mohrekesh, E. Nasr-Esfahani, S. M. Reza Soroushmehr, N. Karimi, S. Samavi, and K. Najarian, Polyp segmentation in colonoscopy images using fully convolutional network, in Proc. 2018 40th Annual Int. Conf. IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, USA, 2018, pp. 69–72.
P. Brandao, O. Zisimopoulos, E. Mazomenos, G. Ciuti, J. Bernal, M. Visentini-Scarzanella, A. Menciassi, P. Dario, A. Koulaouzidis, A. Arezzo, et al., Towards a computed-aided diagnosis system in colonoscopy: Automatic polyp segmentation using convolution neural networks, J. Med. Robot. Res., vol. 3, no. 2, p. 1840002, 2018.
Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, UNet++: A nested U-net architecture for medical image segmentation, in Proc. 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, held in conjunction with MICCAI 2018, Granada, Spain, 2018, pp. 3–11.
D. Jha, P. H. Smedsrud, M. A. Riegler, D. Johansen, T. De Lange, P. Halvorsen, and H. D. Johansen, ResUNet++: An advanced architecture for medical image segmentation, in Proc. 2019 IEEE Int. Symp. on Multimedia (ISM), San Diego, CA, USA, 2019, pp. 225–230.
X. Sun, P. Zhang, D. Wang, Y. Cao, and B. Liu, Colorectal polyp segmentation by U-net with dilation convolution, in Proc. 2019 18th IEEE Int. Conf. Machine Learning and Applications (ICMLA), Boca Raton, FL, USA, 2019, pp. 851–858.
B. Murugesan, K. Sarveswaran, S. M. Shankaranarayana, K. Ram, J. Joseph, and M. Sivaprakasam, Psi-Net: Shape and boundary aware joint multi-task deep network for medical image segmentation, in Proc. 2019 41st Annual Int. Conf. IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 2019, pp. 7223–7226.
H. Ali Qadir, Y. Shin, J. Solhusvik, J. Bergsland, L. Aabakken, and I. Balasingham, Polyp detection and segmentation using mask R-CNN: Does a deeper feature extractor CNN always perform better? in Proc. 2019 13th Int. Symp. on Medical Information and Communication Technology (ISMICT), Oslo, Norway, 2019, pp. 1–6.
K. He, G. Gkioxari, P. Dollár, and R. Girshick, Mask R-CNN, in Proc. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017, pp. 2980–2988.
S. Alam, N. K. Tomar, A. Thakur, D. Jha, and A. Rauniyar, Automatic polyp segmentation using U-net-ResNet50, in Proc. MediaEval 2020 Workshop, virtual, 2020.
D. Banik, K. Roy, D. Bhattacharjee, M. Nasipuri, and O. Krejcar, Polyp-net: A multimodel fusion network for polyp segmentation, IEEE Trans. Instrum. Meas., vol. 70, pp. 1–12, 2021.
T. Rahim, S. Ali Hassan, and S. Y. Shin, A deep convolutional neural network for the detection of polyps in colonoscopy images, Biomed. Signal Process. Contr., vol. 68, p. 102654, 2021.
D. Jha, S. Ali, N. K. Tomar, H. D. Johansen, D. Johansen, J. Rittscher, M. A. Riegler, and P. Halvorsen, Real-time polyp detection, localization and segmentation in colonoscopy using deep learning, IEEE Access, vol. 9, pp. 40496–40510, 2021.
A. M. A. Ahmed, Generative adversarial networks for automatic polyp segmentation, in Proc. MediaEval 2020 Workshop, virtual, 2020.
V. Thambawita, S. Hicks, P. Halvorsen, and M. A. Riegler, Pyramid-focus-augmentation: Medical image segmentation with step-wise focus, in Proc. MediaEval 2020 Workshop, virtual, 2020.
N. K. Tomar, D. Jha, S. Ali, H. D. Johansen, D. Johansen, M. A. Riegler, and P. Halvorsen, DDANet: Dual decoder attention network for automatic polyp segmentation, in Proc. 2021 Int. Conf. Pattern Recognition, virtual, 2021, pp. 307–314.
C. H. Huang, H. Y. Wu, and Y. L. Lin, HarDNet-MSEG: A simple encoder-decoder polyp segmentation neural network that achieves over 0.9 mean dice and 86 FPS, arXiv preprint arXiv: 2101.07172, 2021.
P. Chao, C. Y. Kao, Y. Ruan, C. H. Huang, and Y. L. Lin, HarDNet: A low memory traffic network, in Proc. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 2019, pp. 3551–3560.
Y. Zhang, H. Liu, and Q. Hu, Transfuse: Fusing transformers and CNNs for medical image segmentation, in Proc. 24th Int. Conf. Medical Image Computing and Computer Assisted Intervention, Strasbourg, France, 2021, pp. 14–24.
Z. Yin, K. Liang, Z. Ma, and J. Guo, Duplex contextual relation network for polyp segmentation, in Proc. 2022 IEEE 19th Int. Symp. on Biomedical Imaging (ISBI), Kolkata, India, 2022, pp. 1–5.
X. Zhao, L. Zhang, and H. Lu, Automatic polyp segmentation via multi-scale subtraction network, in Proc. 24th Int. Conf. Medical Image Computing and Computer Assisted Intervention, Strasbourg, France, 2021, pp. 120–130.
Z. Zhou, J. Shin, L. Zhang, S. Gurudu, M. Gotway, and J. Liang, Fine-tuning convolutional neural networks for biomedical image analysis: Actively and incrementally, in Proc. 2017 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, pp. 4761–4772.
N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall, M. B. Gotway, and J. Liang, Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE Trans. Med. Imaging, vol. 35, no. 5, pp. 1299–1312, 2016.
X. Xie, J. Chen, Y. Li, L. Shen, K. Ma, and Y. Zheng, MI2GAN: Generative adversarial network for medical image domain adaptation using mutual information constraint, in Proc. 23rd Int. Conf. Medical Image Computing and Computer Assisted Intervention, Lima, Peru, 2020, pp. 516–525.
R. Zhang, G. Li, Z. Li, S. Cui, D. Qian, and Y. Yu, Adaptive context selection for polyp segmentation, in Proc. 23rd Int. Conf. Medical Image Computing and Computer Assisted Intervention, Lima, Peru, 2020, pp. 253–262.
N. K. Tomar, Automatic polyp segmentation using fully convolutional neural network, in Proc. MediaEval 2020 Workshop, virtual, 2020.
D. Jha, S. Hicks, K. Emanuelsen, H. D. Johansen, D. Johansen, T. Lange, M. Riegler, and P. Halvorsen, Medico multimedia task at MediaEval 2020: Automatic polyp segmentation, in Proc. MediaEval 2020 Workshop, virtual, 2020.
K. Patel, A. M. Bur, and G. Wang, Enhanced U-net: A feature enhancement network for polyp segmentation, in Proc. 2021 18th Conf. Robots and Vision (CRV), Burnaby, Canada, 2021, pp. 181–188.
A. Lumini, L. Nanni, and G. Maguolo, Deep ensembles based on stochastic activation selection for polyp segmentation, in Proc. 2021 Medical Imaging with Deep Learning, Lübeck, Germany, 2021.
M. V. L. Branch and A. S. Carvalho, Polyp segmentation in colonoscopy images using U-net-MobileNetV2, arXiv preprint arXiv: 2103.15715, 2021.
R. Khadga, D. Jha, S. Ali, S. Hicks, V. Thambawita, M. A. Riegler, and P. Halvorsen, Meta-learning with implicit gradients in a few-shot setting for medical image segmentation, arXiv preprint arXiv: 2106.03223, 2021.
D. V. Sang, T. Q. Chung, P. N. Lan, D. V. Hang, D. V. Long, and N. T. Thuy, Ag-CUResNeSt: A novel method for colon polyp segmentation, arXiv preprint arXiv: 2105.00402, 2021.
C. Yang, X. Guo, M. Zhu, B. Ibragimov, and Y. Yuan, Mutual-prototype adaptation for cross-domain polyp segmentation, IEEE J. Biomed. Health Inform., vol. 25, no. 10, pp. 3886–3897, 2021.
D. Jha, P. H. Smedsrud, D. Johansen, T. de Lange, H. D. Johansen, P. Halvorsen, and M. A. Riegler, A comprehensive study on colorectal polyp segmentation with ResUNet++, conditional random field and test-time augmentation, IEEE J. Biomed. Health Inform., vol. 25, no. 6, pp. 2029–2040, 2021.
D. Jha, N. K. Tomar, S. Ali, M. A. Riegler, H. D. Johansen, D. Johansen, T. de Lange, and P. Halvorsen, NanoNet: Real-time polyp segmentation in video capsule endoscopy and colonoscopy, in Proc. 2021 IEEE 34th Int. Symp. on Computer-Based Medical Systems (CBMS), Aveiro, Portugal, 2021, pp. 37–43.
S. Li, X. Sui, X. Luo, X. Xu, Y. Liu, and R. Goh, Medical image segmentation using squeeze-and-expansion transformers, in Proc. 30th Int. Joint Conf. Artificial Intelligence, virtual, 2021, pp. 807–815.
T. Kim, H. Lee, and D. Kim, UACANet: Uncertainty augmented context attention for polyp segmentation, in Proc. 29th ACM Int. Conf. Multimedia, virtual, 2021, pp. 2167–2175.
V. L. Thambawita, S. Hicks, P. Halvorsen, and M. Riegler, DivergentNets: Medical image segmentation by network ensemble, in Proc. 3rd Int. Workshop and Challenge on Computer Vision in Endoscopy (EndoCV2021) in conjunction with the 18th IEEE Int. Symp. Biomedical Imaging (ISBI2021), Nice, France, 2021, pp. 27–38.
X. Guo, C. Yang, and Y. Yuan, Dynamic-weighting hierarchical segmentation network for medical images, Med. Image Anal., vol. 73, p. 102196, 2021.
G. P. Ji, Y. C. Chou, D. P. Fan, G. Chen, H. Fu, D. Jha, and L. Shao, Progressively normalized self-attention network for video polyp segmentation, in Proc. 24th Int. Conf. Medical Image Computing and Computer Assisted Intervention, Strasbourg, France, 2021, pp. 142–152.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, in Proc. 31st Conf. Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 5998–6008.
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., An image is worth 16×16 words: Transformers for image recognition at scale, in Proc. 9th Int. Conf. Learning Representations, Vienna, Austria, 2021.
Z. Pan, B. Zhuang, J. Liu, H. He, and J. Cai, Scalable vision transformers with hierarchical pooling, in Proc. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021, pp. 367–376.
B. Heo, S. Yun, D. Han, S. Chun, J. Choe, and S. J. Oh, Rethinking spatial dimensions of vision transformers, in Proc. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021, pp. 11916–11925.
L. Yuan, Y. Chen, T. Wang, W. Yu, Y. Shi, Z. Jiang, F. E. H. Tay, J. Feng, and S. Yan, Tokens-to-token ViT: Training vision transformers from scratch on ImageNet, in Proc. 2021 IEEE/CVF Int. Conf. Computer Vision (ICCV), Montreal, Canada, 2021, pp. 538–547.
K. Han, A. Xiao, E. Wu, J. Guo, C. Xu, and Y. Wang, Transformer in transformer, in Proc. 35th Conf. Neural Information Processing Systems, virtual, 2021, pp. 15908–15919.
W. Wang, E. Xie, X. Li, D. P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao, Pyramid vision transformer: A versatile backbone for dense prediction without convolutions, in Proc. 2021 IEEE/CVF Int. Conf. Computer Vision (ICCV), Montreal, Canada, 2021, pp. 548–558.
W. Wang, E. Xie, X. Li, D. P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao, PVT v2: Improved baselines with pyramid vision transformer, Computational Visual Media, vol. 8, no. 3, pp. 415–424, 2022.
Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, Swin transformer: Hierarchical vision transformer using shifted windows, in Proc. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021, pp. 9992–10002.
H. Wu, B. Xiao, N. Codella, M. Liu, X. Dai, L. Yuan, and L. Zhang, CvT: Introducing convolutions to vision transformers, in Proc. 2021 IEEE/CVF Int. Conf. Computer Vision (ICCV), Montreal, Canada, 2021, pp. 22–31.
W. Xu, Y. Xu, T. Chang, and Z. Tu, Co-scale conv-attentional image transformers, in Proc. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021, pp. 9961–9970.
X. Chu, Z. Tian, Y. Wang, B. Zhang, H. Ren, X. Wei, H. Xia, and C. Shen, Twins: Revisiting the design of spatial attention in vision transformers, in Proc. 34th Conf. Neural Information Processing Systems, virtual, 2021, pp. 9355–9366.
B. Graham, A. El-Nouby, H. Touvron, P. Stock, A. Joulin, H. Jegou, and M. Douze, LeViT: A vision transformer in ConvNet's clothing for faster inference, in Proc. 2021 IEEE/CVF Int. Conf. Computer Vision (ICCV), Montreal, Canada, 2021, pp. 12239–12249.
S. Bhojanapalli, A. Chakrabarti, D. Glasner, D. Li, T. Unterthiner, and A. Veit, Understanding robustness of transformers for image classification, in Proc. 2021 IEEE/CVF Int. Conf. Computer Vision (ICCV), Montreal, Canada, 2021, pp. 10211–10221.
E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, SegFormer: Simple and efficient design for semantic segmentation with transformers, in Proc. 34th Conf. Neural Information Processing Systems, virtual, 2021, pp. 12077–12090.
Z. Wu, L. Su, and Q. Huang, Cascaded partial decoder for fast and accurate salient object detection, in Proc. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 3902–3911.
S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in Proc. 32nd Int. Conf. Machine Learning (ICML), Lille, France, 2015, pp. 448–456.
X. Glorot, A. Bordes, and Y. Bengio, Deep sparse rectifier neural networks, in Proc. 14th Int. Conf. Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 2011, pp. 315–323.
S. Woo, J. Park, J. Y. Lee, and I. S. Kweon, CBAM: Convolutional block attention module, in Proc. 15th European Conf. Computer Vision, Munich, Germany, 2018, pp. 3–19.
J. Hu, L. Shen, and G. Sun, Squeeze-and-excitation networks, in Proc. 2018 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 2018, pp. 7132–7141.
X. Wang, R. B. Girshick, A. Gupta, and K. He, Non-local neural networks, in Proc. 2018 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 2018, pp. 7794–7803.
G. Te, Y. Liu, W. Hu, H. Shi, and T. Mei, Edge-aware graph representation learning and reasoning for face parsing, in Proc. 16th European Conf. Computer Vision, Glasgow, UK, 2020, pp. 258–274.
Y. Lu, Y. Chen, D. Zhao, and J. Chen, Graph-FCN for image semantic segmentation, in Proc. 16th Int. Symp. Neural Networks, Moscow, Russia, 2019, pp. 97–105.
J. Wei, S. Wang, and Q. Huang, F³Net: Fusion, feedback and focus for salient object detection, in Proc. 34th AAAI Conf. Artificial Intelligence (2020), 32nd Innovative Applications of Artificial Intelligence Conf. (IAAI), 10th AAAI Symp. Educational Advances in Artificial Intelligence (EAAI), New York, NY, USA, 2020, pp. 12321–12328.
I. Loshchilov and F. Hutter, Decoupled weight decay regularization, in Proc. 7th Int. Conf. Learning Representations (ICLR), New Orleans, LA, USA, 2019.
F. Milletari, N. Navab, and S. A. Ahmadi, V-net: Fully convolutional neural networks for volumetric medical image segmentation, in Proc. 2016 Fourth Int. Conf. 3D Vision (3DV), Stanford, CA, USA, 2016, pp. 565–571.
R. Margolin, L. Zelnik-Manor, and A. Tal, How to evaluate foreground maps, in Proc. 2014 IEEE Conf. Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014, pp. 248–255.
D. P. Fan, G. P. Ji, X. B. Qin, and M. M. Cheng, Cognitive vision inspired object segmentation metric and loss function, (in Chinese), SCIENTIA SINICA Informat., vol. 51, no. 9, pp. 1475–1489, 2021.
D. P. Fan, C. Gong, Y. Cao, B. Ren, M. M. Cheng, and A. Borji, Enhanced-alignment measure for binary foreground map evaluation, in Proc. 27th Int. Joint Conf. Artificial Intelligence, Stockholm, Sweden, 2018, pp. 698–704.
Y. Fang, C. Chen, Y. Yuan, and K.-Y. Tong, Selective feature aggregation network with area-boundary constraints for polyp segmentation, in Proc. 22nd Int. Conf. Medical Image Computing and Computer Assisted Intervention, Shenzhen, China, 2019, pp. 302–310.
G. P. Ji, G. Xiao, Y. C. Chou, D. P. Fan, K. Zhao, G. Chen, and L. Van Gool, Video polyp segmentation: A deep learning perspective, Mach. Intell. Res., vol. 19, no. 6, pp. 531–549, 2022.
J. Bernal, J. Sánchez, and F. Vilariño, Towards automatic polyp detection with a polyp appearance model, Pattern Recognit., vol. 45, no. 9, pp. 3166–3182, 2012.
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
Segmentation examples of our model and SANet[7] with different challenging cases, e.g., camouflage (1st and 2nd rows) and image acquisition influence (3rd row). The images from top to bottom are from ClinicDB[8], ETIS[9], and ColonDB[10], which show that our model has better generalization ability.
Framework of our Polyp-PVT, which consists of (a) a pyramid vision transformer (PVT) as the encoder network, (b) a cascaded fusion module (CFM) for fusing the high-level features, (c) a camouflage identification module (CIM) for capturing polyp cues from the low-level features, and (d) a similarity aggregation module (SAM) for integrating the high- and low-level features into the final output.
Details of the introduced SAM. It is composed of a GCN and a non-local block, which extend the pixel features of polyp regions carrying high-level semantic location cues to the entire polyp region.
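The non-local part of the SAM can be illustrated with a few lines of NumPy: every pixel attends to every other pixel, so a strong high-level cue at one polyp pixel propagates to similar-looking pixels across the image. This is a generic non-local (self-attention) sketch under our own simplifications, not the paper's exact SAM (which additionally uses a GCN and learned projections).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local(x):
    """Non-local block (sketch): each of the N pixels attends to all others.

    x: (N, C) array of pixel features. The output of each pixel is a
    similarity-weighted mixture of all pixel features, so cues spread
    from strongly activated polyp pixels to similar pixels elsewhere.
    """
    attn = softmax(x @ x.T / np.sqrt(x.shape[1]))  # (N, N) similarities
    return attn @ x                                # propagated features

# Pixels 0 and 1 look alike (strong channel 0); pixel 2 does not.
feats = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
out = non_local(feats)
```

After attention, pixel 0 keeps a much stronger channel-0 response than the dissimilar pixel 2, illustrating how the cue stays concentrated on mutually similar (polyp-like) pixels.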
Loss curves under different training parameter settings.
Visualization results compared with current models. Green indicates a correctly predicted polyp region, yellow a missed polyp region, and red an incorrect prediction. As can be seen, the proposed model accurately locates and segments polyps regardless of their size.
Visualization results with the current models.
Evaluation of model generalization ability. We provide the max Dice results on ColonDB and ETIS.
FROC curves of different methods on ColonDB.
Visualization of the ablation study results, converted from the outputs into heat maps. As can be seen, removing any module leads to missed or incorrect detections.
Visualization of the feature map in the CIM module.
Visualization of the P1 and P2 predictions.
Visualization of some failure cases. Green indicates a correctly predicted polyp region, yellow a missed polyp region, and red an incorrect prediction.
A survey on polyp segmentation. (The meanings of the abbreviations are as follows. CL: CVC-CLINIC, EL: ETIS-Larib, C6: CVC-612, AM: ASU-Mayo[46, 47], ES: EndoScene, DB: ColonDB, CV: CVC-VideoClinicDB, C: Colon, ED: Endotect 2020, KS: Kvasir-SEG, KCS: Kvasir Capsule-SEG, PraNet: same as the datasets used in PraNet[5], IS: image segmentation, VS: video segmentation, CF: classification, OD: object detection, Own: private data.)
| No. | Model | Publication | Code | Type | Dataset | Core component | Reference |
|---|---|---|---|---|---|---|---|
| 1 | CSCPD | IJPRAI | − | IS | Own | Adaptive-scale candidate | [1] |
| 2 | APD | TMI | − | IS | Own | Geometrical analysis, binary classifier | [2] |
| 3 | SBCP | SPMB | − | IS | Own | Superpixel | [3] |
| 4 | FCN | EMBC | − | IS | DB | FCN and patch selection | [26] |
| 5 | D-FCN | JMRR | − | IS | CL, EL, AM, and DB | FCN and Shape-from-Shading (SfS) | [27] |
| 6 | UNet++ | DLMIA | PyTorch | IS | AM | Skip pathways and deep supervision | [28] |
| 7 | Psi-Net | EMBC | PyTorch | IS | Endovis | Shape and boundary aware | [31] |
| 8 | Mask R-CNN | ISMICT | − | IS | C6, EL, and DB | Deep feature extractors | [32] |
| 9 | UDC | ICMLA | − | IS | C6 and EL | Dilation convolution | [30] |
| 10 | ThresholdNet | TMI | PyTorch | IS | ES and WCE | Learn to threshold, confidence-guided manifold mixup | [6] |
| 11 | MI2GAN | MICCAI | − | IS | C6 and EL | GAN-based model | [48] |
| 12 | ACSNet | MICCAI | PyTorch | IS | ES and KS | Adaptive context selection | [49] |
| 13 | PraNet | MICCAI | PyTorch | IS | PraNet | Parallel partial decoder attention | [5] |
| 14 | GAN | MediaEval | − | IS | KS | Image-to-image translation | [38] |
| 15 | APS | MediaEval | − | IS | KS | Variants of U-shaped structure | [50] |
| 16 | PFA | MediaEval | PyTorch | IS | KS | Pyramid focus augmentation | [39] |
| 17 | MMT | MediaEval | − | IS | KS | Competition introduction | [51] |
| 18 | U-Net-ResNet50 | MediaEval | − | IS | KS | Variants of U-shaped structure | [34] |
| 19 | Survey | CMIG | − | CF | Own | Classification | [15] |
| 20 | Polyp-Net | TIM | − | IS | DB and CV | Multimodel fusion network | [35] |
| 21 | Deep CNN | BSPC | − | OD | EL | Convolutional neural network | [36] |
| 22 | EU-Net | CRV | PyTorch | IS | PraNet | Semantic information enhancement | [52] |
| 23 | DSAS | MIDL | Matlab | IS | KS | Stochastic activation selection | [53] |
| 24 | U-Net-MobileNetV2 | arXiv | − | IS | KS | Variants of U-shaped structure | [54] |
| 25 | DCRNet | ISBI | PyTorch | IS | ES, KS, and PICCOLO | Within-image and cross-image contextual relations | [44] |
| 26 | MSEG | arXiv | PyTorch | IS | PraNet | HarDNet and partial decoder | [41] |
| 27 | FSSNet | arXiv | − | IS | C6 and KS | Meta-learning | [55] |
| 28 | AG-CUResNeSt | RIVF | − | IS | PraNet | ResNeSt, attention gates | [56] |
| 29 | MPAPS | JBHI | PyTorch | IS | DB, KS, and EL | Mutual-prototype adaptation network | [57] |
| 30 | ResUNet++ | JBHI | PyTorch | IS, VS | PraNet and AM | ResUNet++, CRF and TTA | [58] |
| 31 | NanoNet | CBMS | PyTorch | IS, VS | ED, KS, and KCS | Real-time polyp segmentation | [59] |
| 32 | ColonSegNet | Access | PyTorch | IS | KS | Residual block and SENet | [37] |
| 33 | Segtran | IJCAI | PyTorch | IS | C6 and KS | Transformer | [60] |
| 34 | DDANet | ICPR | PyTorch | IS | KS | Dual decoder attention network | [40] |
| 35 | UACANet | ACM MM | PyTorch | IS | PraNet | Uncertainty augmented context attention network | [61] |
| 36 | DivergentNet | ISBI | PyTorch | IS | EndoCV 2021 | Combine multiple models | [62] |
| 37 | DWHieraSeg | MIA | PyTorch | IS | ES | Dynamic-weighting | [63] |
| 38 | Transfuse | MICCAI | PyTorch | IS | PraNet | Transformer and CNN | [43] |
| 39 | SANet | MICCAI | PyTorch | IS | PraNet | Shallow attention network | [7] |
| 40 | PNS-Net | MICCAI | PyTorch | VS | C6, KS, ES, and AM | Progressively normalized self-attention network | [64] |
Parameter setting during the training stage.
| Optimizer | Learning rate | Multi-scale | Clip | Decay rate | Weight decay | Number of epochs | Input size |
|---|---|---|---|---|---|---|---|
| AdamW | 10^-4 | [0.75, 1, 1.25] | 0.5 | 0.1 | 10^-4 | 100 | 352×352 |
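The training settings above can be collected into a small configuration sketch. The dictionary keys and the `scaled_input_size` helper are illustrative names of our own (the released code is PyTorch; we only mirror the table's values here, e.g., multi-scale training resizes the 352×352 input by a factor of 0.75, 1, or 1.25).

```python
# Training configuration mirroring the table above (illustrative names).
TRAIN_CFG = {
    "optimizer": "AdamW",        # optimizer family from the table
    "lr": 1e-4,                  # learning rate 10^-4
    "weight_decay": 1e-4,        # weight decay 10^-4
    "grad_clip": 0.5,            # gradient clipping threshold
    "lr_decay_rate": 0.1,        # learning-rate decay rate
    "epochs": 100,               # number of training epochs
    "input_size": (352, 352),    # base input resolution
    "multi_scale": [0.75, 1.0, 1.25],  # per-iteration rescale factors
}

def scaled_input_size(cfg, scale):
    """Multi-scale training: rescale the base crop by one of the
    factors in cfg["multi_scale"] before feeding it to the network."""
    h, w = cfg["input_size"]
    return int(h * scale), int(w * scale)

# e.g., scale 1.25 -> (440, 440); scale 0.75 -> (264, 264)
```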
Network parameters of each module. Note that the encoder parameters are identical to those of PVT, without any changes. BasicConv2d and Conv2d take the parameters [in_channel, out_channel, kernel_size, padding], and GCN takes [num_state, num_node].
| Module | Parameter | Value |
|---|---|---|
| Encoder | patch_size | [4] |
| | embed_dims | [64, 128, 320, 512] |
| | num_heads | [1, 2, 5, 8] |
| | mlp_ratios | [8, 8, 4, 4] |
| | depths | [3, 4, 18, 3] |
| | sr_ratios | [8, 4, 2, 1] |
| | drop_rate | [0] |
| | drop_path_rate | [0.1] |
| SAM | AvgPool2d | [6] |
| | Conv2d | [32, 16, 1, 1] |
| | Conv2d | [32, 16, 1, 1] |
| | Conv2d | [16, 32, 1, 1] |
| | GCN | [16, 16] |
| | BasicConv2d | [64, 32, 1, 0] |
| CFM | BasicConv2d | [32, 32, 3, 1] |
| | BasicConv2d | [32, 32, 3, 1] |
| | BasicConv2d | [32, 32, 3, 1] |
| | BasicConv2d | [32, 32, 3, 1] |
| | BasicConv2d | [64, 64, 3, 1] |
| | BasicConv2d | [64, 64, 3, 1] |
| | BasicConv2d | [96, 96, 3, 1] |
| | BasicConv2d | [96, 32, 3, 1] |
| CIM | AvgPool2d | [1] |
| | AvgPool2d | [1] |
| | Conv2d | [64, 4, 1, 0] |
| | ReLU | − |
| | Conv2d | [4, 64, 1, 0] |
| | Sigmoid | − |
| | Conv2d | [2, 1, 7, 3] |
| | Sigmoid | − |
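The CIM rows in the table (average pooling, a 64→4→64 bottleneck with ReLU and Sigmoid, then a 7×7 convolution over two pooled maps) follow the familiar channel-then-spatial attention pattern of CBAM. Below is a NumPy sketch of that pattern under our own simplifications: the 1×1 convolutions are stand-in random projections, and the 7×7 spatial convolution is replaced by a pointwise combination of the channel-max and channel-mean maps.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cim_sketch(x):
    """CBAM-style attention as listed for the CIM (illustrative only).

    x: (C, H, W) low-level feature map with C = 64 channels.
    Channel attention: global average pool -> 64->4 -> ReLU -> 4->64
    -> Sigmoid. Spatial attention: combine channel max/mean maps ->
    Sigmoid (standing in for the 7x7 conv in the table).
    """
    c, h, w = x.shape
    rng = np.random.default_rng(0)  # stand-in for learned 1x1 convs
    w_down = rng.standard_normal((4, c))   # 64 -> 4 bottleneck
    w_up = rng.standard_normal((c, 4))     # 4 -> 64 expansion

    # Channel attention: squeeze, excite, rescale channels.
    squeeze = x.mean(axis=(1, 2))                       # (64,)
    excite = sigmoid(w_up @ np.maximum(w_down @ squeeze, 0))
    x = x * excite[:, None, None]

    # Spatial attention over the two pooled maps.
    sp = sigmoid((x.max(axis=0) + x.mean(axis=0)) / 2)  # (H, W)
    return x * sp[None]

out = cim_sketch(np.ones((64, 8, 8)))
```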
Quantitative results of the test datasets, i.e., Kvasir-SEG and ClinicDB.
Kvasir-SEG[13]:

| Model | mDic | mIoU | F^w_β | S_α | E^mean_ξ | E^max_ξ | MAE |
|---|---|---|---|---|---|---|---|
| U-Net | 0.818 | 0.746 | 0.794 | 0.858 | 0.881 | 0.893 | 0.055 |
| UNet++ | 0.821 | 0.743 | 0.808 | 0.862 | 0.886 | 0.909 | 0.048 |
| SFA | 0.723 | 0.611 | 0.670 | 0.782 | 0.834 | 0.849 | 0.075 |
| MSEG | 0.897 | 0.839 | 0.885 | 0.912 | 0.942 | 0.948 | 0.028 |
| DCRNet | 0.886 | 0.825 | 0.868 | 0.911 | 0.933 | 0.941 | 0.035 |
| ACSNet | 0.898 | 0.838 | 0.882 | 0.920 | 0.941 | 0.952 | 0.032 |
| PraNet | 0.898 | 0.840 | 0.885 | 0.915 | 0.944 | 0.948 | 0.030 |
| EU-Net | 0.908 | 0.854 | 0.893 | 0.917 | 0.951 | 0.954 | 0.028 |
| SANet | 0.904 | 0.847 | 0.892 | 0.915 | 0.949 | 0.953 | 0.028 |
| Polyp-PVT (Ours) | 0.917 | 0.864 | 0.911 | 0.925 | 0.956 | 0.962 | 0.023 |

ClinicDB[8]:

| Model | mDic | mIoU | F^w_β | S_α | E^mean_ξ | E^max_ξ | MAE |
|---|---|---|---|---|---|---|---|
| U-Net | 0.823 | 0.755 | 0.811 | 0.889 | 0.913 | 0.954 | 0.019 |
| UNet++ | 0.794 | 0.729 | 0.785 | 0.873 | 0.891 | 0.931 | 0.022 |
| SFA | 0.700 | 0.607 | 0.647 | 0.793 | 0.840 | 0.885 | 0.042 |
| MSEG | 0.909 | 0.864 | 0.907 | 0.938 | 0.961 | 0.969 | 0.007 |
| DCRNet | 0.896 | 0.844 | 0.890 | 0.933 | 0.964 | 0.978 | 0.010 |
| ACSNet | 0.882 | 0.826 | 0.873 | 0.927 | 0.947 | 0.959 | 0.011 |
| PraNet | 0.899 | 0.849 | 0.896 | 0.936 | 0.963 | 0.979 | 0.009 |
| EU-Net | 0.902 | 0.846 | 0.891 | 0.936 | 0.959 | 0.965 | 0.011 |
| SANet | 0.916 | 0.859 | 0.909 | 0.939 | 0.971 | 0.976 | 0.012 |
| Polyp-PVT (Ours) | 0.937 | 0.889 | 0.936 | 0.949 | 0.985 | 0.989 | 0.006 |
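The mDic, mIoU, and MAE columns reported throughout the quantitative tables can be computed per image from binary masks as below. This is a plain-Python sketch of the standard definitions; the tables report these values averaged over each test set.

```python
def dice_iou_mae(pred, gt):
    """Per-image Dice, IoU, and MAE for binary masks given as flat
    0/1 sequences of equal length."""
    inter = sum(p * g for p, g in zip(pred, gt))
    ps, gs = sum(pred), sum(gt)
    # Dice = 2|P∩G| / (|P| + |G|); empty-vs-empty counts as perfect.
    dice = 2 * inter / (ps + gs) if ps + gs else 1.0
    # IoU = |P∩G| / |P∪G|
    union = ps + gs - inter
    iou = inter / union if union else 1.0
    # MAE = mean absolute pixel-wise difference
    mae = sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)
    return dice, iou, mae

d, i, m = dice_iou_mae([1, 1, 0, 0], [1, 0, 1, 0])
# d = 0.5, i = 1/3, m = 0.5
```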
Quantitative results of the test dataset Endoscene. The SFA result is generated using the published code.
| Model | mDic | mIoU | F^w_β | S_α | E^mean_ξ | E^max_ξ | MAE |
|---|---|---|---|---|---|---|---|
| U-Net | 0.710 | 0.627 | 0.684 | 0.843 | 0.847 | 0.875 | 0.022 |
| UNet++ | 0.707 | 0.624 | 0.687 | 0.839 | 0.834 | 0.898 | 0.018 |
| SFA | 0.467 | 0.329 | 0.341 | 0.640 | 0.644 | 0.817 | 0.065 |
| MSEG | 0.874 | 0.804 | 0.852 | 0.924 | 0.948 | 0.957 | 0.009 |
| ACSNet | 0.863 | 0.787 | 0.825 | 0.923 | 0.939 | 0.968 | 0.013 |
| DCRNet | 0.856 | 0.788 | 0.830 | 0.921 | 0.943 | 0.960 | 0.010 |
| PraNet | 0.871 | 0.797 | 0.843 | 0.925 | 0.950 | 0.972 | 0.010 |
| EU-Net | 0.837 | 0.765 | 0.805 | 0.904 | 0.919 | 0.933 | 0.015 |
| SANet | 0.888 | 0.815 | 0.859 | 0.928 | 0.962 | 0.972 | 0.008 |
| Polyp-PVT (Ours) | 0.900 | 0.833 | 0.884 | 0.935 | 0.973 | 0.981 | 0.007 |
Quantitative results of the test datasets ColonDB and ETIS. The SFA result is generated using the published code.
ColonDB[10]:

| Model | mDic | mIoU | F^w_β | S_α | E^mean_ξ | E^max_ξ | MAE |
|---|---|---|---|---|---|---|---|
| U-Net | 0.512 | 0.444 | 0.498 | 0.712 | 0.696 | 0.776 | 0.061 |
| UNet++ | 0.483 | 0.410 | 0.467 | 0.691 | 0.680 | 0.760 | 0.064 |
| SFA | 0.469 | 0.347 | 0.379 | 0.634 | 0.675 | 0.764 | 0.094 |
| ACSNet | 0.716 | 0.649 | 0.697 | 0.829 | 0.839 | 0.851 | 0.039 |
| MSEG | 0.735 | 0.666 | 0.724 | 0.834 | 0.859 | 0.875 | 0.038 |
| DCRNet | 0.704 | 0.631 | 0.684 | 0.821 | 0.840 | 0.848 | 0.052 |
| PraNet | 0.712 | 0.640 | 0.699 | 0.820 | 0.847 | 0.872 | 0.043 |
| EU-Net | 0.756 | 0.681 | 0.730 | 0.831 | 0.863 | 0.872 | 0.045 |
| SANet | 0.753 | 0.670 | 0.726 | 0.837 | 0.869 | 0.878 | 0.043 |
| Polyp-PVT (Ours) | 0.808 | 0.727 | 0.795 | 0.865 | 0.913 | 0.919 | 0.031 |

ETIS[9]:

| Model | mDic | mIoU | F^w_β | S_α | E^mean_ξ | E^max_ξ | MAE |
|---|---|---|---|---|---|---|---|
| U-Net | 0.398 | 0.335 | 0.366 | 0.684 | 0.643 | 0.740 | 0.036 |
| UNet++ | 0.401 | 0.344 | 0.390 | 0.683 | 0.629 | 0.776 | 0.035 |
| SFA | 0.297 | 0.217 | 0.231 | 0.557 | 0.531 | 0.632 | 0.109 |
| ACSNet | 0.578 | 0.509 | 0.530 | 0.754 | 0.737 | 0.764 | 0.059 |
| MSEG | 0.700 | 0.630 | 0.671 | 0.828 | 0.854 | 0.890 | 0.015 |
| DCRNet | 0.556 | 0.496 | 0.506 | 0.736 | 0.742 | 0.773 | 0.096 |
| PraNet | 0.628 | 0.567 | 0.600 | 0.794 | 0.808 | 0.841 | 0.031 |
| EU-Net | 0.687 | 0.609 | 0.636 | 0.793 | 0.807 | 0.841 | 0.067 |
| SANet | 0.750 | 0.654 | 0.685 | 0.849 | 0.881 | 0.897 | 0.015 |
| Polyp-PVT (Ours) | 0.787 | 0.706 | 0.750 | 0.871 | 0.906 | 0.910 | 0.013 |
Standard deviation (SD) of the mDic of our model and the comparison models.
| Model | Kvasir-SEG | ClinicDB | ColonDB | ETIS | Endoscene |
|---|---|---|---|---|---|
| U-Net | 0.818±0.039 | 0.823±0.047 | 0.483±0.034 | 0.398±0.033 | 0.710±0.049 |
| UNet++ | 0.821±0.040 | 0.794±0.044 | 0.456±0.037 | 0.401±0.057 | 0.707±0.053 |
| SFA | 0.723±0.052 | 0.701±0.054 | 0.444±0.037 | 0.297±0.025 | 0.468±0.050 |
| MSEG | 0.897±0.041 | 0.910±0.048 | 0.735±0.039 | 0.700±0.039 | 0.874±0.051 |
| ACSNet | 0.898±0.045 | 0.882±0.048 | 0.716±0.040 | 0.578±0.035 | 0.863±0.055 |
| DCRNet | 0.886±0.043 | 0.896±0.049 | 0.704±0.039 | 0.556±0.039 | 0.857±0.052 |
| PraNet | 0.898±0.041 | 0.899±0.048 | 0.712±0.038 | 0.628±0.036 | 0.871±0.051 |
| EU-Net | 0.908±0.042 | 0.902±0.048 | 0.756±0.040 | 0.687±0.039 | 0.837±0.049 |
| SANet | 0.904±0.042 | 0.916±0.049 | 0.752±0.040 | 0.750±0.047 | 0.888±0.054 |
| Polyp-PVT (Ours) | 0.917±0.042 | 0.937±0.050 | 0.808±0.043 | 0.787±0.044 | 0.900±0.052 |
Quantitative results for ablation studies.
| Dataset | Metric | Baseline | Without CFM | Without CIM | Without SAM | Final |
|---|---|---|---|---|---|---|
| Endoscene | mDic | 0.869 | 0.892 | 0.882 | 0.874 | 0.900 |
| | mIoU | 0.792 | 0.826 | 0.808 | 0.801 | 0.833 |
| ClinicDB | mDic | 0.903 | 0.915 | 0.930 | 0.930 | 0.937 |
| | mIoU | 0.847 | 0.865 | 0.881 | 0.877 | 0.889 |
| ColonDB | mDic | 0.796 | 0.802 | 0.805 | 0.779 | 0.808 |
| | mIoU | 0.707 | 0.721 | 0.724 | 0.696 | 0.727 |
| ETIS | mDic | 0.759 | 0.771 | 0.785 | 0.778 | 0.787 |
| | mIoU | 0.668 | 0.690 | 0.711 | 0.693 | 0.706 |
| Kvasir-SEG | mDic | 0.910 | 0.922 | 0.910 | 0.910 | 0.917 |
| | mIoU | 0.856 | 0.872 | 0.858 | 0.853 | 0.864 |
Ablation study of GCN in the SAM module. The mDic scores are provided.
| Setting | Endoscene | ClinicDB | ColonDB | ETIS | Kvasir-SEG |
|---|---|---|---|---|---|
| Without GCN | 0.876 | 0.928 | 0.784 | 0.725 | 0.894 |
| With Conv | 0.894 | 0.919 | 0.787 | 0.742 | 0.909 |
| With GCN | 0.900 | 0.937 | 0.808 | 0.787 | 0.917 |
Ablation experiments on rotation robustness. All experiments use a large rotation (15 degrees). The mDic scores are provided.

| Setting | Endoscene | ClinicDB | ColonDB | ETIS | Kvasir-SEG |
|---|---|---|---|---|---|
| Without GCN | 0.857 | 0.909 | 0.756 | 0.667 | 0.894 |
| With Conv | 0.865 | 0.898 | 0.789 | 0.719 | 0.893 |
| With GCN | 0.874 | 0.929 | 0.806 | 0.744 | 0.915 |
Video polyp segmentation results on the CVC-300-TV[96].
| Model | mDic | mIoU | F^w_β | S_α | E^mean_ξ | E^max_ξ | MAE |
|---|---|---|---|---|---|---|---|
| U-Net | 0.631 | 0.516 | 0.567 | 0.793 | 0.826 | 0.849 | 0.027 |
| UNet++ | 0.638 | 0.527 | 0.581 | 0.796 | 0.831 | 0.847 | 0.024 |
| ResUNet++ | 0.533 | 0.410 | 0.469 | 0.703 | 0.718 | 0.720 | 0.052 |
| ACSNet | 0.732 | 0.627 | 0.703 | 0.837 | 0.871 | 0.875 | 0.016 |
| PraNet | 0.716 | 0.624 | 0.700 | 0.833 | 0.852 | 0.904 | 0.016 |
| PNS-Net | 0.813 | 0.710 | 0.778 | 0.909 | 0.921 | 0.942 | 0.013 |
| Polyp-PVT (Ours) | 0.880 | 0.802 | 0.869 | 0.915 | 0.961 | 0.965 | 0.011 |
Results of video polyp segmentation on CVC-612-T and CVC-612-V.