Scene text removal via cascaded text stroke detection and erasing

Xuewei Bian; Chaoqun Wang; Weize Quan; Juntao Ye; Xiaopeng Zhang; Dong-Ming Yan

doi:10.1007/s41095-021-0242-8


Research Article | Open Access


Xuewei Bian^{1,2,*}, Chaoqun Wang^{1,2,*}, Weize Quan^{1,2}, Juntao Ye^{1,2}, Xiaopeng Zhang^{1,2}, Dong-Ming Yan^{1,2}

^1 National Laboratory of Pattern Recognition, Institute of Automation, Beijing 100049, China

^2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100084, China

^*Xuewei Bian and Chaoqun Wang contributed equally to this work.


Abstract

Recent learning-based approaches have improved performance on the scene text removal task, but they usually leave remnants of text and produce visually unpleasant results. In this work, a novel end-to-end framework is proposed based on accurate text stroke detection. Specifically, the text removal problem is decoupled into text stroke detection and stroke removal; we design separate networks to solve these two subproblems, the latter being a generative network. These two networks are combined as a processing unit, which is cascaded to obtain our final model for text removal. Experimental results demonstrate that the proposed method substantially outperforms the state of the art for locating and erasing scene text. A new large-scale real-world dataset with 12,120 images has been constructed and is being made available to facilitate research, as current publicly available datasets are mainly synthetic and so cannot properly measure the performance of different methods.
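The cascade described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `detect_strokes` and `erase_strokes` are hypothetical toy stand-ins (a brightness threshold and a neighbour-mean fill) for the stroke detection network and the generative removal network, and the names `Unit` and `cascade` are likewise assumptions made for illustration. Only the control flow mirrors the text: a detector and an eraser form one processing unit, and units are applied in sequence.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale pixels in [0, 1]
Mask = List[List[bool]]    # True where a stroke pixel is predicted
Unit = Tuple[Callable[[Image], Mask], Callable[[Image, Mask], Image]]


def detect_strokes(image: Image, threshold: float = 0.3) -> Mask:
    """Toy detector (assumption): treat dark pixels as text strokes."""
    return [[px < threshold for px in row] for row in image]


def erase_strokes(image: Image, mask: Mask) -> Image:
    """Toy eraser (assumption): fill each stroke pixel with the mean of its
    non-stroke 4-neighbours; pixels with no such neighbour are left as-is."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            neighbours = [
                image[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx]
            ]
            if neighbours:
                out[y][x] = sum(neighbours) / len(neighbours)
    return out


def cascade(image: Image, units: List[Unit]) -> Image:
    """Run the processing units in sequence, feeding each output to the next."""
    for detect, erase in units:
        image = erase(image, detect(image))
    return image


unit: Unit = (detect_strokes, erase_strokes)
row = [[0.9, 0.1, 0.1, 0.1, 0.9]]  # a three-pixel-wide dark stroke on a light background
one_stage = cascade(row, [unit])
two_stage = cascade(row, [unit, unit])
```

In this toy setting a single unit leaves the centre of the stroke untouched, because that pixel has no background neighbour to borrow from, while a second unit detects the leftover pixel and removes it. This is only meant to suggest why repeating the detect-then-erase step can clean up remnants; the networks in the paper are learned models, not the heuristics used here.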

Keywords

generative adversarial networks; scene text removal; text stroke detection; cascaded network design; real-world dataset


Computational Visual Media

Volume 8, Issue 2, June 2022

Pages 273-287

DOI: 10.1007/s41095-021-0242-8

Cite this article:

Bian X, Wang C, Quan W, et al. Scene text removal via cascaded text stroke detection and erasing. Computational Visual Media, 2022, 8(2): 273-287. https://doi.org/10.1007/s41095-021-0242-8


Received: 22 February 2021

Accepted: 26 May 2021

Published: 06 December 2021

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.