Research Article | Open Access

Scene text removal via cascaded text stroke detection and erasing

National Laboratory of Pattern Recognition, Institute of Automation, Beijing 100049, China
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100084, China

*Xuewei Bian and Chaoqun Wang contributed equally to this work.

Abstract

Recent learning-based approaches show promising improvements on the scene text removal task, but they usually leave remnants of text and produce visually unpleasant results. In this work, a novel end-to-end framework is proposed based on accurate text stroke detection. Specifically, the text removal problem is decoupled into text stroke detection and stroke removal; we design separate networks to solve these two subproblems, the latter being a generative network. These two networks are combined as a processing unit, which is cascaded to obtain our final model for text removal. Experimental results demonstrate that the proposed method substantially outperforms state-of-the-art methods for locating and erasing scene text. A new large-scale real-world dataset with 12,120 images has been constructed and is being made available to facilitate research, as current publicly available datasets are mainly synthetic and so cannot properly measure the performance of different methods.
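
The decoupled, cascaded structure described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the two networks are replaced by toy stand-ins (a darkness threshold for stroke detection, neighbour averaging for stroke removal), and the names `detect_strokes`, `erase_strokes`, `processing_unit`, and `cascade` are hypothetical. Only the control flow, detect then erase, repeated unit after unit, reflects the text.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image, values in [0, 1]
Mask = List[List[int]]     # 1 where a pixel is treated as a text stroke


def detect_strokes(image: Image, threshold: float = 0.5) -> Mask:
    """Toy stand-in for the stroke detection network: dark pixels are strokes."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def erase_strokes(image: Image, mask: Mask) -> Image:
    """Toy stand-in for the generative removal network.

    Each masked pixel is replaced by the mean of its unmasked 4-neighbours;
    a masked pixel with no unmasked neighbour is left unchanged.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            neighbours = [
                image[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx]
            ]
            if neighbours:
                out[y][x] = sum(neighbours) / len(neighbours)
    return out


def processing_unit(image: Image) -> Tuple[Image, Mask]:
    """One unit: stroke detection followed by stroke removal."""
    mask = detect_strokes(image)
    return erase_strokes(image, mask), mask


def cascade(image: Image, num_units: int = 2) -> Tuple[Image, List[Mask]]:
    """Feed each unit's output image into the next unit (assumed wiring)."""
    masks: List[Mask] = []
    for _ in range(num_units):
        image, mask = processing_unit(image)
        masks.append(mask)
    return image, masks


# A white row with a three-pixel-wide dark stroke.
row_image = [[1.0, 0.0, 0.0, 0.0, 1.0]]
one_pass, _ = cascade(row_image, num_units=1)
two_pass, masks = cascade(row_image, num_units=2)
print(one_pass)  # the centre stroke pixel survives a single unit
print(two_pass)  # the second unit removes the remnant
```

In this toy setting a single unit leaves the centre of the stroke behind, and the second unit detects and erases that remnant. This only illustrates why repeating a detect-and-erase unit can help; how the paper's units actually exchange images and masks is not stated in the abstract.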

Computational Visual Media
Pages 273-287
Cite this article:
Bian X, Wang C, Quan W, et al. Scene text removal via cascaded text stroke detection and erasing. Computational Visual Media, 2022, 8(2): 273-287. https://doi.org/10.1007/s41095-021-0242-8

Received: 22 February 2021
Accepted: 26 May 2021
Published: 06 December 2021
© The Author(s) 2021.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.