A Character Flow Framework for Multi-Oriented Scene Text Detection

Wen-Jun Yang; Bei-Ji Zou; Kai-Wen Li; Shu Liu

doi:10.1007/s11390-021-1362-4


Regular Paper


Wen-Jun Yang^{1,2}, Bei-Ji Zou^{1,2}, Kai-Wen Li^{1,2}, Shu Liu^{1,2}

School of Computer Science and Engineering, Central South University, Changsha 410083, China

Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha 410083, China


Abstract

Scene text detection plays a significant role in applications such as object recognition, document management, and visual navigation. Instance-segmentation-based methods dominate existing research because they handle multi-oriented text well. However, the training labels contain a large number of non-text pixels, which leads to text mis-segmentation. In this paper, we propose a novel multi-oriented scene text detection framework with two main modules: character instance segmentation (one instance corresponds to one character) and character flow construction (one character flow corresponds to one word). We use a feature pyramid network (FPN) to predict character and non-character instances with arbitrary directions. A joint network of FPN and bidirectional long short-term memory (BLSTM) is developed to explore the context information among isolated characters, which are finally grouped into character flows. Extensive experiments on the ICDAR2013, ICDAR2015, MSRA-TD500, and MLT datasets demonstrate the effectiveness of our approach, with F-measures of 92.62%, 88.02%, 83.69%, and 77.81%, respectively.
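For readers unfamiliar with the metric reported above, the F-measure is conventionally the harmonic mean of precision and recall. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function names and the detection counts are hypothetical and chosen only for illustration; the matching rule that decides which detections count as correct is benchmark-specific and is not modeled here.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is matched
    return 2 * precision * recall / (precision + recall)


def detection_scores(num_matched: int, num_detected: int, num_ground_truth: int):
    """Return (precision, recall, F-measure) from raw detection counts.

    num_matched: detections judged correct by the benchmark's matching rule
    num_detected: all detections produced by the detector
    num_ground_truth: all annotated text instances
    """
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical counts for illustration only; these are not results from the paper.
p, r, f = detection_scores(num_matched=80, num_detected=100, num_ground_truth=160)
print(f"precision={p:.4f} recall={r:.4f} F-measure={f:.4f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector cannot reach a high F-measure by maximizing precision or recall alone.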

Keywords

multi-oriented scene text detection; character instance segmentation; character flow; feature pyramid network (FPN); bidirectional long short-term memory (BLSTM)



Journal of Computer Science and Technology

Volume 36, Issue 3, May 2021

Pages 465-477

DOI: 10.1007/s11390-021-1362-4

Cite this article:

Yang W-J, Zou B-J, Li K-W, et al. A Character Flow Framework for Multi-Oriented Scene Text Detection. Journal of Computer Science and Technology, 2021, 36(3): 465-477. https://doi.org/10.1007/s11390-021-1362-4


Received: 08 February 2021

Accepted: 28 April 2021

Published: 05 May 2021