Regular Paper

A Character Flow Framework for Multi-Oriented Scene Text Detection

School of Computer Science and Engineering, Central South University, Changsha 410083, China
Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha 410083, China

Abstract

Scene text detection plays a significant role in various applications, such as object recognition, document management, and visual navigation. Instance segmentation based methods have been the most commonly used approach in existing research because of their advantages in handling multi-oriented text. However, the labels used during model training contain a large number of non-text pixels, which leads to text mis-segmentation. In this paper, we propose a novel multi-oriented scene text detection framework with two main modules: character instance segmentation (one instance corresponds to one character) and character flow construction (one character flow corresponds to one word). We use a feature pyramid network (FPN) to predict character and non-character instances with arbitrary orientations. A joint network of FPN and bidirectional long short-term memory (BLSTM) is developed to explore the context information among isolated characters, which are finally grouped into character flows. Extensive experiments on the ICDAR2013, ICDAR2015, MSRA-TD500, and MLT datasets demonstrate the effectiveness of our approach, which achieves F-measures of 92.62%, 88.02%, 83.69%, and 77.81%, respectively.
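The F-measure reported for each dataset is the harmonic mean of detection precision and recall. The sketch below shows how such a score can be computed, assuming axis-aligned boxes `(x_min, y_min, x_max, y_max)` and a greedy one-to-one match at IoU strictly above 0.5. This is a simplification and not the evaluation script behind the numbers above: the official protocols differ per benchmark (ICDAR2015, for instance, measures overlap on quadrilaterals). The helper names `iou`, `count_matches`, and `f_measure` are hypothetical.

```python
from typing import List, Tuple

# Assumption: boxes are axis-aligned (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_matches(preds: List[Box], gts: List[Box], thresh: float = 0.5) -> int:
    """Greedy one-to-one matching: each prediction takes the still-unused
    ground-truth box with the highest IoU strictly above `thresh`."""
    used = set()
    matched = 0
    for p in preds:
        best_j, best_iou = -1, thresh
        for j, g in enumerate(gts):
            if j in used:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            used.add(best_j)
            matched += 1
    return matched


def f_measure(num_matched: int, num_pred: int, num_gt: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure); F is the harmonic mean of P and R."""
    precision = num_matched / num_pred if num_pred else 0.0
    recall = num_matched / num_gt if num_gt else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
    gts = [(1, 1, 10, 10), (20, 20, 30, 30)]
    m = count_matches(preds, gts)
    # Two of three predictions match: precision 2/3, recall 1.0, F approximately 0.8.
    print(f_measure(m, len(preds), len(gts)))
```

Because F is a harmonic mean, it stays low unless both precision and recall are high. A detector that over-segments words into extra boxes loses precision, while one that merges or misses words loses recall.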

Electronic Supplementary Material

jcst-36-3-465-Highlights.pdf (174.4 KB)

Journal of Computer Science and Technology
Pages 465-477
Cite this article:
Yang W-J, Zou B-J, Li K-W, et al. A Character Flow Framework for Multi-Oriented Scene Text Detection. Journal of Computer Science and Technology, 2021, 36(3): 465-477. https://doi.org/10.1007/s11390-021-1362-4


Received: 08 February 2021
Accepted: 28 April 2021
Published: 05 May 2021
©Institute of Computing Technology, Chinese Academy of Sciences 2021