Regular Paper

Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node

School of Computer Science, Fudan University, Shanghai 200433, China
Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 200433, China

#Contributed Equally to the Paper

Abstract

Semi-Markov conditional random fields (Semi-CRFs) have been successfully applied to many segmentation problems, including Chinese word segmentation (CWS). The advantage of Semi-CRF lies in its inherent ability to exploit properties of whole segments rather than of individual sequence elements. Despite this theoretical advantage, Semi-CRF is still not the best choice for CWS because its computational complexity is quadratic in the sentence length. In this paper, we propose a simple yet effective framework that helps Semi-CRF achieve performance comparable to CRF-based models at similar computational complexity. Specifically, we first apply a character-level bi-directional long short-term memory (BiLSTM) network to model context information, and then use a simple but effective fusion layer to represent segment information. In addition, to model arbitrarily long segments in linear time, we propose a new model named Semi-CRF-Relay. Because segments are modeled directly, word features are easy to incorporate, and CWS performance can be improved merely by adding publicly available pre-trained word embeddings. Experiments on four popular CWS datasets show the effectiveness of the proposed methods. The source code and pre-trained embeddings for this paper are available at https://github.com/fastnlp/fastNLP/.
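
The complexity argument in the abstract can be illustrated with a minimal sketch of generic semi-Markov (segment-level) Viterbi decoding. This is not the paper's Semi-CRF-Relay model or its BiLSTM scorer: the function `semi_markov_decode`, the `max_len` cap, and the toy lexicon scores are all hypothetical and chosen only for illustration. The sketch shows that scoring whole segments costs O(n * max_len), which becomes quadratic in the sentence length n when segments of any length are allowed (max_len = n).

```python
from typing import Callable, List, Tuple


def semi_markov_decode(
    chars: str,
    score: Callable[[str], float],
    max_len: int,
) -> Tuple[float, List[str]]:
    """Return the best-scoring segmentation whose segments have length <= max_len.

    Assumes max_len >= 1 and that `score` returns a finite value for any segment.
    """
    n = len(chars)
    best = [float("-inf")] * (n + 1)  # best[j]: best score of a segmentation of chars[:j]
    back = [0] * (n + 1)              # back[j]: start index of the last segment ending at j
    best[0] = 0.0
    for j in range(1, n + 1):
        # Only the last max_len start positions are tried for each end position,
        # so the double loop performs at most n * max_len segment evaluations.
        for i in range(max(0, j - max_len), j):
            candidate = best[i] + score(chars[i:j])
            if candidate > best[j]:
                best[j], back[j] = candidate, i
    # Follow the back-pointers from the end of the sentence to recover the segments.
    segments: List[str] = []
    j = n
    while j > 0:
        i = back[j]
        segments.append(chars[i:j])
        j = i
    segments.reverse()
    return best[n], segments


# Hypothetical segment scores; a real model would compute them from learned features.
TOY_LEXICON = {"研究": 2.0, "生命": 2.0, "研究生": 2.5, "起源": 2.0, "命": 0.5}


def toy_score(segment: str) -> float:
    """Score a segment with the toy lexicon; unknown segments get a penalty."""
    return TOY_LEXICON.get(segment, -1.0)


if __name__ == "__main__":
    print(semi_markov_decode("研究生命起源", toy_score, max_len=3))
```

Under these toy scores the decoder prefers 研究 / 生命 / 起源 over 研究生 / 命 / 起源 because the first segmentation has the higher total segment score. Capping `max_len` keeps decoding linear in n but rules out longer segments, which is the trade-off the abstract says Semi-CRF-Relay is designed to avoid.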


Journal of Computer Science and Technology
Pages 1115-1126
Cite this article:
Qun N, Yan H, Qiu X-P, et al. Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node. Journal of Computer Science and Technology, 2020, 35(5): 1115-1126. https://doi.org/10.1007/s11390-020-9576-4