Named entity recognition (NER), the core task of information extraction, identifies the various types of named entities in text. Chinese NER has benefited from the application of deep learning to character and word representation, feature extraction, and related areas, and has made considerable progress. However, the task still suffers from a lack of word-level information, which is regarded as one of the primary obstacles to building a high-performance Chinese NER system. Although automatically constructed lexicons contain rich word-boundary and word-semantic information, integrating word knowledge into Chinese NER remains difficult. One open problem is how to effectively integrate the semantic information of self-matching words and their context into the Chinese characters. Furthermore, although graph neural networks can extract features from the various character-word interaction graphs, how to fuse those features into the original input sequence according to the importance of the information in each interaction graph has yet to be solved.
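To make the term concrete, the sketch below shows one common reading of "self-matching words": for each character, the lexicon entries that occur in the sentence and contain that character. This reading, the function names `match_words` and `self_matching_words`, the `max_len` cutoff, and the toy lexicon are all assumptions for illustration and are not taken from the paper.

```python
def match_words(sentence, lexicon, max_len=4):
    """Return (start, end, word) for every lexicon word of 2..max_len
    characters that occurs in the sentence; `end` is exclusive."""
    matches = []
    for start in range(len(sentence)):
        last_end = min(start + max_len, len(sentence))
        for end in range(start + 2, last_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                matches.append((start, end, word))
    return matches


def self_matching_words(sentence, lexicon, max_len=4):
    """For each character position, list the matched words that cover it."""
    per_char = [[] for _ in sentence]
    for start, end, word in match_words(sentence, lexicon, max_len):
        for i in range(start, end):
            per_char[i].append(word)
    return per_char


# Hypothetical toy lexicon; a real system would load a large word list.
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
sentence = "南京市长江大桥"
for char, words in zip(sentence, self_matching_words(sentence, lexicon)):
    print(char, words)
```

In this toy sentence the character 长 is covered by 市长, 长江, and 长江大桥, which illustrates the boundary ambiguity that word information is meant to help resolve.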
This paper proposes a Chinese NER method based on a character-word combination sequence. (1) First, the method introduces a character-word combination sequence embedding structure, which primarily replaces the characters in the input character sequence with self-matching words. To make full use of the self-matching word information, a separate sequence of the self-matching words is also constructed, and both the words and the characters are vectorized. At the encoding layer, a BiLSTM model captures the context information of the character sequence, the word sequence, and the character-word combination sequence, and the information of each word in the combination sequence is then fused into the corresponding word in the word sequence. Next, a graph neural network extracts features from the different character-word interaction graphs so that the enhanced word information is integrated into the characters. This makes full use of word-boundary information, integrates the context of the self-matching word sequence into the characters, and captures the semantic information between characters and words, further enriching the character features. Finally, a conditional random field decodes the sequence and labels the entities.
(2) Because the information in the different character-word interaction graphs is not equally important to the original input character sequence, the method also proposes a multigraph attention fusion structure. It scores the relevance of each interaction graph's information to the character sequence, distinguishes the structural features by importance, and fuses the information from the different interaction graphs into the character sequence in proportion to these scores.
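The fusion idea in (2) can be sketched for a single character as follows. The abstract does not give the scoring function, so this sketch assumes scaled dot-product scores, a softmax over the interaction graphs, and a residual addition to the character vector; the function names and the toy vectors are hypothetical and the paper's actual structure may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_graph_features(char_vec, graph_vecs):
    """Fuse per-graph feature vectors into one character vector.

    char_vec:   hidden vector of one character (list of floats)
    graph_vecs: one feature vector per interaction graph, same length
    Returns (fused_vector, attention_weights).
    """
    dim = len(char_vec)
    # Assumed scoring: scaled dot product between character and graph features.
    scores = [
        sum(c * g for c, g in zip(char_vec, vec)) / math.sqrt(dim)
        for vec in graph_vecs
    ]
    weights = softmax(scores)
    # Assumed fusion: weighted sum of graph features added to the character.
    fused = list(char_vec)
    for weight, vec in zip(weights, graph_vecs):
        for j in range(dim):
            fused[j] += weight * vec[j]
    return fused, weights


# Toy example: the first graph's features align with the character vector,
# so it receives the larger attention weight.
fused, weights = fuse_graph_features([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights, fused)
```

Because the weights come from a softmax, they sum to one, so each graph contributes in proportion to its score rather than all graphs contributing equally.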
On the Weibo, Resume, OntoNotes 4.0, and MSRA datasets, the F1 score of the new method exceeded that of the original method by 3.17% (Weibo_all), 1.21%, 1.33%, and 0.43%, respectively, which verifies the feasibility of the new method for Chinese NER.
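For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below assumes the usual entity-level, exact-match convention for NER, in which a predicted entity counts as correct only if its span and type both match a gold entity. The abstract does not state the evaluation protocol, so this convention, the function name `entity_f1`, and the toy spans are assumptions.

```python
def entity_f1(gold, pred):
    """Entity-level precision, recall, and F1 with exact matching.

    gold, pred: iterables of (start, end, entity_type) tuples.
    """
    gold, pred = set(gold), set(pred)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: one of two predictions matches one of two gold entities.
gold = [(0, 3, "LOC"), (3, 7, "LOC")]
pred = [(0, 3, "LOC"), (2, 4, "PER")]
print(entity_f1(gold, pred))
```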
The experiments show that the proposed method is more effective than the original method.