Open Access

Leveraging Integrated Learning for Open-Domain Chinese Named Entity Recognition

Jin Diao (1), Zhangbing Zhou (1, 2), Guangli Shi (1)
(1) School of Information Engineering, China University of Geosciences, Beijing 100083, China
(2) TELECOM SudParis, Evry 91011, France

Abstract

Named entity recognition (NER) is a fundamental technique in natural language processing that underpins tasks such as natural language question reasoning, text matching, and semantic text similarity. Compared with English NER, Chinese NER is harder because of the noise introduced by the complex meanings, diverse structures, and ambiguous semantic boundaries of the Chinese language itself. In addition, compared with specific domains, the open domain has entity types that are more complex and changeable, and a considerably larger number of entities. Open-domain Chinese NER is therefore more difficult still, and existing open-domain NER methods have low recognition rates. This paper proposes a method based on the bidirectional long short-term memory conditional random field (BiLSTM-CRF) model that leverages integrated learning to improve the efficiency of Chinese NER. Compared with single models, including CRF, BiLSTM-CRF, and gated recurrent unit-CRF (GRU-CRF), the proposed method can significantly improve the accuracy of open-domain Chinese NER.
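
The abstract does not state how the outputs of the individual models are combined, so the following is only an illustrative sketch of one common option: a per-character majority vote over the tag sequences produced by several taggers. The function name `majority_vote`, the tie-breaking rule, and the sample tag sequences are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-character tag sequences from several taggers.

    Assumption (not from the paper): each position takes the most frequent
    tag; ties go to the earliest model listed in `predictions`.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all tag sequences must have the same length")

    voted = []
    for tags_at_pos in zip(*predictions):
        counts = Counter(tags_at_pos)
        best = max(counts.values())
        # The first tag, in model order, that reaches the top count wins.
        voted.append(next(t for t in tags_at_pos if counts[t] == best))
    return voted


# Hypothetical BIO tags for a four-character sentence from three single models.
crf_tags = ["O", "O", "B-LOC", "O"]
bilstm_crf_tags = ["O", "O", "B-LOC", "I-LOC"]
gru_crf_tags = ["O", "B-LOC", "B-LOC", "I-LOC"]

print(majority_vote([crf_tags, bilstm_crf_tags, gru_crf_tags]))
```

A position-wise vote can produce tag sequences that violate the BIO scheme (for example an `I-` tag with no preceding `B-` tag), so a practical system would likely need a repair step or a different combination rule such as stacking.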


International Journal of Crowd Science
Pages 74-79
Cite this article:
Diao J, Zhou Z, Shi G. Leveraging Integrated Learning for Open-Domain Chinese Named Entity Recognition. International Journal of Crowd Science, 2022, 6(2): 74-79. https://doi.org/10.26599/IJCS.2022.9100015

Views: 754
Downloads: 45
Crossref: 3
Scopus: 3

Received: 04 January 2022
Revised: 25 April 2022
Accepted: 26 April 2022
Published: 30 June 2022
© The author(s) 2022

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
