Open Access

Converse Attention Knowledge Transfer for Low-Resource Named Entity Recognition

Shengfei Lyu¹, Linghao Sun¹, Huixiong Yi¹, Yong Liu², Huanhuan Chen¹, and Chunyan Miao²
¹ School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
² School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore

Abstract

In recent years, great success has been achieved in many tasks of natural language processing (NLP), e.g., named entity recognition (NER), especially in high-resource languages such as English, thanks in part to the considerable amount of labeled resources: more labeled resources yield better word representations. However, most low-resource languages lack such an abundance of labeled data, and NER performance in these languages suffers accordingly because of poor word representations. In this paper, we propose the converse attention network (CAN), which augments word representations in low-resource languages with knowledge from a high-resource language, improving NER performance in low-resource languages by transferring knowledge learned in the high-resource language. CAN first translates sentences in a low-resource language into the high-resource language, English, using an attention-based translation module. During translation, CAN obtains attention matrices that align word representations between the high-resource and low-resource language spaces. CAN then uses these attention matrices to augment the word representations learned in the low-resource language space with those learned in the high-resource language space. Experiments on four low-resource NER datasets show that CAN achieves consistent and significant performance improvements, demonstrating its effectiveness.
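The augmentation step described above can be illustrated with a minimal PyTorch sketch. It assumes the translation module yields an attention matrix whose rows index English target words and whose columns index low-resource source words, so that transposing the matrix routes English-side representations back to the low-resource word positions. The function name, the re-normalization, and the mixing coefficient alpha are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def converse_attention_augment(h_low, h_high, attn, alpha=0.5):
    """Hypothetical sketch of attention-based representation augmentation.

    h_low:  (src_len, d)  word representations of the low-resource sentence
    h_high: (tgt_len, d)  word representations of the translated English sentence
    attn:   (tgt_len, src_len) attention matrix from the translation module,
            where attn[j, i] weights source word i when producing target word j
    alpha:  assumed mixing coefficient between original and projected reps
    """
    # Transpose so each low-resource word gathers weights over English words,
    # then re-normalize each row to sum to 1 (guarding against all-zero rows).
    converse = attn.t()
    converse = converse / converse.sum(dim=-1, keepdim=True).clamp(min=1e-9)
    # Project English-space representations back onto low-resource positions.
    projected = converse @ h_high          # (src_len, d)
    # Mix the original low-resource representations with the projected ones.
    return alpha * h_low + (1 - alpha) * projected

# Toy usage with random tensors: 5 low-resource words, 7 English words, d=8.
h_low = torch.randn(5, 8)
h_high = torch.randn(7, 8)
attn = torch.softmax(torch.randn(7, 5), dim=-1)  # each row sums to 1
augmented = converse_attention_augment(h_low, h_high, attn)
print(augmented.shape)  # torch.Size([5, 8])
```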

International Journal of Crowd Science
Pages 140–148
Cite this article:
Lyu S, Sun L, Yi H, et al. Converse Attention Knowledge Transfer for Low-Resource Named Entity Recognition. International Journal of Crowd Science, 2024, 8(3): 140-148. https://doi.org/10.26599/IJCS.2023.9100014


Received: 10 January 2023
Revised: 22 July 2023
Accepted: 03 August 2023
Published: 19 August 2024
© The author(s) 2024.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
