Open Access

Converse Attention Knowledge Transfer for Low-Resource Named Entity Recognition

Shengfei Lyu¹, Linghao Sun¹, Huixiong Yi¹, Yong Liu², Huanhuan Chen¹, and Chunyan Miao²
¹ School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
² School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore

Abstract

In recent years, great success has been achieved in many tasks of natural language processing (NLP), e.g., named entity recognition (NER), especially in high-resource languages such as English, thanks in part to the considerable amount of labeled resources: more labeled resources yield better word representations. However, most low-resource languages lack such an abundance of labeled data, and NER performance in these languages suffers accordingly because of poor word representations. In this paper, we propose the converse attention network (CAN), which augments word representations in low-resource languages with knowledge from a high-resource language, improving NER performance in low-resource languages by transferring knowledge learned in the high-resource language. CAN first translates sentences in a low-resource language into the high-resource language, English, using an attention-based translation module. During translation, CAN obtains attention matrices that align word representations between the high-resource and low-resource language spaces. CAN then uses these attention matrices to augment the word representations learned in the low-resource language space with those learned in the high-resource language space. Experiments on four low-resource NER datasets show that CAN achieves consistent and significant performance improvements, demonstrating its effectiveness.
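The augmentation step described above can be illustrated with a minimal PyTorch sketch. It assumes the translation module yields an attention matrix whose rows index English target words and whose columns index low-resource source words, so that transposing the matrix routes English-side representations back to the low-resource word positions. The function name, the re-normalization, and the mixing coefficient alpha are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def converse_attention_augment(h_low, h_high, attn, alpha=0.5):
    """Hypothetical sketch of attention-based representation augmentation.

    h_low:  (src_len, d)  word representations of the low-resource sentence
    h_high: (tgt_len, d)  word representations of the translated English sentence
    attn:   (tgt_len, src_len) attention matrix from the translation module,
            where attn[j, i] weights source word i when producing target word j
    alpha:  assumed mixing coefficient between original and projected reps
    """
    # Transpose so each low-resource word gathers weights over English words,
    # then re-normalize each row to sum to 1 (guarding against all-zero rows).
    converse = attn.t()
    converse = converse / converse.sum(dim=-1, keepdim=True).clamp(min=1e-9)
    # Project English-space representations back onto low-resource positions.
    projected = converse @ h_high          # (src_len, d)
    # Mix the original low-resource representations with the projected ones.
    return alpha * h_low + (1 - alpha) * projected

# Toy usage with random tensors: 5 low-resource words, 7 English words, d=8.
h_low = torch.randn(5, 8)
h_high = torch.randn(7, 8)
attn = torch.softmax(torch.randn(7, 5), dim=-1)  # each row sums to 1
augmented = converse_attention_augment(h_low, h_high, attn)
print(augmented.shape)  # torch.Size([5, 8])
```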

International Journal of Crowd Science
Pages 140–148
Cite this article:
Lyu S, Sun L, Yi H, et al. Converse Attention Knowledge Transfer for Low-Resource Named Entity Recognition. International Journal of Crowd Science, 2024, 8(3): 140-148. https://doi.org/10.26599/IJCS.2023.9100014


Received: 10 January 2023
Revised: 22 July 2023
Accepted: 03 August 2023
Published: 19 August 2024
© The author(s) 2024.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
