Open Access

Improving Few-Shot Named Entity Recognition with Causal Interventions

School of Computer, University of South China, Hengyang 421000, China, and also with School of Computer Science and Technology, Anhui University, Hefei 230601, China
School of Computer, University of South China, Hengyang 421000, China
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Abstract

Few-shot Named Entity Recognition (NER) systems are designed to identify new categories of entities from a limited number of labeled examples. A major challenge for these systems is overfitting, which is far more pronounced than in tasks with ample samples. This overfitting stems largely from spurious correlations introduced by biases in the selection of a small sample set. To address this challenge, we introduce a causal intervention-based method for few-shot NER. Building on the backbone of prototypical networks, our method intervenes on the context to block the indirect association between the context and the label. In 1-shot scenarios, where contextual intervention is not feasible, our method instead uses incremental learning to intervene at the prototype level, which not only counters overfitting but also alleviates catastrophic forgetting. Additionally, to preliminarily classify entity types, we employ entity detection for coarse categorization. Given the distinct characteristics of the source and target domains in few-shot tasks, we further introduce sample reweighting to aid model transfer and generalization. Through rigorous testing across multiple benchmark datasets, our approach consistently achieves new state-of-the-art results, underscoring its efficacy for few-shot NER.
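The prototypical-network backbone referenced in the abstract can be illustrated with a minimal sketch: each class prototype is the mean of its support-set embeddings, and a query token is assigned to the nearest prototype. This is a generic illustration under assumed toy 2-D embeddings, not the paper's implementation; the function names and data here are hypothetical.

```python
import numpy as np

def build_prototypes(support_embeddings, support_labels):
    """Average each class's support-set embeddings to form its prototype."""
    prototypes = {}
    for label in set(support_labels):
        vecs = [e for e, l in zip(support_embeddings, support_labels) if l == label]
        prototypes[label] = np.mean(vecs, axis=0)
    return prototypes

def classify(query_embedding, prototypes):
    """Assign the query to the class whose prototype is nearest (Euclidean)."""
    return min(prototypes, key=lambda c: np.linalg.norm(query_embedding - prototypes[c]))

# Toy 2-D token embeddings: two support tokens per entity class.
support = [np.array([0.0, 1.0]), np.array([0.0, 0.8]),
           np.array([1.0, 0.0]), np.array([0.9, 0.1])]
labels = ["PER", "PER", "LOC", "LOC"]
protos = build_prototypes(support, labels)
print(classify(np.array([0.1, 0.9]), protos))  # nearest to the PER prototype
```

The paper's contribution sits on top of this backbone: intervening on context (or, in the 1-shot case, on the prototypes themselves) so that the prototypes capture entity semantics rather than spurious contextual correlations.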

Cite this article:
Yang Z, Liu Y, Ouyang C, et al. Improving Few-Shot Named Entity Recognition with Causal Interventions. Big Data Mining and Analytics, 2024, 7(4): 1375-1395. https://doi.org/10.26599/BDMA.2024.9020052