Open Access

Improving Few-Shot Named Entity Recognition with Causal Interventions

School of Computer, University of South China, Hengyang 421000, China, and also with School of Computer Science and Technology, Anhui University, Hefei 230601, China
School of Computer, University of South China, Hengyang 421000, China
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Abstract

Few-shot Named Entity Recognition (NER) systems are designed to identify new categories of entities from a limited number of labeled examples. A major challenge for these systems is overfitting, which is far more pronounced than in tasks with ample samples. This overfitting stems largely from spurious correlations introduced by biases in the selection of a small sample set. To address this challenge, we introduce a causal intervention-based method for few-shot NER. Building on the backbone of prototypical networks, our method intervenes on the context to block the indirect association between the context and the label. In 1-shot scenarios, where contextual intervention is not feasible, our method instead uses incremental learning to intervene at the prototype level, which not only counters overfitting but also alleviates catastrophic forgetting. Additionally, to preliminarily classify entity types, we employ entity detection for coarse categorization. Given the distinct characteristics of the source and target domains in few-shot tasks, we further introduce sample reweighting to aid model transfer and generalization. Through rigorous testing across multiple benchmark datasets, our approach consistently achieves new state-of-the-art results, underscoring its efficacy for few-shot NER.
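The prototypical-network backbone referenced in the abstract can be illustrated with a minimal sketch: each class prototype is the mean of its support-set embeddings, and a query token is assigned to the nearest prototype. This is a generic illustration under assumed toy 2-D embeddings, not the paper's implementation; the function names and data here are hypothetical.

```python
import numpy as np

def build_prototypes(support_embeddings, support_labels):
    """Average each class's support-set embeddings to form its prototype."""
    prototypes = {}
    for label in set(support_labels):
        vecs = [e for e, l in zip(support_embeddings, support_labels) if l == label]
        prototypes[label] = np.mean(vecs, axis=0)
    return prototypes

def classify(query_embedding, prototypes):
    """Assign the query to the class whose prototype is nearest (Euclidean)."""
    return min(prototypes, key=lambda c: np.linalg.norm(query_embedding - prototypes[c]))

# Toy 2-D token embeddings: two support tokens per entity class.
support = [np.array([0.0, 1.0]), np.array([0.0, 0.8]),
           np.array([1.0, 0.0]), np.array([0.9, 0.1])]
labels = ["PER", "PER", "LOC", "LOC"]
protos = build_prototypes(support, labels)
print(classify(np.array([0.1, 0.9]), protos))  # nearest to the PER prototype
```

The paper's contribution sits on top of this backbone: intervening on context (or, in the 1-shot case, on the prototypes themselves) so that the prototypes capture entity semantics rather than spurious contextual correlations.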

Cite this article:
Yang Z, Liu Y, Ouyang C, et al. Improving Few-Shot Named Entity Recognition with Causal Interventions. Big Data Mining and Analytics, 2024, 7(4): 1375-1395. https://doi.org/10.26599/BDMA.2024.9020052