Open Access

Multi-Level Cross-Lingual Attentive Neural Architecture for Low Resource Name Tagging

College of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
College of Computer Science, Rensselaer Polytechnic Institute, Troy 12180, USA.

Abstract

Neural networks have been widely used for English name tagging and have delivered state-of-the-art results. However, for low-resource languages, limited resources and a lack of training data cause taggers to perform worse than their English counterparts. In this paper, we tackle this challenging issue by incorporating multi-level cross-lingual knowledge as attention into a neural architecture, which guides low-resource name tagging toward better performance. Specifically, we regard entity type distribution as language independent and use bilingual lexicons to bridge cross-lingual semantic mapping. We then jointly apply word-level cross-lingual mutual influence and entity-type-level monolingual word distributions to enhance low-resource name tagging. Experiments on three languages, Chinese, Uzbek, and Turkish, demonstrate the effectiveness of this neural architecture: we achieve significant improvements in name tagging over all previous baselines.
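The core idea of bridging a low-resource language to English through a bilingual lexicon and attention can be sketched as follows. This is an illustrative toy implementation, not the authors' actual model: the function name `cross_lingual_attention`, the dot-product scoring, and the random embeddings are all assumptions made for demonstration; the paper's architecture additionally involves bidirectional LSTMs and entity-type-level distributions.

```python
import numpy as np

def cross_lingual_attention(src_vec, eng_vecs):
    """Attend from one low-resource word embedding over the English
    embeddings of its bilingual-lexicon translations, returning a
    softmax-weighted context vector (dot-product attention)."""
    scores = eng_vecs @ src_vec                  # similarity score per translation, shape (k,)
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()
    return weights @ eng_vecs                    # weighted English context, shape (d,)

rng = np.random.default_rng(0)
src = rng.normal(size=4)                 # embedding of one Uzbek/Turkish word
translations = rng.normal(size=(3, 4))   # English embeddings of its lexicon translations
context = cross_lingual_attention(src, translations)
print(context.shape)
```

In a tagger along these lines, such a cross-lingual context vector would be concatenated with the word's own representation before the sequence-labeling layer, so that English evidence guides the low-resource prediction.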

Tsinghua Science and Technology
Pages 633-645
Cite this article:
Feng X, Huang L, Qin B, et al. Multi-Level Cross-Lingual Attentive Neural Architecture for Low Resource Name Tagging. Tsinghua Science and Technology, 2017, 22(6): 633-645. https://doi.org/10.23919/TST.2017.8195346


Received: 31 December 2016
Revised: 26 April 2017
Accepted: 14 June 2017
Published: 14 December 2017
© The author(s) 2017