Regular Paper

Document-Level Neural Machine Translation with Hierarchical Modeling of Global Context

School of Computer Science and Technology, Soochow University, Suzhou 215006, China

A preliminary version of this paper was published in the Proceedings of EMNLP-IJCNLP 2019.


Abstract

Document-level machine translation (MT) remains challenging because it is difficult to use document-level global context efficiently during translation. In this paper, we propose a hierarchical model that learns the global context for document-level neural machine translation (NMT), using a sentence encoder to capture intra-sentence dependencies and a document encoder to model document-level inter-sentence consistency and coherence. With this hierarchical architecture, we feed the extracted document-level global context back to each word in a top-down fashion, so that different translations of a word can be distinguished according to its specific surrounding context. Notably, we explore the effect of three popular attention functions during this information backward-distribution phase to take a close look at how global context information is distributed in our model. In addition, since large-scale in-domain document-level parallel corpora are usually unavailable, we use a two-step training strategy that combines a large-scale corpus of out-of-domain parallel sentence pairs with a small-scale corpus of in-domain parallel document pairs to achieve domain adaptability. Experimental results on Chinese-English and English-German corpora show that our model improves over the Transformer baseline significantly, by 4.5 BLEU points on average, demonstrating the effectiveness of the proposed hierarchical model in document-level NMT.
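To make the described architecture concrete, the following is a minimal sketch of the hierarchical global-context idea in PyTorch. It is not the authors' implementation: the mean-pooled sentence summaries, the gated fusion, and all hyper-parameters are illustrative assumptions, and the single multi-head attention in the backward-distribution step stands in for the three attention variants the paper compares.

```python
# Minimal sketch (NOT the paper's released code) of the hierarchical
# global-context encoder: sentence encoder -> document encoder ->
# top-down backward distribution of global context to each word.
import torch
import torch.nn as nn

class HierarchicalContext(nn.Module):
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        # Sentence encoder: captures intra-sentence dependencies.
        self.sent_encoder = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Document encoder: models inter-sentence consistency and
        # coherence over the sequence of sentence summaries.
        self.doc_encoder = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Backward distribution: each word attends to the document-level
        # context (placeholder for the paper's three attention variants).
        self.backward_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        # Gated fusion of local word states and global context (assumed).
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, words):
        # words: (n_sents, sent_len, d_model) word embeddings of one document
        h_word = self.sent_encoder(words)            # intra-sentence dependencies
        h_sent = h_word.mean(dim=1)                  # sentence summaries (mean pooling assumed)
        h_doc = self.doc_encoder(h_sent.unsqueeze(0))  # (1, n_sents, d_model) global context
        # Distribute global context back to every word in a top-down fashion.
        kv = h_doc.expand(h_word.size(0), -1, -1)    # share context across sentences
        ctx, _ = self.backward_attn(h_word, kv, kv)  # queries: words; keys/values: context
        # Gate decides, per word, how much global context to mix in.
        g = torch.sigmoid(self.gate(torch.cat([h_word, ctx], dim=-1)))
        return g * h_word + (1 - g) * ctx
```

Under these assumptions, a document enters as a tensor of shape (n_sents, sent_len, d_model) and leaves with the same shape, so the context-enriched word states can be consumed by a standard Transformer decoder unchanged.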

Electronic Supplementary Material

jcst-37-2-295-Highlights.pdf (129.4 KB)

Journal of Computer Science and Technology
Pages 295-308
Cite this article:
Tan X, Zhang L-Y, Zhou G-D. Document-Level Neural Machine Translation with Hierarchical Modeling of Global Context. Journal of Computer Science and Technology, 2022, 37(2): 295-308. https://doi.org/10.1007/s11390-021-0286-3


Received: 09 March 2020
Accepted: 11 January 2021
Published: 31 March 2022
©Institute of Computing Technology, Chinese Academy of Sciences 2022