Regular Paper

Document-Level Neural Machine Translation with Hierarchical Modeling of Global Context

School of Computer Science and Technology, Soochow University, Suzhou 215006, China

A preliminary version of this paper was published in the Proceedings of EMNLP-IJCNLP 2019.


Abstract

Document-level machine translation (MT) remains challenging because it is difficult to use document-level global context efficiently during translation. In this paper, we propose a hierarchical model that learns the global context for document-level neural machine translation (NMT), using a sentence encoder to capture intra-sentence dependencies and a document encoder to model document-level inter-sentence consistency and coherence. With this hierarchical architecture, we feed the extracted document-level global context back to each word in a top-down fashion, so that different translations of a word can be distinguished according to its specific surrounding context. Notably, we explore the effect of three popular attention functions during this information backward-distribution phase to take a close look at how global context information is distributed in our model. In addition, since large-scale in-domain document-level parallel corpora are usually unavailable, we use a two-step training strategy that combines a large-scale corpus of out-of-domain parallel sentence pairs with a small-scale corpus of in-domain parallel document pairs to achieve domain adaptability. Experimental results on Chinese-English and English-German corpora show that our model improves over the Transformer baseline significantly, by 4.5 BLEU points on average, demonstrating the effectiveness of the proposed hierarchical model in document-level NMT.
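To make the described architecture concrete, the following is a minimal sketch of the hierarchical global-context idea in PyTorch. It is not the authors' implementation: the mean-pooled sentence summaries, the gated fusion, and all hyper-parameters are illustrative assumptions, and the single multi-head attention in the backward-distribution step stands in for the three attention variants the paper compares.

```python
# Minimal sketch (NOT the paper's released code) of the hierarchical
# global-context encoder: sentence encoder -> document encoder ->
# top-down backward distribution of global context to each word.
import torch
import torch.nn as nn

class HierarchicalContext(nn.Module):
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        # Sentence encoder: captures intra-sentence dependencies.
        self.sent_encoder = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Document encoder: models inter-sentence consistency and
        # coherence over the sequence of sentence summaries.
        self.doc_encoder = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Backward distribution: each word attends to the document-level
        # context (placeholder for the paper's three attention variants).
        self.backward_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        # Gated fusion of local word states and global context (assumed).
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, words):
        # words: (n_sents, sent_len, d_model) word embeddings of one document
        h_word = self.sent_encoder(words)            # intra-sentence dependencies
        h_sent = h_word.mean(dim=1)                  # sentence summaries (mean pooling assumed)
        h_doc = self.doc_encoder(h_sent.unsqueeze(0))  # (1, n_sents, d_model) global context
        # Distribute global context back to every word in a top-down fashion.
        kv = h_doc.expand(h_word.size(0), -1, -1)    # share context across sentences
        ctx, _ = self.backward_attn(h_word, kv, kv)  # queries: words; keys/values: context
        # Gate decides, per word, how much global context to mix in.
        g = torch.sigmoid(self.gate(torch.cat([h_word, ctx], dim=-1)))
        return g * h_word + (1 - g) * ctx
```

Under these assumptions, a document enters as a tensor of shape (n_sents, sent_len, d_model) and leaves with the same shape, so the context-enriched word states can be consumed by a standard Transformer decoder unchanged.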

Electronic Supplementary Material

jcst-37-2-295-Highlights.pdf (129.4 KB)

Journal of Computer Science and Technology
Pages 295-308
Cite this article:
Tan X, Zhang L-Y, Zhou G-D. Document-Level Neural Machine Translation with Hierarchical Modeling of Global Context. Journal of Computer Science and Technology, 2022, 37(2): 295-308. https://doi.org/10.1007/s11390-021-0286-3


Received: 09 March 2020
Accepted: 11 January 2021
Published: 31 March 2022
©Institute of Computing Technology, Chinese Academy of Sciences 2022