Document Type: Original/Review Paper

Authors

Human-Computer Interaction Lab., Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran.

Abstract

Most existing neural machine translation (NMT) methods translate sentences in isolation, without considering their context. It has been shown that exploiting inter- and intra-sentential context can improve NMT models and yield better overall translation quality. However, document-level parallel data is costly to obtain, so properly exploiting contextual information from monolingual corpora can also help translation quality. In this paper, we propose a new method for context-aware neural machine translation (CA-NMT) that combines hierarchical attention networks (HAN) with automatic post-editing (APE) to fix errors related to discourse phenomena when context is lacking. HAN is used when only a small amount of document-level parallel data is available, and APE can be trained on vast monolingual document-level data to improve the results further. Experimental results show that HAN and APE complement each other in mitigating contextual translation errors, and that their combination further improves CA-NMT, achieving a reasonable improvement over HAN alone (a BLEU score of 22.91 on the En-De news-commentary dataset).
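To make the two-stage design concrete, the following is a minimal Python sketch of the decoding pipeline under the assumptions stated in the abstract. The classes HanTranslator and ApeRepairer, the function translate_document, and the window parameter are hypothetical placeholders introduced here for illustration and do not correspond to any real library API: the first stage translates each sentence conditioned on neighboring source sentences, and the second stage revises each draft using only previously produced target-side context.

from typing import List


class HanTranslator:
    """Hypothetical stand-in for a HAN-based context-aware NMT model.

    A trained model would attend over `context` with hierarchical
    (word-level and sentence-level) attention; this stub echoes its
    input so that the sketch remains runnable.
    """

    def translate(self, source: str, context: List[str]) -> str:
        return source


class ApeRepairer:
    """Hypothetical stand-in for a monolingual APE model trained on
    document-level target-language data to repair discourse errors."""

    def repair(self, draft: str, context: List[str]) -> str:
        return draft


def translate_document(sentences: List[str], nmt: HanTranslator,
                       ape: ApeRepairer, window: int = 3) -> List[str]:
    """Two-stage CA-NMT decoding.

    Stage 1: translate each sentence conditioned on the previous
    `window` source sentences (inter-sentential context).
    Stage 2: let APE revise each draft using the already-repaired
    target sentences as monolingual document-level context.
    """
    drafts: List[str] = []
    for i, source in enumerate(sentences):
        drafts.append(nmt.translate(source, sentences[max(0, i - window):i]))

    outputs: List[str] = []
    for i, draft in enumerate(drafts):
        outputs.append(ape.repair(draft, outputs[max(0, i - window):i]))
    return outputs


if __name__ == "__main__":
    document = ["The dog is sleeping.", "It is tired."]
    print(translate_document(document, HanTranslator(), ApeRepairer()))

The point of this split is that the APE stage consumes only target-side context, so it can be trained on large monolingual document-level corpora, while the HAN stage is the only component that requires the scarcer document-level parallel data.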

Keywords
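
Neural machine translation, context-aware translation, hierarchical attention networks, automatic post-editing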
