This article summarizes a new approach to document-level neural machine translation (DocNMT). The authors stress that document-level context is important for handling discourse phenomena, but attending over long document context makes the attention mechanism computationally expensive. To address this, they build an efficient transformer that reduces attention cost through techniques such as sparsity patterns, memory or global tokens, kernel-based approximations of the softmax, or combinations of these. By introducing an extra selection layer based on lightweight attention, which picks out the relevant context tokens, they maintain translation performance while gaining a 20% speed-up. Experimental results show up to 95% sparsity and 93% savings in computation cost with no loss in translation quality.
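The core idea, a cheap attention layer that decides which document-context tokens the expensive context attention should actually attend to, can be sketched roughly as follows. This is a minimal PyTorch illustration under my own assumptions, not the paper's implementation: the class name `LightweightContextSelector`, the low-dimensional projection size, and the top-k masking strategy are all illustrative choices.

```python
# Hedged sketch: a lightweight selector scores context tokens and keeps only a
# small top-k subset, so the full context attention can skip (here: mask out)
# the rest. Names and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn


class LightweightContextSelector(nn.Module):
    """Scores context tokens with a cheap single-head dot product and returns
    a boolean mask that keeps only the top-k tokens per sentence."""

    def __init__(self, d_model: int, d_select: int = 64, top_k: int = 32):
        super().__init__()
        # Low-dimensional projections keep the scoring step lightweight.
        self.q_proj = nn.Linear(d_model, d_select)
        self.k_proj = nn.Linear(d_model, d_select)
        self.top_k = top_k

    def forward(self, sent_repr: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        # sent_repr: (batch, d_model) pooled current-sentence representation
        # ctx:       (batch, ctx_len, d_model) document-context token states
        q = self.q_proj(sent_repr).unsqueeze(1)                  # (batch, 1, d_select)
        k = self.k_proj(ctx)                                     # (batch, ctx_len, d_select)
        scores = torch.matmul(q, k.transpose(1, 2)).squeeze(1)   # (batch, ctx_len)

        k_keep = min(self.top_k, ctx.size(1))
        top_idx = scores.topk(k_keep, dim=-1).indices            # (batch, k_keep)
        mask = torch.zeros_like(scores, dtype=torch.bool)
        mask.scatter_(1, top_idx, True)                          # True = keep this token
        return mask


# Usage: the mask sparsifies the full context attention; here the unselected
# positions are simply masked out for simplicity.
if __name__ == "__main__":
    batch, ctx_len, d_model = 2, 512, 512
    selector = LightweightContextSelector(d_model, top_k=32)
    sent = torch.randn(batch, d_model)
    ctx = torch.randn(batch, ctx_len, d_model)
    keep_mask = selector(sent, ctx)                              # (batch, ctx_len) bool
    context_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
    out, _ = context_attn(sent.unsqueeze(1), ctx, ctx,
                          key_padding_mask=~keep_mask)           # attend only to kept tokens
    print(out.shape)  # torch.Size([2, 1, 512])
```

In this sketch the selector still scores every context token, but it does so in a much smaller dimension with a single head, which is where the claimed savings would come from once the main attention is restricted to the selected subset.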

Publication date: 26 Sep 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2309.14174