The article discusses the importance and challenges of document-level Neural Machine Translation (DocNMT). While DocNMT is crucial for handling discourse phenomena, it suffers from computational inefficiency caused by the quadratic cost of the attention module over long document context. Many existing efficiency methods target only the encoder or sacrifice translation quality. This work proposes an efficient method that preserves translation performance while speeding up translation by roughly 20%. It introduces an extra selection layer, based on lightweight attention, that picks out a small subset of context tokens to be attended. The method achieves up to 95% sparsity in the attention module and saves about 93% of its computation cost compared to the original Transformer, without compromising translation quality.
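
To make the mechanism concrete, below is a minimal, hypothetical sketch of such a selection layer: a cheap low-dimensional scorer ranks document-context tokens, only the top few percent are kept, and the full multi-head attention then runs over that sparse subset. This is not the paper's released code; the class name, the `keep_ratio` parameter, and the scorer dimensions are illustrative assumptions.

```python
# Hypothetical sketch of a lightweight selection layer for sparse
# document-context attention (illustrative, not the paper's implementation).
import torch
import torch.nn as nn


class SelectiveAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, keep_ratio: float = 0.05):
        super().__init__()
        self.keep_ratio = keep_ratio  # e.g. 0.05 -> ~95% sparsity
        # Lightweight scorer: a single low-dimensional projection pair.
        d_light = max(8, d_model // 16)
        self.q_light = nn.Linear(d_model, d_light, bias=False)
        self.k_light = nn.Linear(d_model, d_light, bias=False)
        # Full multi-head attention, applied only to the selected tokens.
        self.mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, query: torch.Tensor, context: torch.Tensor):
        # query:   (batch, q_len, d_model) current-sentence states
        # context: (batch, c_len, d_model) document-level context states
        b, c_len, d_model = context.shape
        k = max(1, int(c_len * self.keep_ratio))

        # Cheap relevance score per context token, averaged over queries.
        scores = torch.einsum(
            "bqd,bcd->bqc", self.q_light(query), self.k_light(context)
        ).mean(dim=1)  # (batch, c_len)

        # Keep only the top-k context tokens per example.
        top_idx = scores.topk(k, dim=-1).indices                 # (batch, k)
        gather_idx = top_idx.unsqueeze(-1).expand(-1, -1, d_model)
        selected = context.gather(1, gather_idx)                 # (batch, k, d_model)

        # The expensive attention now touches only k << c_len tokens.
        out, _ = self.mha(query, selected, selected)
        return out


if __name__ == "__main__":
    layer = SelectiveAttention(d_model=512, n_heads=8, keep_ratio=0.05)
    q = torch.randn(2, 20, 512)     # 20 tokens in the current sentence
    ctx = torch.randn(2, 400, 512)  # 400 tokens of document context
    print(layer(q, ctx).shape)      # torch.Size([2, 20, 512])
```

With `keep_ratio=0.05`, the full attention sees only 20 of the 400 context tokens, which is the intuition behind the reported savings on the attention module; the exact selection criterion and where the layer sits in the network follow the paper, not this sketch.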

Publication date: 26 Sep 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2309.14174