The paper introduces AttnLRP, a method that extends Layer-wise Relevance Propagation (LRP) to the attention layers of transformer models. It aims to provide a better understanding of the reasoning process of these models, which are prone to biased predictions. Unlike other methods, AttnLRP attributes relevance not only to the input but also to the latent representations of the transformer, while keeping the computational cost comparable to a single backward pass. The paper demonstrates that AttnLRP surpasses alternative methods in faithfulness. An open-source implementation is available on GitHub.
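
For orientation, here is a minimal sketch of the single-backward-pass workflow described above, written against the open-source package from the project page. The import path `lxt.models.llama`, the `attnlrp.register` call, and the choice of checkpoint are assumptions based on the repository's usage pattern and may differ between versions; the rest is standard PyTorch/transformers code.

```python
import torch
from transformers import AutoTokenizer

# Assumed import path from the project repository
# (rachtibat/LRP-for-Transformers); verify against the installed version.
from lxt.models.llama import LlamaForCausalLM, attnlrp

model_id = "meta-llama/Llama-2-7b-hf"  # hypothetical checkpoint choice
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)
model.eval()

# Assumed API: replace attention and other layers with LRP-compatible rules,
# so that gradients computed in the backward pass become relevances.
attnlrp.register(model)

prompt = "Paris is the capital of"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Detach the embeddings to make them a leaf tensor whose .grad is retained.
inputs_embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_()

# One forward pass, then a single backward pass from the top logit;
# with the LRP rules registered, the input gradients are the relevances.
logits = model(inputs_embeds=inputs_embeds).logits
top_logit = logits[0, -1].max()
top_logit.backward()

# Token-level relevance: sum over the embedding dimension.
relevance = inputs_embeds.grad.float().sum(-1)
for token, r in zip(tokenizer.convert_ids_to_tokens(input_ids[0]), relevance[0]):
    print(f"{token:>12}  {r.item():+.4f}")
```

Summing over the embedding dimension yields one signed relevance score per input token; in the usual LRP reading, positive scores support the predicted token and negative scores speak against it.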

Publication date: 8 Feb 2024
Project Page: https://github.com/rachtibat/LRP-for-Transformers
Paper: https://arxiv.org/pdf/2402.05602