The paper introduces the Recursive Attribution Generator (ReAGent), a new feature attribution (FA) method for generative language models. Existing FAs were mostly developed for encoder-only language models on classification tasks; ReAGent, by contrast, is model-agnostic and can be applied to any generative language model. The method recursively updates the token importance distribution. The authors evaluate ReAGent against seven popular FAs across six decoder-only language models, and the results show that it consistently provides more faithful token importance distributions.
Publication date: 2 Feb 2024
Project Page: https://github.com/casszhao/ReAGent
Paper: https://arxiv.org/pdf/2402.00794
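
To make the idea of recursively updating a token importance distribution more concrete, here is a minimal sketch of a perturbation-based update loop. It is not the authors' implementation. The toy model `toy_next_token_prob`, the `CUES` set, the `<mask>` filler token, the running-mean update rule, and the final softmax are all assumptions chosen for illustration. See the project page for the actual method.

```python
import math
import random

# Hypothetical: the tokens our toy "model" relies on to predict the target.
CUES = {"capital", "France"}


def toy_next_token_prob(tokens):
    """Hypothetical stand-in for a language model.

    Returns a made-up probability of the target next token that grows
    with the number of cue tokens still present in the input.
    """
    hits = sum(1 for t in tokens if t in CUES)
    return 0.1 + 0.4 * hits


def recursive_attribution(tokens, prob_fn, steps=300, num_replace=2,
                          filler="<mask>", seed=0):
    """Iteratively refine per-token importance scores (illustrative only)."""
    rng = random.Random(seed)
    base = prob_fn(tokens)            # probability with the original input
    scores = [0.0] * len(tokens)      # running mean of probability drops
    counts = [0] * len(tokens)        # how often each token was replaced

    for _ in range(steps):
        # Replace a random subset of positions with the filler token.
        chosen = rng.sample(range(len(tokens)), num_replace)
        perturbed = [filler if i in chosen else t
                     for i, t in enumerate(tokens)]
        # A large drop suggests the replaced tokens mattered.
        drop = base - prob_fn(perturbed)
        for i in chosen:
            counts[i] += 1
            scores[i] += (drop - scores[i]) / counts[i]  # incremental mean

    # Normalize the scores into a distribution with a stable softmax.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


tokens = ["The", "capital", "of", "France", "is"]
importance = recursive_attribution(tokens, toy_next_token_prob)
for tok, weight in sorted(zip(tokens, importance), key=lambda p: -p[1]):
    print(f"{tok:>8s}  {weight:.3f}")
```

Because the loop only queries `prob_fn` on original and perturbed inputs, it never needs gradients or model internals, which is what makes an approach of this kind model-agnostic. With a real generative model, `prob_fn` would wrap a forward pass that returns the probability of the predicted next token.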