ReAGent: Towards A Model-agnostic Feature Attribution Method for Generative Language Models

The paper introduces the Recursive Attribution Generator (ReAGent), a new feature attribution (FA) method for generative language models. Existing FA methods were mostly developed for encoder-only language models on classification tasks; ReAGent, by contrast, is model-agnostic and can be applied to any generative language model. It works by recursively updating a token importance distribution. The authors compare ReAGent with seven popular FA methods across six decoder-only language models and report that it consistently provides more faithful token importance distributions.
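To make the idea of recursively updating a token importance distribution more concrete, here is a toy sketch in Python. It is not the authors' implementation (see the project page for that). It assumes a perturbation-style update: at each step a random subset of context tokens is replaced, the drop in the target token's probability is measured, and each token's score is nudged up if it was replaced and down otherwise. The names `toy_target_prob` and `recursive_attribution`, the update rule, and the hard-coded "model" are all hypothetical and chosen only for illustration.

```python
import math
import random

TOKENS = ["The", "capital", "of", "France", "is"]

def toy_target_prob(kept):
    """Hypothetical stand-in for a language model.

    Returns a made-up probability of the target token ("Paris") given a
    boolean mask saying which context tokens are kept (True) or replaced (False).
    Only "capital" and "France" influence the prediction in this toy setup.
    """
    prob = 0.05
    if kept[1]:  # "capital"
        prob += 0.30
    if kept[3]:  # "France"
        prob += 0.60
    return prob

def recursive_attribution(tokens, prob_fn, n_iters=2000, replace_ratio=0.4, seed=0):
    """Iteratively refine per-token importance scores (assumed update rule)."""
    rng = random.Random(seed)
    n = len(tokens)
    scores = [0.0] * n
    full_prob = prob_fn([True] * n)          # probability with the intact context
    k = max(1, round(replace_ratio * n))     # number of tokens replaced per step

    for step in range(1, n_iters + 1):
        replaced = set(rng.sample(range(n), k))
        kept = [i not in replaced for i in range(n)]
        drop = full_prob - prob_fn(kept)     # how much the prediction suffered
        for i in range(n):
            delta = drop if i in replaced else -drop
            scores[i] += (delta - scores[i]) / step  # running mean of the deltas

    # Normalize the scores into a distribution with a numerically stable softmax.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    importance = recursive_attribution(TOKENS, toy_target_prob)
    for token, weight in zip(TOKENS, importance):
        print(f"{token:>8}: {weight:.3f}")
```

In this toy setup the largest share of importance should land on "France", followed by "capital", since those are the only tokens the stand-in model depends on. With a real generative model, `prob_fn` would instead query the model for the target token's probability under the perturbed context.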

Publication date: 2 Feb 2024
Project Page: https://github.com/casszhao/ReAGent
Paper: https://arxiv.org/pdf/2402.00794
