This study investigates how sentiment is represented in Large Language Models (LLMs). It finds that sentiment is represented linearly along a single direction in activation space, and that this direction matters for behavior across settings ranging from toy tasks to real-world datasets such as the Stanford Sentiment Treebank. The researchers also highlight the roles played by a small subset of attention heads and neurons. In addition, the study uncovers a phenomenon termed the ‘summarization motif’, in which sentiment is aggregated at intermediate token positions that carry no inherent sentiment of their own, such as punctuation and names. Finally, the study demonstrates that ablating the sentiment direction removes a significant portion of classification accuracy.
Publication date: 23 Oct 2023
Project Page: https://arxiv.org/abs/2310.15154v1
Paper: https://arxiv.org/pdf/2310.15154
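
To make the core ideas concrete, here is a minimal sketch, on purely synthetic activations rather than the paper's actual models or code, of how a single sentiment direction could be estimated as a difference of class means and then removed by directional ablation. All data, dimensions, and helper names below are hypothetical illustrations of the general technique, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64

# Synthetic stand-in for residual-stream activations of positive / negative prompts.
# In the paper these would come from a real model's hidden states.
true_direction = rng.normal(size=d_model)
true_direction /= np.linalg.norm(true_direction)
pos_acts = rng.normal(size=(100, d_model)) + 2.0 * true_direction  # "positive" examples
neg_acts = rng.normal(size=(100, d_model)) - 2.0 * true_direction  # "negative" examples

# 1. Estimate a single sentiment direction as the normalized difference of class means.
direction = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
direction /= np.linalg.norm(direction)

# 2. Directional ablation: project the direction out of every activation vector.
def ablate_direction(acts: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Remove the component of each row of `acts` along unit vector `d`."""
    return acts - np.outer(acts @ d, d)

# 3. A crude linear read-out along the direction: positive if the projection exceeds 0.
def classify(acts: np.ndarray, d: np.ndarray) -> np.ndarray:
    return (acts @ d) > 0.0

acts = np.vstack([pos_acts, neg_acts])
labels = np.array([1] * len(pos_acts) + [0] * len(neg_acts))

acc_before = (classify(acts, direction) == labels).mean()
acc_after = (classify(ablate_direction(acts, direction), direction) == labels).mean()
print(f"accuracy before ablation: {acc_before:.2f}, after ablation: {acc_after:.2f}")
```

On this toy data the read-out is near-perfect before ablation and collapses to chance afterwards, mirroring (in spirit, not in numbers) the accuracy drop the paper reports when the sentiment direction is ablated in real models.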