What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations
Despite declining to respond to controversial prompts, large language models (LLMs) may still exhibit sociodemographic biases in their latent representations. This study proposes a logistic Bradley-Terry probe to detect…
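To make the idea of a logistic Bradley-Terry probe concrete, here is a minimal sketch, not the study's implementation. It assumes each item is represented by a fixed-size hidden vector and models the probability that item A is preferred over item B as sigmoid(w · (h_A − h_B)). The function names (`fit_bt_probe`, `prefers`), the hyperparameters, and the synthetic vectors standing in for real model representations are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_bt_probe(h_pref, h_other, lr=0.5, steps=500):
    """Fit a weight vector w so that sigmoid(w . (h_pref - h_other)) approaches 1.

    h_pref, h_other: arrays of shape (n_pairs, dim); row i of h_pref is the
    vector of the item preferred over row i of h_other.
    """
    diff = h_pref - h_other                  # (n_pairs, dim)
    w = np.zeros(diff.shape[1])
    for _ in range(steps):
        p = sigmoid(diff @ w)                # P(preferred item wins) per pair
        # Gradient of the mean negative log-likelihood -log(p) with respect to w.
        grad = -((1.0 - p)[:, None] * diff).mean(axis=0)
        w -= lr * grad
    return w

def prefers(w, h_a, h_b):
    """Probability that the item with vector h_a is preferred over h_b."""
    return sigmoid((h_a - h_b) @ w)

# Synthetic stand-in for hidden vectors (assumption: no real model is queried).
rng = np.random.default_rng(0)
dim, n_pairs = 16, 400
true_dir = rng.normal(size=dim)              # hypothetical latent preference direction
a = rng.normal(size=(n_pairs, dim))
b = rng.normal(size=(n_pairs, dim))
a_wins = (a - b) @ true_dir > 0
h_pref = np.where(a_wins[:, None], a, b)
h_other = np.where(a_wins[:, None], b, a)

w = fit_bt_probe(h_pref, h_other)
train_acc = float((prefers(w, h_pref, h_other) > 0.5).mean())
print(f"training pair accuracy: {train_acc:.3f}")
```

Because the probe scores only the difference between two vectors, swapping the order of a pair flips the prediction: `prefers(w, x, y) + prefers(w, y, x)` equals 1 up to floating-point error.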