What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations
This study investigates whether large language models (LLMs) exhibit sociodemographic biases even when they refuse to respond to sensitive prompts. The researchers explore this by probing contextualized embeddings to see if…