Investigating Bias Representations in Llama 2 Chat via Activation Steering
This paper addresses societal bias in Large Language Models (LLMs), specifically the Llama 2 7B Chat model. It uses activation steering to probe and mitigate biases related to…