Removing RLHF Protections in GPT-4 via Fine-Tuning
The research by Zhan et al. explores the vulnerability of large language models (LLMs), particularly GPT-4, to fine-tuning attacks. The researchers demonstrate that fine-tuning can be used to remove RLHF protections.