Safe RLHF: Safe Reinforcement Learning from Human Feedback
Researchers from Peking University have proposed a novel algorithm, Safe Reinforcement Learning from Human Feedback (Safe RLHF), aimed at enhancing both the safety and the performance of Large Language Models (LLMs)…