Reinforcement Learning from Human Feedback

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

root February 24, 2024

This article discusses the application of Reinforcement Learning from Human Feedback (RLHF) in large language models (LLMs). It critically examines the use of Proximal Policy Optimization (PPO), which, though popular,…

Artificial Intelligence Computation and Language

PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Preference Alignment

root February 15, 2024

The article presents a new framework called PROMST for optimizing prompts in multi-step tasks for Large Language Models (LLMs). Unlike previous methods, which work well for single-step tasks, PROMST is…

Machine Learning

Dense Reward for Free in Reinforcement Learning from Human Feedback

root February 2, 2024

The study focuses on Reinforcement Learning from Human Feedback (RLHF) and how it can be optimized. Traditionally, RLHF involves generating completions from a language model in response to a query…

Artificial Intelligence Computation and Language

Reinforcement learning for question answering in programming domain using public community scoring as a human feedback

root January 22, 2024

The article presents a study on the use of Reinforcement Learning from Human Feedback (RLHF) to improve the performance of GPT Neo 125M in the Community Question Answering (CQA)…

Artificial Intelligence Computation and Language

Universal Jailbreak Backdoors from Poisoned Human Feedback

root November 27, 2023

This research paper explores the potential for ‘jailbreak backdoors’ in large language models trained with Reinforcement Learning from Human Feedback (RLHF). It reveals that a malicious actor could potentially poison…

Artificial Intelligence Computation and Language

On the Exploitability of Reinforcement Learning with Human Feedback for Large Language Models

root November 19, 2023

This academic article discusses security vulnerabilities in Reinforcement Learning with Human Feedback (RLHF) for Large Language Models (LLMs). RLHF plays a crucial role in aligning LLMs with human preferences…

Artificial Intelligence Computation and Language

The Impact of Preference Agreement in Reinforcement Learning from Human Feedback: A Case Study in Summarization

root November 13, 2023

The paper investigates the impact of preference agreement on the efficacy of Reinforcement Learning from Human Feedback (RLHF) in text summarization. The authors demonstrate that including a diverse range of…

Artificial Intelligence Machine Learning

Safe RLHF: Safe Reinforcement Learning from Human Feedback

root October 22, 2023

Researchers from Peking University have proposed a novel algorithm, Safe Reinforcement Learning from Human Feedback (Safe RLHF), aimed at enhancing the safety and performance of Large Language Models (LLMs)…

Machine Learning

ReMax: A Simple, Effective, and Efficient Method for Aligning Large Language Models

root October 17, 2023

The paper discusses the challenges of training large language models (LLMs) using Reinforcement Learning from Human Feedback (RLHF). The authors identify three important properties in RLHF tasks: fast simulation, deterministic…
