The article presents a study on using Reinforcement Learning from Human Feedback (RLHF) to improve the performance of the GPT Neo 125M model on Community Question Answering (CQA) for programming. The study uses answer scores from Stack Overflow as the feedback signal and employs two distinct reward model training strategies for fine-tuning with Proximal Policy Optimization (PPO). The researchers also introduce an auxiliary scoring mechanism, highlighting the need for domain-specific evaluation methods. The study contributes to the ongoing effort to refine Large Language Models through focused human feedback.
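To make the training setup concrete, below is a minimal sketch of an RLHF loop of this kind, written against the Hugging Face trl library's PPOTrainer interface (as it existed in trl versions up to 0.11). This is not the authors' code: the reward checkpoint name is a hypothetical stand-in for a classifier trained on Stack Overflow answer scores, and the prompt, generation settings, and reward scaling are purely illustrative.

```python
# Hedged sketch: PPO fine-tuning of GPT Neo 125M against a reward model,
# using the Hugging Face `trl` PPOTrainer API (trl <= 0.11).
# "my-org/so-score-rm" is a hypothetical placeholder, not a real checkpoint.
import torch
from transformers import AutoTokenizer, pipeline
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

MODEL_NAME = "EleutherAI/gpt-neo-125m"

# batch_size=1 so a single (query, response, reward) triple per PPO step.
config = PPOConfig(model_name=MODEL_NAME, learning_rate=1.41e-5,
                   batch_size=1, mini_batch_size=1)

# Policy with a value head (required by PPO) and a frozen reference copy,
# used for the KL penalty that keeps the policy close to the base model.
model = AutoModelForCausalLMWithValueHead.from_pretrained(MODEL_NAME)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

# Hypothetical reward model: a text classifier standing in for one trained
# to predict Stack Overflow answer scores.
reward_pipe = pipeline("text-classification", model="my-org/so-score-rm")

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

question = "How do I reverse a list in Python?"  # stand-in CQA prompt
query_tensor = tokenizer.encode(question, return_tensors="pt").squeeze(0)

# Sample an answer from the current policy; return_prompt=False strips the
# question tokens from the generated sequence.
response = ppo_trainer.generate(query_tensor, return_prompt=False,
                                do_sample=True, max_new_tokens=64,
                                pad_token_id=tokenizer.eos_token_id)
answer = tokenizer.decode(response[0], skip_special_tokens=True)

# Score the question/answer pair; the classifier's confidence acts as the
# scalar reward here (illustrative, not the paper's exact scoring).
reward = torch.tensor(reward_pipe(question + "\n" + answer)[0]["score"])

# One PPO optimization step on this (query, response, reward) triple.
stats = ppo_trainer.step([query_tensor], [response.squeeze(0)], [reward])
```

Under this reading of the paper, the two reward model training strategies would differ in how the stand-in `reward_pipe` is trained on Stack Overflow scores; the PPO loop itself stays the same.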

Publication date: 22 Jan 2024
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2401.10882