The paper presents ICE-GRT, a model designed to improve the performance of Large Language Models (LLMs) on domain-specific tasks, where general-purpose models such as ChatGPT and LLaMA often lack depth and accuracy. ICE-GRT is trained with Reinforcement Learning from Human Feedback (RLHF) via Proximal Policy Optimization (PPO), and performs strongly on in-domain scenarios without sacrificing general-task ability. It also shows stronger analytical capability, particularly in complex scenarios where smaller LLMs fall short, achieving state-of-the-art results both on domain-specific tasks and across 12 general language tasks.
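As background (this is the standard PPO objective from the original PPO paper, not a formula specific to ICE-GRT), the RLHF policy update described above typically maximizes the clipped surrogate:

L_CLIP(theta) = E_t [ min( r_t(theta) * A_t, clip(r_t(theta), 1 - eps, 1 + eps) * A_t ) ]

where r_t(theta) = pi_theta(a_t | s_t) / pi_theta_old(a_t | s_t) is the probability ratio between the current and previous policy, A_t is the advantage estimate derived from the reward model, and eps is the clipping range. The clip term keeps each policy update close to the previous policy, which is what lets RLHF fine-tuning improve in-domain behavior without drifting far from the base model's general abilities.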
Publication date: 5 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.02072