This article presents EUREKA, a reward design algorithm powered by Large Language Models (LLMs) that achieves human-level performance. It exploits LLMs' zero-shot generation and code-writing abilities to perform evolutionary optimization over reward code. The resulting rewards outperform expert-designed rewards across a wide range of reinforcement learning environments, improving performance by 52% on average. EUREKA also enables a gradient-free approach to reinforcement learning from human feedback, incorporating human input to improve the quality and safety of generated rewards without any model updates. The authors demonstrate the algorithm's capability by teaching a simulated Shadow Hand to perform rapid pen-spinning tricks.
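The evolutionary loop described above can be sketched in miniature. This is a hypothetical illustration, not EUREKA's implementation: the stub `llm_propose` stands in for the LLM (which in the real system rewrites full reward *code* given the environment source and training feedback), and the toy `evaluate` function stands in for an RL training run.

```python
import random

random.seed(0)

def llm_propose(parent_weights):
    # Stand-in for the LLM: perturb the reward-term weights of the
    # best candidate so far (the real EUREKA generates new reward code).
    return [w + random.uniform(-0.5, 0.5) for w in parent_weights]

def evaluate(weights, target=(1.0, 2.0, 0.5)):
    # Stand-in for RL training: fitness is higher when the reward
    # weights are close to a hidden "good" configuration.
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def eureka_style_search(generations=20, samples_per_gen=8):
    best = [0.0, 0.0, 0.0]
    best_fit = evaluate(best)
    for _ in range(generations):
        # Sample several reward candidates per generation, keep the best
        # one found so far, and use it as context for the next generation.
        for cand in (llm_propose(best) for _ in range(samples_per_gen)):
            fit = evaluate(cand)
            if fit > best_fit:
                best, best_fit = cand, fit
    return best, best_fit

best, fit = eureka_style_search()
print(best, fit)
```

Because only improving candidates are kept, fitness rises monotonically across generations, mirroring the elitist sample-and-refine loop the paper describes.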
Publication date: 19 Oct 2023
Project Page: https://eureka-research.github.io
Paper: https://arxiv.org/pdf/2310.12931