reward function

Computation and Language, Machine Learning

Theoretical guarantees on the best-of-n alignment policy

root January 4, 2024

The paper discusses the best-of-n policy used for aligning generative models. It disproves a common claim that the KL divergence between the best-of-n policy and the base policy is equal…

Machine Learning

Reward Function Design for Crowd Simulation via Reinforcement Learning

root September 25, 2023

The article presents a study on reward function design for crowd simulation via reinforcement learning. The authors argue that the design of the reward function is crucial for successful simulation…

