The article addresses the value-loading problem in artificial intelligence (AI): the challenge of ensuring that AI systems align with human values and preferences. The authors propose HALO (Hormetic ALignment via Opponent processes), an approach that uses hormetic analysis to regulate AI behaviors. Behavioral hormesis is a phenomenon in which low frequencies of a behavior are beneficial but high frequencies are harmful. To illustrate the stakes, the authors use the paperclip maximizer scenario, a thought experiment in which an unregulated AI tasked with making paperclips could end up converting all matter into paperclips. The HALO approach could help build an evolving database of values based on repeatable behaviors with decreasing marginal utility. The article suggests that this could lead to a computational value system that lets an AI learn whether its decisions are right or wrong.
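
To make the idea of hormesis concrete, the sketch below models the net utility of a repeatable behavior as a linear benefit minus a quadratic harm. This is a toy curve chosen purely for illustration, not the model used in the paper; the function names and the `benefit` and `harm` parameters are hypothetical. Under this assumption, marginal utility falls as frequency rises, net utility peaks at one frequency, and it turns negative beyond a limit.

```python
# Toy illustration of a hormetic (inverted-U) response.
# ASSUMPTION: net utility = benefit * f - harm * f**2. This functional form and
# all names below are hypothetical, not taken from the HALO paper.

def net_utility(freq: float, benefit: float = 1.0, harm: float = 0.25) -> float:
    """Net utility of performing a behavior `freq` times per unit of time."""
    return benefit * freq - harm * freq ** 2


def optimal_frequency(benefit: float = 1.0, harm: float = 0.25) -> float:
    """Frequency where net utility peaks (marginal utility is zero)."""
    return benefit / (2 * harm)


def hormetic_limit(benefit: float = 1.0, harm: float = 0.25) -> float:
    """Frequency beyond which the behavior does more harm than good."""
    return benefit / harm


def is_within_limit(freq: float, benefit: float = 1.0, harm: float = 0.25) -> bool:
    """True if the behavior frequency stays in the non-harmful range."""
    return 0 <= freq <= hormetic_limit(benefit, harm)


if __name__ == "__main__":
    for f in (1, 2, 4, 6):
        print(f, net_utility(f), is_within_limit(f))
```

In the paperclip scenario, a check like `is_within_limit` stands in for the kind of regulation the summary describes: making a few paperclips is useful, but past some rate the behavior becomes net harmful and should be curbed.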

 

Publication date: 13 Feb 2024
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2402.07462