This paper by Abhilash Mishra investigates the challenges of aligning AI systems with human intentions and values through Reinforcement Learning with Human Feedback (RLHF). It examines the question of whose values AI systems should reflect and the limitations of RLHF as a mechanism for aggregating human preferences. Drawing on impossibility results from social choice theory, the paper argues that no AI system can be universally aligned with everyone's values without violating some individuals' private ethical preferences. It then draws out the implications for AI governance, suggesting the need for transparent voting rules for aggregating feedback and for AI systems narrowly aligned to specific user groups.
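
To make the aggregation obstacle concrete, here is a minimal Python sketch (an illustrative toy, not code from the paper): three hypothetical annotators hold cyclic preferences over three candidate responses A, B, and C, so no collective ranking agrees with all pairwise majorities, and any single reward signal fit to this feedback must override at least one annotator's preference. The annotator rankings and function names are assumptions made purely for illustration.

```python
from itertools import combinations, permutations

# Hypothetical annotator rankings over candidate responses A, B, C (best to worst).
annotators = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def majority_prefers(x, y, rankings):
    """Return True if a strict majority of annotators ranks x above y."""
    wins = sum(r.index(x) < r.index(y) for r in rankings)
    return wins > len(rankings) / 2

# Report the pairwise majority outcomes.
print("Pairwise majorities:")
for x, y in combinations("ABC", 2):
    winner, loser = (x, y) if majority_prefers(x, y, annotators) else (y, x)
    print(f"  majority prefers {winner} over {loser}")

# Check every candidate collective ranking against all pairwise majorities;
# with a preference cycle (A beats B, B beats C, C beats A), none is consistent.
consistent = [
    order
    for order in permutations("ABC")
    if all(
        majority_prefers(order[i], order[j], annotators)
        for i, j in combinations(range(3), 2)
    )
]
print("Collective rankings consistent with all majorities:", consistent or "none")
```

Running this prints the cyclic majorities and "none" for consistent rankings, a standard Condorcet-style example of the kind of aggregation failure that underlies the paper's impossibility argument.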

Publication date: 26 Oct 2023
Project Page: https://arxiv.org/abs/2310.16048v1
Paper: https://arxiv.org/pdf/2310.16048