🤖 AI Summary
This study addresses implicit cultural biases and value monism in Reinforcement Learning from Human Feedback (RLHF) for large language models (LLMs). Methodologically, it pioneers the systematic application of social epistemology to deconstruct RLHF's value-embedding mechanisms, proposing a cross-cultural, multi-perspective framework for integrating human feedback. It combines philosophical critique, normative modeling, and governance-oriented empirical research to develop an actionable governance roadmap. Key contributions include: (1) establishing a mechanism for ensuring diversity among feedback providers; (2) advancing explicit, negotiable modeling of values in LLM alignment; and (3) developing a cross-cultural alignment evaluation framework. Collectively, these advances enhance LLMs' responsiveness to human needs and their ethical inclusivity, offering both theoretical grounding and practical tools for value alignment in globally diverse AI contexts.
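To make contributions (1) and (2) concrete, here is a minimal, hypothetical sketch of what pluralistic feedback aggregation could look like in code. It is not the paper's method: all names (`GroupFeedback`, `pluralistic_reward`) and the contestation threshold are illustrative assumptions. The idea it demonstrates is keeping per-group reward estimates explicit and surfacing disagreement, rather than averaging all annotators into a single scalar that erases minority values.

```python
# Illustrative sketch only (not from the paper): preference feedback is
# kept separate per annotator group so that cross-cultural disagreement
# stays visible and negotiable instead of being averaged away.
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class GroupFeedback:
    group: str           # e.g., a cultural or demographic annotator pool
    reward: float        # that group's mean reward for a candidate response
    n_annotators: int    # pool size, used here as a simple weight


def pluralistic_reward(feedback: list[GroupFeedback]) -> dict:
    """Aggregate per-group rewards while keeping disagreement visible.

    Returns a weighted consensus score plus a dispersion measure; high
    dispersion flags responses whose evaluation is culturally contested
    and should be escalated for deliberation, not silently averaged.
    """
    total = sum(f.n_annotators for f in feedback)
    consensus = sum(f.reward * f.n_annotators for f in feedback) / total
    dispersion = pstdev([f.reward for f in feedback])
    return {
        "consensus": consensus,
        "dispersion": dispersion,
        "contested": dispersion > 0.5,  # threshold is an assumption
    }


if __name__ == "__main__":
    fb = [
        GroupFeedback("group_a", 0.9, 40),
        GroupFeedback("group_b", 0.2, 25),
        GroupFeedback("group_c", 0.6, 35),
    ]
    print(pluralistic_reward(fb))
```

In this toy run the groups disagree sharply, so the response would be flagged as contested; a pluralistic pipeline could then route such cases to explicit negotiation or review rather than training on the pooled average.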
📝 Abstract
We argue for the epistemic and ethical advantages of pluralism in Reinforcement Learning from Human Feedback (RLHF) in the context of Large Language Models (LLMs). Drawing on social epistemology and pluralist philosophy of science, we suggest ways in which RLHF can be made more responsive to human needs and how we can address challenges along the way. The paper concludes with an agenda for change, i.e., concrete, actionable steps to improve LLM development.