Towards Federated RLHF with Aggregated Client Preference for LLMs

📅 2024-07-03
📈 Citations: 1
Influential: 0
🤖 AI Summary
To align large language models (LLMs) with human preferences in privacy-sensitive settings, without centralizing raw user preference data, this paper proposes the first heterogeneous federated Reinforcement Learning from Human Feedback (RLHF) framework. Methodologically, it encodes each client's preferences into a binary selector and introduces two algorithms, FedBis and FedBiscuit; FedBiscuit additionally mitigates statistical heterogeneity by grouping clients with similar preferences and suppresses reward hacking through collaborative training of multiple binary selectors. Contributions include: (i) the first benchmark for heterogeneous federated RLHF; (ii) a lightweight, robust paradigm for federated preference aggregation; and (iii) empirical evidence that aggregating client preferences significantly improves the professionalism and readability of generated outputs. Experiments indicate that the framework compares favorably with existing approaches in communication efficiency, privacy preservation, and alignment quality.

📝 Abstract
Reinforcement learning with human feedback (RLHF) fine-tunes a pretrained large language model (LLM) using user preference data, enabling it to generate content aligned with human preferences. However, due to privacy concerns, users may be reluctant to share sensitive preference data. To address this, we propose utilizing Federated Learning (FL) techniques, allowing large-scale preference collection from diverse real-world users without requiring them to transmit data to a central server. Our federated RLHF methods (i.e., FedBis and FedBiscuit) encode each client's preferences into binary selectors and aggregate them to capture common preferences. In particular, FedBiscuit overcomes key challenges, such as preference heterogeneity and reward hacking, through innovative solutions like grouping clients with similar preferences to reduce heterogeneity and using multiple binary selectors to enhance LLM output quality. To evaluate the performance of the proposed methods, we establish the first federated RLHF benchmark with a heterogeneous human preference dataset. Experimental results show that by integrating the LLM with aggregated client preferences, FedBis and FedBiscuit significantly enhance the professionalism and readability of the generated content.
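The abstract's core mechanism, encoding each client's preference as a binary selector, grouping clients with similar preferences, and letting multiple aggregated selectors vote, can be illustrated with a toy sketch. This is not the paper's implementation: the linear selectors, the cosine-similarity grouping threshold, and all names below are hypothetical stand-ins for the (transformer-based) components FedBiscuit actually uses.

```python
import numpy as np

# Toy sketch of FedBiscuit-style aggregation. Each client holds a binary
# preference selector, modeled here as a linear scorer w over response
# features; the server groups similar clients and averages weights
# (FedAvg-style) within each group, then the group selectors vote.
rng = np.random.default_rng(0)
DIM = 8                                  # toy response-feature dimension
clients = rng.normal(size=(6, DIM))      # one selector weight vector per client

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Greedy grouping of clients by selector similarity (a stand-in for the
# paper's preference-similarity clustering; the 0.3 threshold is arbitrary).
groups = []
for w in clients:
    for g in groups:
        if cosine(w, g[0]) > 0.3:
            g.append(w)
            break
    else:
        groups.append([w])

# One aggregated binary selector per group: the mean of member weights.
selectors = [np.mean(g, axis=0) for g in groups]

def prefer_a(feat_a, feat_b):
    """Majority vote of aggregated selectors: is response A preferred to B?"""
    votes = sum(1 for s in selectors if s @ feat_a > s @ feat_b)
    return votes * 2 > len(selectors)

a, b = rng.normal(size=DIM), rng.normal(size=DIM)
print(prefer_a(a, b))
```

The ensemble vote is what curbs reward hacking in this sketch: a response must fool a majority of independently aggregated selectors, not just one reward model, to be scored as preferred.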
Problem

Research questions and friction points this paper is trying to address.

Privacy Protection
Federated Learning
Personalized Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning with Human Feedback (RLHF)
Federated Learning (FL)
Personalized Model Optimization