🤖 AI Summary
This work addresses the challenge of aligning large language models with decentralized, highly non-IID human preference data under strict privacy constraints in federated learning. To this end, we propose FedPDPO, a novel framework that integrates parameter-efficient fine-tuning with personalization mechanisms. FedPDPO employs globally shared LoRA adapters for collaborative training while equipping each client with a dedicated language model head, an explicit reward head, and a feature-balanced bottleneck adapter to mitigate performance degradation caused by data heterogeneity. Extensive experiments demonstrate that FedPDPO achieves state-of-the-art performance across multiple preference datasets, yielding up to a 4.80% average accuracy improvement over existing methods in both intra-domain and cross-domain federated settings.
📝 Abstract
Aligning large language models (LLMs) with human preferences in federated learning (FL) is challenging due to decentralized, privacy-sensitive, and highly non-IID preference data. Direct Preference Optimization (DPO) offers an efficient alternative to reinforcement learning from human feedback (RLHF), but its direct application in FL suffers from severe performance degradation under non-IID data and limited generalization of implicit rewards. To bridge this gap, we propose FedPDPO (Federated Personalized Direct Preference Optimization), a personalized federated framework for preference alignment of LLMs. It adopts a parameter-efficient fine-tuning architecture in which each client maintains a frozen pretrained LLM backbone augmented with a Low-Rank Adaptation (LoRA) adapter, enabling communication-efficient aggregation. To address non-IID heterogeneity, we devise (1) a globally shared LoRA adapter paired with a personalized, client-specific LLM head; (2) a personalized DPO training strategy with a client-specific explicit reward head that complements implicit rewards and further alleviates non-IID heterogeneity; and (3) a bottleneck adapter that balances global and local features. We provide a theoretical analysis establishing the probabilistic foundation and soundness of the framework. Extensive experiments on multiple preference datasets demonstrate state-of-the-art performance, with average accuracy improvements of up to 4.80% in federated intra-domain and cross-domain settings.
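To make the role of the explicit reward head concrete, here is a minimal pure-Python sketch of a preference loss that adds an explicit reward-head margin to DPO's implicit reward margin. The additive combination and the weight `alpha` are illustrative assumptions, not the paper's exact formulation; in practice the log-probabilities and explicit rewards would come from the client's LoRA-adapted model and its reward head.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_implicit_reward(policy_logp, ref_logp, beta=0.1):
    # Implicit DPO reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))
    return beta * (policy_logp - ref_logp)

def personalized_dpo_loss(policy_w, policy_l, ref_w, ref_l,
                          explicit_w, explicit_l, beta=0.1, alpha=0.5):
    # Margin between chosen (w) and rejected (l) responses:
    # implicit DPO margin plus an alpha-weighted explicit reward-head
    # margin (alpha and the additive form are assumptions).
    implicit_margin = (dpo_implicit_reward(policy_w, ref_w, beta)
                       - dpo_implicit_reward(policy_l, ref_l, beta))
    explicit_margin = explicit_w - explicit_l
    return -math.log(sigmoid(implicit_margin + alpha * explicit_margin))

# A larger margin in favor of the chosen response yields a smaller loss.
loss_good = personalized_dpo_loss(-5.0, -9.0, -6.0, -6.0, 1.0, -1.0)
loss_bad = personalized_dpo_loss(-9.0, -5.0, -6.0, -6.0, -1.0, 1.0)
```

When both reward signals prefer the chosen response (`loss_good`), the loss is small; when both prefer the rejected one (`loss_bad`), it grows, so the explicit head can reinforce or correct the implicit reward under heterogeneous client data.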