Group Preference Alignment: Customized LLM Response Generation from In-Situ Conversations

📅 2025-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Standard training paradigms for large language models (LLMs) yield suboptimal group-level adaptation, failing to capture heterogeneous conversational preferences across diverse user populations. Method: We propose Group-aware Preference Alignment (GPA), a novel framework that identifies and responds to group-specific dialogue preferences. GPA introduces (i) group-aware preference extraction and interpretable scoring-criterion distillation, and (ii) a dual-path personalized generation paradigm integrating context-tuned inference and criterion-driven fine-tuning—leveraging dialogue log mining, preference-divergence maximization modeling, dynamic prompt control, and group-specific contrastive synthetic data fine-tuning. Results: GPA significantly improves alignment between model responses and multi-group preferences. Human evaluations demonstrate consistent superiority over all baselines, while GPA maintains strong generalization performance on standard benchmarks.

📝 Abstract
LLMs often fail to meet the specialized needs of distinct user groups due to their one-size-fits-all training paradigm [lucy-etal-2024-one], and there is limited research on what personalization aspects each group expects. To address these limitations, we propose a group-aware personalization framework, Group Preference Alignment (GPA), that identifies context-specific variations in conversational preferences across user groups and then steers LLMs to address those preferences. Our approach consists of two steps: (1) Group-Aware Preference Extraction, where maximally divergent user-group preferences are extracted from real-world conversation logs and distilled into interpretable rubrics, and (2) Tailored Response Generation, which leverages these rubrics through two methods: a) Context-Tuned Inference (GPA-CT), which dynamically adjusts responses via context-dependent prompt instructions, and b) Rubric-Finetuning Inference (GPA-FT), which uses the rubrics to generate contrastive synthetic data for personalization of group-specific models via alignment. Experiments demonstrate that our framework significantly improves alignment of the output with respect to user preferences and outperforms baseline methods, while maintaining robust performance on standard benchmarks.
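The GPA-CT path amounts to a prompt-assembly step: a group's distilled rubric is injected into the prompt before inference so the base model is steered without any weight updates. A minimal sketch of that idea, where the rubric texts, group names, and function names are illustrative assumptions, not the paper's actual rubrics or implementation:

```python
# Hypothetical sketch of Context-Tuned Inference (GPA-CT): steer a base LLM
# toward a group's preferences by injecting its distilled rubric into the
# prompt. Rubric contents and group ids are made up for illustration.

GROUP_RUBRICS = {
    "novice_coders": "Explain step by step, define jargon, and include runnable examples.",
    "expert_coders": "Be concise, skip basics, and focus on edge cases and performance.",
}

def build_gpa_ct_prompt(group_id: str, user_query: str) -> str:
    """Assemble a context-tuned prompt: group rubric instructions + user query."""
    rubric = GROUP_RUBRICS.get(group_id, "")
    parts = []
    if rubric:
        # Context-dependent instruction block prepended to the query.
        parts.append(f"When answering, follow these group preferences: {rubric}")
    parts.append(f"User: {user_query}")
    return "\n".join(parts)

prompt = build_gpa_ct_prompt("novice_coders", "How do I reverse a list in Python?")
```

Unknown groups simply fall back to the bare query, so the same entry point serves both personalized and default inference.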
Problem

Research questions and friction points this paper is trying to address.

LLMs fail to meet specialized group needs
Limited research on group-specific personalization expectations
Propose Group Preference Alignment for tailored LLM responses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Group-Aware Preference Extraction from conversation logs
Context-Tuned Inference for dynamic response adjustment
Rubric-Finetuning Inference for group-specific model personalization
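The Rubric-Finetuning path hinges on contrastive synthetic data: for each prompt, a response that follows a group's rubric criterion is paired with one that violates it, yielding preference pairs usable by standard alignment methods (e.g. DPO-style training). A minimal sketch of the pair construction, where the generator stubs and record format are assumptions standing in for actual LLM calls:

```python
# Hypothetical sketch of contrastive synthetic data generation for GPA-FT:
# build (chosen, rejected) preference pairs per rubric criterion, suitable
# for preference-based alignment. Stubs stand in for real LLM generations.

def rubric_following(prompt: str, criterion: str) -> str:
    # Stand-in for an LLM call prompted to satisfy the criterion.
    return f"[satisfies '{criterion}'] answer to: {prompt}"

def rubric_violating(prompt: str, criterion: str) -> str:
    # Stand-in for an LLM call prompted to ignore the criterion.
    return f"[violates '{criterion}'] answer to: {prompt}"

def make_preference_pairs(prompts, criteria):
    """One contrastive record per (prompt, criterion) combination."""
    pairs = []
    for p in prompts:
        for c in criteria:
            pairs.append({
                "prompt": p,
                "chosen": rubric_following(p, c),    # rubric-aligned response
                "rejected": rubric_violating(p, c),  # rubric-violating response
                "criterion": c,
            })
    return pairs

pairs = make_preference_pairs(
    ["How do I reverse a list?"],
    ["include runnable examples", "define jargon"],
)
```

Each record carries its criterion, so a group-specific model can be fine-tuned on exactly the preference dimensions where that group diverges from others.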