🤖 AI Summary
This work addresses a limitation of current value alignment approaches for large language models (LLMs), which often rely on a single evaluator or narrowly defined reward signals and thus fail to capture ethical pluralism. To overcome this, the authors propose a multi-agent framework in which each agent embodies a distinct normative perspective. They introduce a Combinatorial Fusion Analysis (CFA) mechanism that integrates multi-agent fine-tuning with a dual aggregation strategy combining rank-based and score-based evaluation. This approach mitigates the value conflicts and redundancies inherent in diverse ethical viewpoints. Experimental results demonstrate that the proposed method significantly outperforms single-agent baselines and existing aggregation techniques on standard metrics, thereby improving the alignment of LLMs with multifaceted ethical considerations.
📝 Abstract
Aligning large language models (LLMs) with human values is a central challenge for ensuring trustworthy and safe deployment. While existing methods such as Reinforcement Learning from Human Feedback (RLHF) and its variants have improved alignment, they often rely on a single evaluator or narrowly defined reward signals, limiting their ability to capture ethical pluralism. In this work, we propose the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA), a framework that operationalizes value alignment through multi-agent fusion. It instantiates multiple moral agents, each fine-tuned to represent a distinct normative perspective, and fuses their outputs using CFA with both rank- and score-based aggregation. This design leverages the cognitive diversity among agents to mitigate conflicts and redundancies across their outputs, producing responses that better reflect human values. Empirical evaluation demonstrates that VAS-CFA outperforms both single-agent baselines and prior aggregation approaches on standard metrics, showing that multi-agent fusion is a robust and effective mechanism for advancing value alignment in LLMs.
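For intuition, the following is a minimal, hypothetical Python sketch of the rank- and score-based fusion step described above. The function name, the min-max normalization, and the tie-breaking rule are illustrative assumptions rather than the paper's implementation; the actual CFA mechanism may weight agents or merge the two combinations differently.

```python
import numpy as np

def fuse_agent_scores(agent_scores):
    """Illustrative rank/score fusion in the spirit of Combinatorial
    Fusion Analysis (CFA): several agents evaluate the same candidate
    responses, and their judgments are combined by both score and rank.

    agent_scores: array of shape (n_agents, n_candidates), where
    agent_scores[i, j] is agent i's raw score for candidate j.
    Returns the index of the candidate preferred by the fused ranking.
    """
    scores = np.asarray(agent_scores, dtype=float)

    # Score combination: min-max normalize each agent's scores to [0, 1]
    # so agents with different scales contribute comparably, then average.
    lo = scores.min(axis=1, keepdims=True)
    hi = scores.max(axis=1, keepdims=True)
    norm = (scores - lo) / np.where(hi > lo, hi - lo, 1.0)
    score_comb = norm.mean(axis=0)

    # Rank combination: convert each agent's scores to ranks
    # (rank 1 = best) and average; a lower fused rank is better.
    ranks = scores.shape[1] - scores.argsort(axis=1).argsort(axis=1)
    rank_comb = ranks.mean(axis=0)

    # One simple way to merge the two views: prefer the candidate the
    # rank combination favors, breaking ties with the score combination.
    order = sorted(range(scores.shape[1]),
                   key=lambda j: (rank_comb[j], -score_comb[j]))
    return order[0]

# Three hypothetical moral agents scoring four candidate responses.
scores = [[0.9, 0.2, 0.5, 0.7],   # e.g., a deontological evaluator
          [0.4, 0.8, 0.6, 0.7],   # e.g., a consequentialist evaluator
          [0.6, 0.5, 0.9, 0.8]]   # e.g., a virtue-ethics evaluator
print(fuse_agent_scores(scores))  # index of the fused-best candidate
```

Using both combinations, as sketched here, is what lets CFA exploit cognitive diversity: rank combination is robust to agents whose score scales differ, while score combination preserves the magnitude of each agent's preferences.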