🤖 AI Summary
This work addresses a limitation of current value alignment approaches for large language models (LLMs), which often rely on a single evaluator or narrowly defined reward signals and thus fail to capture ethical pluralism. To overcome this, the authors propose a multi-agent framework in which each agent embodies a distinct normative perspective. They introduce a Combinatorial Fusion Analysis (CFA) mechanism that integrates multi-agent fine-tuning with a dual aggregation strategy combining rank-based and score-based evaluation. This approach mitigates the value conflicts and redundancies inherent in diverse ethical viewpoints. Experimental results demonstrate that the proposed method significantly outperforms single-agent baselines and existing aggregation techniques on standard metrics, thereby improving the alignment of LLMs with multifaceted ethical considerations.
📝 Abstract
Aligning large language models (LLMs) with human values is a central challenge for ensuring trustworthy and safe deployment. While existing methods such as Reinforcement Learning from Human Feedback (RLHF) and its variants have improved alignment, they often rely on a single evaluator or narrowly defined reward signals, limiting their ability to capture ethical pluralism. In this work, we propose the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA), a framework that operationalizes value alignment through multi-agent fusion. It instantiates multiple moral agents, each fine-tuned to represent a distinct normative perspective, and fuses their outputs using CFA with both rank- and score-based aggregation. This design leverages the cognitive diversity among agents to mitigate conflicts and redundancies across their outputs, producing responses that better reflect human values. Empirical evaluation demonstrates that VAS-CFA outperforms both single-agent baselines and prior aggregation approaches on standard metrics, showing that multi-agent fusion is a robust and effective mechanism for advancing value alignment in LLMs.
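For intuition, the following is a minimal, hypothetical Python sketch of the rank- and score-based fusion step described above. The function name, the min-max normalization, and the tie-breaking rule are illustrative assumptions rather than the paper's implementation; the actual CFA mechanism may weight agents or merge the two combinations differently.

```python
import numpy as np

def fuse_agent_scores(agent_scores):
    """Illustrative rank/score fusion in the spirit of Combinatorial
    Fusion Analysis (CFA): several agents evaluate the same candidate
    responses, and their judgments are combined by both score and rank.

    agent_scores: array of shape (n_agents, n_candidates), where
    agent_scores[i, j] is agent i's raw score for candidate j.
    Returns the index of the candidate preferred by the fused ranking.
    """
    scores = np.asarray(agent_scores, dtype=float)

    # Score combination: min-max normalize each agent's scores to [0, 1]
    # so agents with different scales contribute comparably, then average.
    lo = scores.min(axis=1, keepdims=True)
    hi = scores.max(axis=1, keepdims=True)
    norm = (scores - lo) / np.where(hi > lo, hi - lo, 1.0)
    score_comb = norm.mean(axis=0)

    # Rank combination: convert each agent's scores to ranks
    # (rank 1 = best) and average; a lower fused rank is better.
    ranks = scores.shape[1] - scores.argsort(axis=1).argsort(axis=1)
    rank_comb = ranks.mean(axis=0)

    # One simple way to merge the two views: prefer the candidate the
    # rank combination favors, breaking ties with the score combination.
    order = sorted(range(scores.shape[1]),
                   key=lambda j: (rank_comb[j], -score_comb[j]))
    return order[0]

# Three hypothetical moral agents scoring four candidate responses.
scores = [[0.9, 0.2, 0.5, 0.7],   # e.g., a deontological evaluator
          [0.4, 0.8, 0.6, 0.7],   # e.g., a consequentialist evaluator
          [0.6, 0.5, 0.9, 0.8]]   # e.g., a virtue-ethics evaluator
print(fuse_agent_scores(scores))  # index of the fused-best candidate
```

Using both combinations, as sketched here, is what lets CFA exploit cognitive diversity: rank combination is robust to agents whose score scales differ, while score combination preserves the magnitude of each agent's preferences.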