🤖 AI Summary
Large language model (LLM)-based multi-agent systems (MAS) suffer from reliability deficits under instruction conflicts (e.g., system-user or peer-peer contradictions) due to misaligned hierarchical compliance: agents erroneously prioritize system-level constraints over user intent, and macroscopic metrics (e.g., pass@k) fail to expose such fine-grained violations.
Method: We propose a three-stage full-stack framework: (i) *Diagnosis*, via the Contextualized Role Adherence Score (CRAS), a query-wise, context-aware metric; (ii) *Localization*, identifying attention drift concentrated in intermediate transformer layers; and (iii) *Alignment*, via SAIL (Surgical Alignment of Instruction Layers), a lightweight method that fine-tunes only the localized critical layers. SAIL combines LoRA-based low-rank adaptation with a token-weighted DPO objective for instruction-level alignment without full-model finetuning.
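The token-weighted DPO objective can be illustrated with a minimal sketch. This is an assumption about the general form (standard DPO with per-token credit weights on the log-probability ratios), not the paper's exact loss; the function name and the interpretation of the weights as normalized attention mass on the focal layers are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def token_weighted_dpo_loss(chosen_logratios, rejected_logratios,
                            chosen_weights, rejected_weights, beta=0.1):
    """Sketch of a token-weighted DPO-style loss.

    *_logratios: per-token log(pi_theta / pi_ref) for the chosen and
    rejected responses. *_weights: per-token credit (e.g., attentional
    contribution on the focal layers); uniform weights of 1.0 recover
    the standard sequence-level DPO margin.
    """
    chosen = sum(w * r for w, r in zip(chosen_weights, chosen_logratios))
    rejected = sum(w * r for w, r in zip(rejected_weights, rejected_logratios))
    # Bradley-Terry preference loss on the weighted margin
    return -math.log(sigmoid(beta * (chosen - rejected)))
```

Down-weighting tokens that contribute little focal attention shrinks their influence on the preference margin, concentrating the gradient on instruction-relevant tokens.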
Contribution/Results: Evaluated with the AutoGen framework on MedQA, SAIL improves the instruction-following rate by 5.60%, significantly mitigating erroneous system-rule prioritization while preserving efficiency and scalability.
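The "surgical" part of SAIL (adapting only the localized layers) rests on the standard LoRA update, where a frozen weight W is augmented by a low-rank product B·A. A minimal pure-Python sketch of one adapted linear layer, with toy shapes and no framework, under the assumption that SAIL uses the conventional LoRA formulation:

```python
class LoRALinear:
    """Frozen base weight W plus a low-rank update: y = (W + alpha*B@A) x.

    SAIL would install adapters like this only on the focal layers found
    by the localization step, leaving all other layers untouched. Shapes
    and the class name here are illustrative, not from the paper.
    """
    def __init__(self, W, A, B, alpha=1.0):
        self.W = W          # d_out x d_in, frozen
        self.A = A          # r x d_in, trainable
        self.B = B          # d_out x r, trainable
        self.alpha = alpha  # scaling factor

    def __call__(self, x):
        base = [sum(wij * xj for wij, xj in zip(row, x)) for row in self.W]
        ax = [sum(aij * xj for aij, xj in zip(row, x)) for row in self.A]
        delta = [sum(bij * aj for bij, aj in zip(row, ax)) for row in self.B]
        return [b + self.alpha * d for b, d in zip(base, delta)]
```

Because only A and B (rank r, typically far smaller than d_in or d_out) are trained, and only on a handful of layers, the adaptation stays lightweight relative to full-model finetuning.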
📝 Abstract
Large Language Model (LLM)-powered multi-agent systems (MAS) have rapidly advanced collaborative reasoning, tool use, and role-specialized coordination in complex tasks. However, reliability-critical deployment remains hindered by a systemic failure mode: hierarchical compliance under instruction conflicts (system-user, peer-peer), where agents misprioritize system-level rules in the presence of competing demands. Moreover, widely used macro-level metrics (e.g., pass@k) obscure these micro-level violations and offer little actionable guidance for remedy. In this work, we present a full-stack, three-stage framework: (1) Diagnose - Contextualized Role Adherence Score (CRAS), a query-wise, context-aware scoring metric that decomposes role adherence into four measurable dimensions; (2) Localize - attention drift analysis revealing that instruction conflicts are resolved by attention heads that are largely concentrated in middle layers; (3) Align - Surgical Alignment of Instruction Layers (SAIL), which installs LoRA only on the localized focal layers and optimizes a token-weighted DPO-style preference objective that credits tokens by their focal attentional contribution. Across standard benchmarks and MAS frameworks, our surgical approach improves instruction hierarchy compliance (e.g., +5.60% with AutoGen on MedQA) without full-model finetuning.
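The Localize stage described above can be sketched as a selection over per-layer attention statistics. How the per-layer "drift" score is computed is not specified here and is an assumption; this sketch only shows the selection step, under the paper's finding that the highest-scoring layers cluster in the middle of the stack.

```python
def locate_focal_layers(conflict_attention_mass, top_k=3):
    """Given a per-layer score (e.g., average attention mass that heads
    assign to conflicting instruction tokens), return the indices of the
    top_k 'focal' layers, sorted by depth. The scoring convention is a
    hypothetical stand-in for the paper's attention-drift analysis.
    """
    ranked = sorted(range(len(conflict_attention_mass)),
                    key=lambda i: conflict_attention_mass[i],
                    reverse=True)
    return sorted(ranked[:top_k])
```

On a toy 6-layer profile peaking mid-stack, the selection returns the middle layers, which are then the only layers where SAIL installs LoRA adapters.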