AI Summary
Multi-agent large language model (LLM) systems are vulnerable to adversarial or low-performing agents, leading to unreliable outputs. To address this, we propose the first adversarial-robust framework for such systems, modeling collaborative question answering as an iterative game. Our method introduces a history-aware mechanism that adaptively learns agent trustworthiness and performs trust-weighted output aggregation. It integrates four key components: trustworthiness modeling, game-theoretic collaboration, robust aggregation, and adversarial training-based evaluation. Extensive experiments across diverse tasks and settings demonstrate that our framework significantly mitigates malicious interference, improving both accuracy and output stability. Notably, it maintains high performance even under extreme conditions where adversarial agents constitute over 50% of the system, marking the first empirical validation of robustness in majority-adversary regimes.
Abstract
While multi-agent LLM systems show strong capabilities in various domains, they are highly vulnerable to adversarial and low-performing agents. To address this issue, we introduce a general and adversary-resistant multi-agent LLM framework based on credibility scoring. We model the collaborative query-answering process as an iterative game, in which the agents communicate and contribute to a final system output. Our system associates a credibility score with each agent, which is used when aggregating the team's outputs. The credibility scores are learned gradually from each agent's past contributions to query answering. Our experiments across multiple tasks and settings demonstrate our system's effectiveness in mitigating adversarial influence and enhancing the resilience of multi-agent cooperation, even in adversary-majority settings.
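The credibility-weighted aggregation and score-learning loop described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual method: the function names, the exponential-moving-average update rule, and the use of the aggregated answer itself as the feedback signal are all assumptions introduced here for clarity.

```python
from collections import defaultdict

def aggregate(answers, credibility):
    """Pick the answer with the highest total credibility-weighted vote.

    answers:     dict mapping agent id -> that agent's proposed answer
    credibility: dict mapping agent id -> current credibility score
    """
    votes = defaultdict(float)
    for agent, answer in answers.items():
        votes[answer] += credibility[agent]
    return max(votes, key=votes.get)

def update_credibility(credibility, answers, final_answer, lr=0.1):
    """Exponential-moving-average update (an assumed rule, for illustration):
    agents that agreed with the aggregated answer gain credibility,
    agents that disagreed lose it gradually over rounds."""
    for agent, answer in answers.items():
        reward = 1.0 if answer == final_answer else 0.0
        credibility[agent] = (1 - lr) * credibility[agent] + lr * reward
    return credibility

# Hypothetical round: three honest agents answer "X", two adversarial agents "Y".
credibility = {a: 1.0 for a in ["a1", "a2", "a3", "b1", "b2"]}
answers = {"a1": "X", "a2": "X", "a3": "X", "b1": "Y", "b2": "Y"}
for _ in range(10):  # repeated rounds let adversarial credibility decay
    final = aggregate(answers, credibility)
    credibility = update_credibility(credibility, answers, final)
```

After a few rounds the adversarial agents' scores decay toward zero while honest agents' scores persist, so their votes carry progressively less weight in later aggregations. Handling adversary-majority settings, as the paper claims, would require a stronger feedback signal than self-consistency, which this toy sketch does not model.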