ResMAS: Resilience Optimization in LLM-based Multi-agent Systems

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Current large language model (LLM)-based multi-agent systems lack intrinsic resilience against perturbations such as agent failures and rely predominantly on post-hoc defenses, hindering their ability to maintain stable, continuous operation. To address this limitation, this work proposes ResMAS, a novel framework that jointly optimizes communication topology design and prompt engineering to enhance system resilience. Specifically, ResMAS first employs reinforcement learning to automatically generate task-oriented communication topologies and then leverages these topologies to refine agent-specific prompts, enabling proactive robustness against disturbances. Experimental results demonstrate that the proposed approach significantly improves system resilience across diverse tasks and operational constraints, while also exhibiting strong generalization capabilities to unseen tasks and new LLMs.

📝 Abstract
Large Language Model-based Multi-Agent Systems (LLM-based MAS), where multiple LLM agents collaborate to solve complex tasks, have shown impressive performance in many areas. However, MAS are typically distributed across different devices or environments, making them vulnerable to perturbations such as agent failures. While existing works have studied adversarial attacks and corresponding defense strategies, they mainly focus on reactively detecting and mitigating attacks after they occur rather than proactively designing inherently resilient systems. In this work, we study the resilience of LLM-based MAS under perturbations and find that both the communication topology and prompt design significantly influence system resilience. Motivated by these findings, we propose ResMAS, a two-stage framework for enhancing MAS resilience. First, we train a reward model to predict a MAS's resilience, and then use it to train a topology generator that automatically designs resilient topologies for specific tasks through reinforcement learning. Second, we introduce a topology-aware prompt optimization method that refines each agent's prompt based on its connections and interactions with other agents. Extensive experiments across a range of tasks show that our approach substantially improves MAS resilience under various constraints. Moreover, our framework demonstrates strong generalization ability to new tasks and models, highlighting its potential for building resilient MAS.
Problem

Research questions and friction points this paper is trying to address.

resilience
LLM-based multi-agent systems
perturbations
agent failures
system robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

resilience optimization
LLM-based multi-agent systems
topology generation
prompt engineering
reinforcement learning
Zhilun Zhou
Tsinghua University
urban computing
Zihan Liu
Department of Electronic Engineering, BNRist, Tsinghua University
Jiahe Liu
Department of Electronic Engineering, BNRist, Tsinghua University
Qingyu Shao
Department of Electronic Engineering, BNRist, Tsinghua University
Yihan Wang
Department of Electronic Engineering, BNRist, Tsinghua University
Kun Shao
Huawei
AI Agent, reinforcement learning, multi-agent systems, embodied AI, game AI
Depeng Jin
Department of Electronic Engineering, BNRist, Tsinghua University
Fengli Xu
Tsinghua University
LLM Agent, Data Science, Social Computing, Science of Science, Urban Science