ResMAS: Resilience Optimization in LLM-based Multi-agent Systems

📅 2026-01-08

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Current large language model (LLM)-based multi-agent systems lack intrinsic resilience against perturbations such as agent failures and rely predominantly on post-hoc defenses, which limits their ability to operate stably and continuously. To address this limitation, this work proposes ResMAS, a two-stage framework that optimizes both the communication topology and the agents' prompts to enhance system resilience. ResMAS first uses reinforcement learning, guided by a reward model that predicts resilience, to automatically generate task-specific communication topologies, and then refines each agent's prompt according to its position in the resulting topology, building in robustness against disturbances proactively rather than after the fact. Experimental results show that the approach substantially improves system resilience across diverse tasks and operational constraints and generalizes well to unseen tasks and new LLMs.
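The summary states that ResMAS uses reinforcement learning to generate communication topologies, and the abstract adds that the reward comes from a trained model that predicts resilience. The Python below is a minimal sketch of that general idea, not the authors' implementation. It makes several hypothetical assumptions. The policy includes each candidate directed edge independently with probability sigmoid(θ) and is trained with REINFORCE. The learned reward model is replaced by a hand-written surrogate, `surrogate_resilience`, which fails one random non-sink agent per trial and measures what fraction of the surviving agents can still reach a designated sink (answer-producing) agent. An `edge_cost` penalty stands in for a communication-budget constraint.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def surrogate_resilience(edges, n_agents, sink, n_fail, trials, rng):
    """Hypothetical stand-in for a learned resilience reward model.

    Each trial fails `n_fail` random non-sink agents (requires n_fail < n_agents - 1)
    and scores the fraction of surviving non-sink agents that still have a directed
    path of surviving agents to `sink`. Returns the average over trials.
    """
    others = [a for a in range(n_agents) if a != sink]
    total = 0.0
    for _ in range(trials):
        failed = set(rng.sample(others, n_fail))
        alive = [a for a in others if a not in failed]
        # Reverse search from the sink, only stepping through agents that did not fail.
        reached, frontier = {sink}, [sink]
        while frontier:
            node = frontier.pop()
            for src, dst in edges:
                if dst == node and src not in failed and src not in reached:
                    reached.add(src)
                    frontier.append(src)
        total += sum(a in reached for a in alive) / len(alive)
    return total / trials


def train_topology_generator(n_agents=4, sink=3, edge_cost=0.3, steps=300, lr=0.5, seed=0):
    """REINFORCE over independent Bernoulli edge choices; returns final edge probabilities."""
    rng = random.Random(seed)
    candidates = [(i, j) for i in range(n_agents) for j in range(n_agents) if i != j]
    theta = {e: 0.0 for e in candidates}  # logit of including each directed edge
    baseline = None
    for _ in range(steps):
        probs = {e: sigmoid(t) for e, t in theta.items()}
        sampled = {e: rng.random() < p for e, p in probs.items()}
        edges = [e for e, on in sampled.items() if on]
        # Reward = surrogate resilience minus a penalty proportional to edge count.
        reward = (surrogate_resilience(edges, n_agents, sink, n_fail=1, trials=20, rng=rng)
                  - edge_cost * len(edges) / len(candidates))
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline reduces variance
        for e in candidates:
            # Gradient of log Bernoulli(a; sigmoid(theta)) w.r.t. theta is a - sigmoid(theta).
            grad = advantage * (float(sampled[e]) - probs[e])
            theta[e] = max(-10.0, min(10.0, theta[e] + lr * grad))
    return {e: sigmoid(t) for e, t in theta.items()}


if __name__ == "__main__":
    for edge, p in sorted(train_topology_generator().items()):
        print(edge, round(p, 2))
```

Under these assumptions the policy should learn to favor edges that connect each agent directly to the sink, because they keep every surviving agent's output reachable no matter which peer fails. A learned reward model, as the abstract describes, would also capture task-dependent effects that this graph-only surrogate ignores.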

📝 Abstract

Large Language Model-based Multi-Agent Systems (LLM-based MAS), where multiple LLM agents collaborate to solve complex tasks, have shown impressive performance in many areas. However, MAS are typically distributed across different devices or environments, making them vulnerable to perturbations such as agent failures. While existing works have studied adversarial attacks and corresponding defense strategies, they mainly focus on reactively detecting and mitigating attacks after they occur rather than proactively designing inherently resilient systems. In this work, we study the resilience of LLM-based MAS under perturbations and find that both the communication topology and prompt design significantly influence system resilience. Motivated by these findings, we propose ResMAS: a two-stage framework for enhancing MAS resilience. First, we train a reward model to predict the MAS's resilience, based on which we train a topology generator to automatically design resilient topologies for specific tasks through reinforcement learning. Second, we introduce a topology-aware prompt optimization method that refines each agent's prompt based on its connections and interactions with other agents. Extensive experiments across a range of tasks show that our approach substantially improves MAS resilience under various constraints. Moreover, our framework demonstrates strong generalization ability to new tasks and models, highlighting its potential for building resilient MASs.
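The abstract's second stage refines each agent's prompt based on its connections to other agents. The optimization procedure itself is not described in this entry. The Python below is therefore only a minimal sketch of the input side: how topology information (who an agent hears from, who it reports to) could be injected into each agent's prompt before any optimization step. The function names `neighbors` and `build_topology_aware_prompts`, the role names, and the wording of the fault-tolerance instructions are all hypothetical.

```python
from collections import defaultdict


def neighbors(edges):
    """Return (predecessors, successors) maps for a directed topology."""
    preds, succs = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)
    return preds, succs


def build_topology_aware_prompts(roles, edges, sink):
    """Hypothetical template: describe each agent's place in the topology and add
    an instruction suited to how many upstream sources it depends on."""
    preds, succs = neighbors(edges)
    prompts = {}
    for agent, role in roles.items():
        upstream = ", ".join(roles[p] for p in sorted(preds[agent])) or "none (you work from the task alone)"
        downstream = ", ".join(roles[s] for s in sorted(succs[agent])) or "none"
        lines = [
            f"You are the {role}.",
            f"You receive messages from: {upstream}.",
            f"Your output is sent to: {downstream}.",
        ]
        if len(preds[agent]) > 1:
            lines.append("Inputs may be wrong or missing if an upstream agent fails; "
                         "cross-check them against each other before relying on any one.")
        elif len(preds[agent]) == 1:
            lines.append("You have a single upstream source; verify its claims independently "
                         "against the task before using them.")
        if agent == sink:
            lines.append("You produce the system's final answer.")
        prompts[agent] = "\n".join(lines)
    return prompts


if __name__ == "__main__":
    roles = {0: "planner", 1: "solver", 2: "critic", 3: "aggregator"}
    edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]
    for agent, prompt in build_topology_aware_prompts(roles, edges, sink=3).items():
        print(f"--- agent {agent} ---\n{prompt}")
```

The design choice here is that agents with several upstream sources are told to cross-check them, while agents with a single source are told to verify it against the task. This mirrors the intuition that an agent's exposure to a failed peer depends on its connections. An optimizer would then rewrite these templates rather than use them verbatim.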

Problem

Research questions and friction points this paper is trying to address.

resilience

LLM-based multi-agent systems

perturbations

agent failures

system robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

resilience optimization

LLM-based multi-agent systems

topology generation