🤖 AI Summary
To address the susceptibility of large language model (LLM)-driven multi-agent systems (MAS) to adversarial attacks, decision uncertainty, and hallucination propagation, particularly in safety-critical domains such as aerospace, this paper proposes the first randomized smoothing defense framework tailored to MAS consensus. Operating under black-box assumptions, the method employs a two-stage adaptive sampling mechanism to achieve statistically certified robustness, providing provable probabilistic safety guarantees for individual agent decisions without requiring model gradients. Crucially, it pioneers the integration of randomized smoothing into MAS consensus dynamics, disrupting the cascading propagation of adversarial perturbations and hallucinations. Experiments demonstrate that the approach significantly enhances adversarial robustness while preserving high consensus accuracy and computational efficiency, yielding a scalable, formally verifiable solution for the reliable deployment of LLM-based MAS in high-risk applications.
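The two-stage sampling described here follows the standard randomized smoothing recipe (Cohen et al., 2019): a cheap first round guesses the smoothed decision, and a larger second round lower-bounds its probability with a Clopper-Pearson confidence interval; if the bound exceeds 1/2, the smoothed majority decision is certified. The sketch below is illustrative only and assumes hypothetical placeholders not taken from the paper: `agent_decide` (the black-box LLM agent) and `perturb` (the input randomization operator). The paper's actual sampling schedule and certificate may differ.

```python
import random
from collections import Counter

from scipy.stats import beta


def clopper_pearson_lower(k: int, n: int, alpha: float) -> float:
    """One-sided (1 - alpha) lower confidence bound on a binomial proportion."""
    if k == 0:
        return 0.0
    return beta.ppf(alpha, k, n - k + 1)


def certify_decision(agent_decide, perturb, prompt,
                     n0=100, n=1000, alpha=0.001, rng=None):
    """Two-stage randomized-smoothing certificate for one black-box agent.

    Stage 1: n0 cheap samples to guess the smoothed (majority) decision.
    Stage 2: n fresh samples to lower-bound its probability; certify when
    the bound exceeds 1/2, i.e. the smoothed decision is provably the
    majority under the perturbation model, with confidence 1 - alpha.
    No gradients are needed: the agent is queried as a black box.
    """
    rng = rng or random.Random(0)
    # Stage 1: identify the most likely decision under random perturbation.
    guesses = Counter(agent_decide(perturb(prompt, rng)) for _ in range(n0))
    top = guesses.most_common(1)[0][0]
    # Stage 2: count how often independent noisy queries agree with it.
    k = sum(agent_decide(perturb(prompt, rng)) == top for _ in range(n))
    p_lower = clopper_pearson_lower(k, n, alpha)
    if p_lower > 0.5:
        return top, p_lower   # certified smoothed decision
    return None, p_lower      # abstain: no certificate at this sample budget
```

For a Gaussian perturbation model, the same `p_lower` also yields a certified radius (sigma times the standard normal quantile of `p_lower`); for discrete prompt perturbations, the abstract majority guarantee above is the natural analogue.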
📝 Abstract
This paper presents a defense framework for enhancing the safety of large language model (LLM)-empowered multi-agent systems (MAS) in safety-critical domains such as aerospace. We apply randomized smoothing, a statistical robustness certification technique, to the MAS consensus setting, enabling probabilistic guarantees on agent decisions under adversarial influence. Unlike traditional verification methods, our approach operates in black-box settings and employs a two-stage adaptive sampling mechanism to balance robustness and computational efficiency. Simulation results demonstrate that our method prevents the propagation of adversarial behaviors and hallucinations while maintaining consensus performance. This work provides a practical and scalable path toward the safe deployment of LLM-based MAS in real-world, high-stakes environments.
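One plausible way per-agent certificates interrupt cascades at the consensus layer is certified voting: an agent contributes a vote only when its smoothing certificate succeeds, so an adversarially perturbed or hallucinating agent abstains rather than seeding the group decision. The sketch below illustrates this idea under stated assumptions; `smoothed_consensus`, `certify`, and `quorum` are hypothetical names, not the paper's API, and the paper's actual consensus protocol may aggregate differently.

```python
from collections import Counter


def smoothed_consensus(agents, prompt, certify, quorum=0.5):
    """Majority consensus over certified smoothed decisions.

    `certify(agent, prompt)` is assumed to return (decision, p_lower),
    with decision None on abstention (e.g. the two-stage certificate
    above failed). Uncertified agents cast no vote, so a single
    compromised agent cannot propagate its output through consensus.
    """
    votes = []
    for agent in agents:
        decision, _ = certify(agent, prompt)
        if decision is not None:
            votes.append(decision)
    if not votes:
        return None  # system-level abstention: defer to a fallback policy
    winner, count = Counter(votes).most_common(1)[0]
    # Require the winner to carry a quorum of *all* agents, not just the
    # voters, so mass abstention cannot be mistaken for agreement.
    return winner if count > quorum * len(agents) else None
```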