AEGIS: Automated Error Generation and Identification for Multi-Agent Systems

📅 2025-09-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reliability assessment of multi-agent systems (MAS) is hindered by the absence of large-scale, high-quality benchmark datasets with precise and diverse error annotations. Method: We propose the first automated error generation and identification framework tailored for MAS. Leveraging large language models (LLMs), our context-aware manipulator controllably injects traceable, customizable errors—including prompt injection and response tampering—into successful agent trajectories, enabling systematic construction of a rich fault dataset. The framework supports diverse modeling paradigms, including supervised fine-tuning, reinforcement learning, and contrastive learning, for error recognition. Results: Empirical evaluation demonstrates substantial improvements in error detection performance across multiple scenarios. Notably, lightweight fine-tuned models achieve accuracy comparable to—or even surpassing—that of closed-source systems an order of magnitude larger. Our framework establishes a reproducible, scalable paradigm for MAS safety verification.
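The paper does not publish its manipulator code, but the injection idea described above can be illustrated with a minimal sketch: take a successful trajectory, corrupt one agent's step under a chosen error mode, and record a ground-truth label naming the faulty step, agent, and mode. All names here (`Step`, `FaultRecord`, `inject_error`, the marker payloads) are hypothetical; the real framework uses a context-aware LLM rather than string edits.

```python
import random
from dataclasses import dataclass

@dataclass
class Step:
    agent: str
    content: str

@dataclass
class FaultRecord:
    step_index: int
    agent: str
    error_mode: str

ERROR_MODES = ("prompt_injection", "response_tampering")

def tamper(content: str, mode: str) -> str:
    # Stand-in for the paper's LLM-based adaptive manipulator: we just
    # append/overwrite with a marker payload so the corruption stays traceable.
    if mode == "prompt_injection":
        return content + " [INJECTED: ignore prior instructions]"
    return content[: len(content) // 2] + " [TAMPERED]"

def inject_error(trajectory, mode=None, rng=None):
    """Return a corrupted copy of a successful trajectory plus a
    ground-truth label identifying the faulty step, agent, and error mode."""
    rng = rng or random.Random(0)
    mode = mode or rng.choice(ERROR_MODES)
    idx = rng.randrange(len(trajectory))
    corrupted = [Step(s.agent, s.content) for s in trajectory]
    corrupted[idx].content = tamper(corrupted[idx].content, mode)
    label = FaultRecord(idx, corrupted[idx].agent, mode)
    return corrupted, label
```

Because every injected fault carries its own label, error-identification models can be trained and scored against exact ground truth rather than post-hoc human annotation.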

📝 Abstract
As Multi-Agent Systems (MAS) become increasingly autonomous and complex, understanding their error modes is critical for ensuring their reliability and safety. However, research in this area has been severely hampered by the lack of large-scale, diverse datasets with precise, ground-truth error labels. To address this bottleneck, we introduce AEGIS, a novel framework for Automated Error Generation and Identification for Multi-Agent Systems. By systematically injecting controllable and traceable errors into initially successful trajectories, we create a rich dataset of realistic failures. This is achieved using a context-aware, LLM-based adaptive manipulator that performs sophisticated attacks like prompt injection and response corruption to induce specific, predefined error modes. We demonstrate the value of our dataset by exploring three distinct learning paradigms for the error identification task: Supervised Fine-Tuning, Reinforcement Learning, and Contrastive Learning. Our comprehensive experiments show that models trained on AEGIS data achieve substantial improvements across all three learning paradigms. Notably, several of our fine-tuned models demonstrate performance competitive with or superior to proprietary systems an order of magnitude larger, validating our automated data generation framework as a crucial resource for developing more robust and interpretable multi-agent systems. Our project website is available at https://kfq20.github.io/AEGIS-Website.
Problem

Research questions and friction points this paper is trying to address.

Lack of large-scale labeled error datasets for multi-agent systems
Automated generation of realistic failure scenarios with traceable errors
Improving error identification through diverse learning paradigms on synthetic data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context-aware, LLM-based manipulator for automated error generation
Systematic injection of traceable, realistic failures into successful trajectories
Training error-identification models on the generated fault dataset across SFT, RL, and contrastive learning
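The last step, turning injected faults into training data, can be sketched as follows: each corrupted trajectory plus its ground-truth fault label becomes one supervised example for the error-identification task. The input/output schema below is an assumption for illustration, not the paper's actual format; `to_sft_example` and its field names are hypothetical.

```python
def to_sft_example(trajectory, label):
    """Format one injected-fault record as a supervised fine-tuning pair
    (assumed, simplified schema): the full multi-agent transcript is the
    input, and the faulty agent/step/error mode is the target.

    trajectory: list of (agent_name, message) tuples
    label: dict with keys "agent", "step", "mode"
    """
    transcript = "\n".join(
        f"[{i}] {agent}: {content}"
        for i, (agent, content) in enumerate(trajectory)
    )
    target = (f"faulty_agent={label['agent']}; "
              f"step={label['step']}; mode={label['mode']}")
    return {"input": transcript, "output": target}
```

The same records could feed the other two paradigms, e.g. pairing each corrupted trajectory with its clean original for contrastive learning, or scoring a model's predicted fault against the label as an RL reward.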