🤖 AI Summary
This study investigates whether social stereotypes can spontaneously emerge in large language model (LLM)-driven multi-agent systems, without biased initialization or reliance on biased training data. We construct a hierarchical workplace simulation environment and conduct multi-round interactive experiments, complemented by quantitative analysis. Results show that AI agents develop canonical social biases, including the halo effect, confirmation bias, and role congruity bias, even under neutral initialization. Our key contributions are twofold: first, we provide the first empirical evidence that stereotypes arise as an *emergent property* of multi-agent interaction, consistently across diverse LLM architectures; second, we demonstrate that centralized decision-making authority and organizational hierarchy significantly amplify bias emergence. These findings indicate that sociocognitive biases may be intrinsically embedded in the coordination mechanisms of multi-agent systems, revealing a novel, architecture-level source of AI bias distinct from data- or model-level origins.
📝 Abstract
While stereotypes are well-documented in human social interactions, AI systems are often presumed to be less susceptible to such biases. Previous studies have focused on biases inherited from training data, but whether stereotypes can emerge spontaneously in AI agent interactions merits further exploration. Through a novel experimental framework simulating workplace interactions with neutral initial conditions, we investigate the emergence and evolution of stereotypes in LLM-based multi-agent systems. Our findings reveal that (1) LLM-based AI agents develop stereotype-driven biases in their interactions despite beginning without predefined biases; (2) stereotype effects intensify with increased interaction rounds and decision-making power, particularly after hierarchical structures are introduced; (3) these systems exhibit group effects analogous to human social behavior, including halo effects, confirmation bias, and role congruity effects; and (4) these stereotype patterns manifest consistently across different LLM architectures. Our comprehensive quantitative analysis suggests that stereotype formation in AI systems may arise as an emergent property of multi-agent interactions, rather than merely from training data biases. Our work underscores the need for future research to explore the underlying mechanisms of this phenomenon and develop strategies to mitigate its ethical impacts.