🤖 AI Summary
This work addresses a limitation of existing language-agent evaluation environments, which assume idealized communication and fail to capture communicative barriers arising from cognitive differences. The authors propose SocialVeil, a social learning environment that systematically models three realistic obstacles: semantic vagueness, sociocultural mismatch, and emotional interference. To quantify interaction quality under these barriers, they introduce two metrics, “unresolved confusion” and “mutual understanding.” The barriers are simulated using mechanisms derived from a literature review, and two adaptation strategies, repair instruction and interactive learning, are evaluated within the framework. Human evaluations confirm the fidelity of the simulated barriers (ICC ≈ 0.78, Pearson’s r ≈ 0.80). Experiments across 720 scenarios with four state-of-the-art large language models show that communication barriers reduce mutual understanding by over 45% and increase confusion by nearly 50%, while current adaptation strategies show only limited efficacy, underscoring a significant gap in models’ capacity to navigate authentic social interactions.
📝 Abstract
Large language models (LLMs) are increasingly evaluated in interactive environments to test their social intelligence. However, existing benchmarks often assume idealized communication between agents, limiting our ability to diagnose whether LLMs can maintain and repair interactions in more realistic, imperfect settings. To close this gap, we present \textsc{SocialVeil}, a social learning environment that simulates social interaction under communication barriers induced by cognitive differences. Grounded in a systematic literature review of communication challenges in human interaction, \textsc{SocialVeil} introduces three representative types of disruption: \emph{semantic vagueness}, \emph{sociocultural mismatch}, and \emph{emotional interference}. We also introduce two barrier-aware evaluation metrics, \emph{unresolved confusion} and \emph{mutual understanding}, to assess interaction quality under impaired communication. Experiments across 720 scenarios and four frontier LLMs show that barriers consistently impair performance, reducing mutual understanding by over 45\% on average and elevating confusion by nearly 50\%. Human evaluations validate the fidelity of the simulated barriers (ICC $\approx$ 0.78, Pearson's $r \approx$ 0.80). We further show that adaptation strategies (repair instruction and interactive learning) yield only modest gains, falling well short of barrier-free performance. This work takes a step toward bringing social interaction environments closer to real-world communication, opening opportunities for exploring the social intelligence of LLM agents.