🤖 AI Summary
Large language models (LLMs) exhibit significant limitations in social reasoning tasks, particularly inferring the unobservable beliefs and intentions of other agents, as evidenced by their performance in social deduction games like *The Resistance: Avalon*. Existing approaches rely on costly test-time inference and degrade sharply under model compression. To address this, we propose a hybrid framework that pairs an LLM with structured Bayesian social reasoning: the LLM handles natural language understanding and interaction, while a probabilistic model explicitly represents and dynamically updates multi-agent belief states. This design yields the first lightweight language agent to defeat human players in *Avalon* (67% win rate), matching the performance of substantially larger models and overcoming a key inference bottleneck for compact architectures. We publicly release all code, models, and datasets to support future work on interpretable, verifiable social reasoning.
📝 Abstract
Social reasoning - inferring unobservable beliefs and intentions from partial observations of other agents - remains a challenging task for large language models (LLMs). We evaluate the limits of current reasoning language models in the social deduction game Avalon and find that while the largest models demonstrate strong performance, they require extensive test-time inference and degrade sharply when distilled to smaller, real-time-capable variants. To address this, we introduce a hybrid reasoning framework that externalizes belief inference to a structured probabilistic model while using an LLM for language understanding and interaction. Our approach achieves performance competitive with much larger models in Agent-Agent play and, notably, is the first language agent to defeat human players in a controlled study, achieving a 67% win rate and receiving higher qualitative ratings than both reasoning baselines and human teammates. We release code, models, and a dataset to support future work on social reasoning in LLM agents, available at https://camp-lab-purdue.github.io/bayesian-social-deduction/.
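To make the idea of externalized belief inference concrete, here is a minimal sketch of a Bayesian update over hidden role assignments in a 5-player Avalon game. The player count, likelihood function, and probabilities below are illustrative assumptions for exposition, not the paper's actual model.

```python
from itertools import combinations

def belief_update(prior, likelihood):
    """Posterior over hypotheses via Bayes' rule: P(h|o) ∝ P(o|h) P(h)."""
    unnormalized = {h: prior[h] * likelihood(h) for h in prior}
    z = sum(unnormalized.values())
    return {h: p / z for h, p in unnormalized.items()}

# Toy 5-player game: each hypothesis names the 2 evil players (uniform prior).
players = range(5)
hypotheses = list(combinations(players, 2))
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}

# Observation: a quest with team {0, 1} failed with exactly one fail card.
# Assumed behavior model: each evil teammate plays "fail" with prob 0.8.
def quest_fail_likelihood(h, team=frozenset({0, 1}), p_fail=0.8):
    n_evil = len(team & set(h))
    if n_evil == 0:
        return 0.0  # a fail card requires at least one evil player on the team
    # probability that exactly one of the n_evil evil players played "fail"
    return n_evil * p_fail * (1 - p_fail) ** (n_evil - 1)

posterior = belief_update(prior, quest_fail_likelihood)
```

Under this update, hypotheses placing no evil player on the failed team drop to zero, and a single evil teammate becomes more probable than both teammates being evil (since two evil players would more often produce two fail cards). The LLM would supply such observations from dialogue and game events, while this structured model carries the belief state.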