🤖 AI Summary
Large language models (LLMs) exhibit significant limitations in social reasoning tasks, particularly inferring the unobservable beliefs and intentions of other agents, as evidenced by their performance in social deduction games like *The Resistance: Avalon*. Existing approaches rely on costly test-time inference and degrade sharply under model compression. To address this, we propose a hybrid framework that pairs an LLM with structured Bayesian social reasoning: the LLM handles natural language understanding and interaction, while a probabilistic model explicitly represents and dynamically updates multi-agent belief states. This design yields the first lightweight language agent to defeat human players in *Avalon* (67% win rate), matching the performance of substantially larger models and overcoming a key inference bottleneck for compact architectures. We publicly release all code, models, and datasets to support future work on interpretable, verifiable social reasoning.
📝 Abstract
Social reasoning - inferring unobservable beliefs and intentions from partial observations of other agents - remains a challenging task for large language models (LLMs). We evaluate the limits of current reasoning language models in the social deduction game Avalon and find that while the largest models demonstrate strong performance, they require extensive test-time inference and degrade sharply when distilled to smaller, real-time-capable variants. To address this, we introduce a hybrid reasoning framework that externalizes belief inference to a structured probabilistic model while using an LLM for language understanding and interaction. Our approach achieves performance competitive with much larger models in Agent-Agent play and, notably, is the first language agent to defeat human players in a controlled study, achieving a 67% win rate and receiving higher qualitative ratings than both reasoning baselines and human teammates. We release code, models, and a dataset to support future work on social reasoning in LLM agents, available at https://camp-lab-purdue.github.io/bayesian-social-deduction/.
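To make the idea of externalized belief inference concrete, here is a minimal sketch of a Bayesian update over hidden role assignments in a 5-player Avalon game. The player count, likelihood function, and probabilities below are illustrative assumptions for exposition, not the paper's actual model.

```python
from itertools import combinations

def belief_update(prior, likelihood):
    """Posterior over hypotheses via Bayes' rule: P(h|o) ∝ P(o|h) P(h)."""
    unnormalized = {h: prior[h] * likelihood(h) for h in prior}
    z = sum(unnormalized.values())
    return {h: p / z for h, p in unnormalized.items()}

# Toy 5-player game: each hypothesis names the 2 evil players (uniform prior).
players = range(5)
hypotheses = list(combinations(players, 2))
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}

# Observation: a quest with team {0, 1} failed with exactly one fail card.
# Assumed behavior model: each evil teammate plays "fail" with prob 0.8.
def quest_fail_likelihood(h, team=frozenset({0, 1}), p_fail=0.8):
    n_evil = len(team & set(h))
    if n_evil == 0:
        return 0.0  # a fail card requires at least one evil player on the team
    # probability that exactly one of the n_evil evil players played "fail"
    return n_evil * p_fail * (1 - p_fail) ** (n_evil - 1)

posterior = belief_update(prior, quest_fail_likelihood)
```

Under this update, hypotheses placing no evil player on the failed team drop to zero, and a single evil teammate becomes more probable than both teammates being evil (since two evil players would more often produce two fail cards). The LLM would supply such observations from dialogue and game events, while this structured model carries the belief state.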