Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work evaluates the social intelligence of large language models (LLMs) in strategic social games characterized by information asymmetry and requiring theory-of-mind reasoning. To this end, the authors introduce Mini-Mafia—a minimal four-player variant of the Mafia game—designed to benchmark deception, deception detection, and information disclosure through role-specific win conditions. Methodologically, they propose an annotation-free, two-stage evaluation framework based solely on LLM-vs.-LLM play: Stage 1 estimates win rates within fixed opponent configurations; Stage 2 aggregates performance across configurations using standardized scoring, so the benchmark evolves as each new model joins both as an opponent and as a subject of evaluation. The contributions are: (1) an LLM-only interactive benchmark dedicated to social intelligence; (2) empirical evidence of counterintuitive scaling, with smaller models sometimes outperforming larger ones; (3) quantitative analysis of emergent behavioral biases (e.g., name bias, last-speaker advantage); and (4) scalable deception-detection training data and human performance baselines, supporting AI safety research.

📝 Abstract
Mafia is a social deduction game where informed mafia compete against uninformed townsfolk. Its asymmetry of information and reliance on theory-of-mind reasoning mirror real-world multi-agent scenarios, making it a useful testbed for evaluating the social intelligence of large language models (LLMs). To support a systematic study, we introduce Mini-Mafia: a simplified four-player variant with one mafioso, one detective, and two villagers. We set the mafioso to kill a villager and the detective to investigate the mafioso during the night, reducing the game to a single day phase of discussion and voting. This setup isolates three interactive capabilities through role-specific win conditions: the mafioso must deceive, the villagers must detect deception, and the detective must effectively disclose information. To measure these skills, we have LLMs play against each other, creating the Mini-Mafia Benchmark: a two-stage framework that first estimates win rates within fixed opponent configurations, then aggregates performance across them using standardized scoring. Built entirely from model interactions without external data, the benchmark evolves as new models are introduced, with each one serving both as a new opponent and as a subject of evaluation. Our experiments reveal counterintuitive results, including cases where smaller models outperform larger ones. Beyond benchmarking, Mini-Mafia enables quantitative study of emergent multi-agent dynamics such as name bias and last-speaker advantage. It also contributes to AI safety by generating training data for deception detectors and by tracking models' deception capabilities against human baselines.
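The two-stage scoring described above (per-configuration win rates, then standardized aggregation) can be sketched minimally as follows. This is an illustrative reconstruction, not the paper's implementation; all function names and the z-score choice of standardization are assumptions.

```python
# Illustrative sketch of a two-stage benchmark aggregation in the
# spirit of Mini-Mafia: names and the z-score standardization are
# assumptions, not taken from the paper.
from statistics import mean, stdev

def win_rate(outcomes):
    """Stage 1: estimate a model's win rate (True = win) from games
    played within one fixed opponent configuration."""
    return sum(outcomes) / len(outcomes)

def standardized_scores(rates_by_model):
    """Standardize win rates against the field within a single
    configuration, so scores are comparable across configurations."""
    rates = list(rates_by_model.values())
    mu, sigma = mean(rates), stdev(rates)
    return {m: (r - mu) / sigma for m, r in rates_by_model.items()}

def benchmark_scores(per_config_rates):
    """Stage 2: average each model's standardized score over all
    opponent configurations to obtain its benchmark score."""
    all_scores = {}
    for config in per_config_rates:
        for model, z in standardized_scores(config).items():
            all_scores.setdefault(model, []).append(z)
    return {m: mean(zs) for m, zs in all_scores.items()}
```

Because each configuration is standardized before averaging, a new model can be added by playing it against existing opponents and recomputing the scores, which is what lets the benchmark evolve without external data.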
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' social intelligence through deception games
Developing a benchmark for multi-agent interaction capabilities
Measuring emergent dynamics like name bias in LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simplified four-player Mafia game isolates interactive capabilities
Two-stage framework benchmarks models as opponents and subjects
Generates training data for deception detectors and tracks capabilities
Davi Bastos Costa
TELUS Digital Research Hub, Center for Artificial Intelligence and Machine Learning, Institute of Mathematics, Statistics and Computer Science, University of São Paulo
Renato Vicente
University of São Paulo
Information Theory · Machine Learning · Complex Systems · Evolutionary Dynamics · Computational Finance