AI Summary
This study addresses the lack of an operational definition and a systematic evaluation framework for social intelligence in both humans and artificial agents. The authors introduce a multiplayer arena of mixed cooperative and competitive social games. The games are played under a Communicate-Predict-Act (COMPACT) interaction protocol with fine-grained probing of social dynamics, yielding a testable, multidimensional framework for assessing social intelligence. From gameplay traces they derive sociocognitive metrics that quantify action prediction, communicative influence, strategic reasoning, and tradeoffs under conflicting interests. Experiments with eight LLMs ranging from 24B to 1T parameters show that Elo-style ratings separate the models consistently but capture social intelligence only partially. The sociocognitive metrics, by contrast, are highly consistent within each model and predict pairwise win-loss advantage with an AUC-ROC of 0.82. Feature importance analysis shows that influence, transparency, and adaptability are more predictive of success than Theory of Mind inference or deep planning.
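The summary reports that the sociocognitive metrics predict pairwise win-loss outcomes with an AUC-ROC of 0.82. The sketch below illustrates what that number measures: how often a predictor ranks the games an agent actually won above the games it lost. It assumes a hypothetical linear predictor over per-metric differences between two agents, with hand-set weights. The metric vectors, the `pairwise_score` helper, and the weights are illustrative placeholders, not the paper's data or model. The AUC itself is computed in its rank (Mann-Whitney) form.

```python
from typing import Sequence


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC-ROC as the probability that a randomly chosen positive example
    (label 1) scores higher than a randomly chosen negative one (label 0);
    ties count as half a win (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pairwise_score(metrics_a: Sequence[float], metrics_b: Sequence[float],
                   weights: Sequence[float]) -> float:
    """Hypothetical linear predictor: weighted sum of per-metric differences.
    A positive score predicts that agent A beats agent B."""
    return sum(w * (a - b) for w, a, b in zip(weights, metrics_a, metrics_b))


# Hypothetical per-game metric vectors (influence, transparency, adaptability)
# for agents A and B, plus whether A won (1) or lost (0). Not data from the paper.
games = [
    ((0.8, 0.6, 0.7), (0.4, 0.5, 0.3), 1),
    ((0.3, 0.4, 0.2), (0.6, 0.7, 0.5), 0),
    ((0.5, 0.5, 0.5), (0.4, 0.6, 0.5), 0),
    ((0.6, 0.2, 0.4), (0.5, 0.3, 0.6), 1),
]
weights = (1.0, 1.0, 1.0)  # hand-set for illustration, not fitted

scores = [pairwise_score(a, b, weights) for a, b, _ in games]
labels = [won for _, _, won in games]
print(auc_roc(scores, labels))  # → 0.75
```

In practice the weights would be fitted on held-out games (for example with logistic regression) before computing AUC on a separate split. An AUC of 0.5 corresponds to chance ranking and 1.0 to a perfect ranking.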
Abstract
As large language model (LLM) agents become more prevalent in real-world social settings, social intelligence will play an increasingly critical role. Yet social intelligence remains a poorly defined construct, for humans and artificial agents alike. We introduce a multiplayer arena of mixed cooperative and competitive social games to study LLM social intelligence. The controllability of LLM-based agents enables systematic evaluation, which in turn supports broader inferences about social intelligence per se. We evaluated eight diverse LLMs (24B to 1T parameters) using a Communicate-Predict-Act (COMPACT) interaction protocol and fine-grained probing of social dynamics. Elo-style ratings reveal consistent performance differences across models, but this scalar measure provides only a partial characterization of social intelligence. To address this limitation, we analyze gameplay traces to extract sociocognitive metrics capturing action prediction, communicative influence, strategic reasoning, and tradeoffs under conflicting interests. These sociocognitive metrics exhibit strong intramodel consistency, and they reliably predict pairwise agent advantage in game outcomes (AUC-ROC = 0.82). Surprisingly, feature importance analysis indicates that influence, transparency, and adaptability are more predictive of success than Theory of Mind inference or deep planning. Together, our results advance a testable, multidimensional conception of social intelligence and provide empirical insights into the capacities that underpin it.
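The abstract reports Elo-style ratings across models but does not state the update rule, K-factor, initial rating, or how multiplayer games are reduced to pairwise results. The following is a minimal sketch assuming standard two-player Elo applied sequentially to hypothetical pairwise outcomes. It also assumes K = 32 and an initial rating of 1000. The agent names and match results are placeholders, not the paper's models or data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B (logistic, base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def run_elo(matches: Iterable[Tuple[str, str, float]],
            initial: float = 1000.0, k: float = 32.0) -> Dict[str, float]:
    """Sequentially update ratings from pairwise results.

    Each match is (agent_a, agent_b, score_a) with score_a = 1.0 (A wins),
    0.5 (draw) or 0.0 (B wins). Updates are zero-sum: B loses what A gains.
    The initial rating and K-factor are assumed values, not the paper's.
    """
    ratings: Dict[str, float] = defaultdict(lambda: initial)
    for a, b, score_a in matches:
        delta = k * (score_a - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta
        ratings[b] -= delta
    return dict(ratings)


# Hypothetical pairwise results between three placeholder agents.
matches = [
    ("agent_x", "agent_y", 1.0),
    ("agent_y", "agent_z", 0.5),
    ("agent_x", "agent_z", 1.0),
    ("agent_z", "agent_y", 0.0),
]
for name, rating in sorted(run_elo(matches).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Sequential Elo updates depend on match order. A batch fit of a Bradley-Terry model over all pairwise results is a common order-independent alternative. Either way, the output is a single scalar per agent, which is the limitation the abstract notes.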