NICE: A Theory-Grounded Diagnostic Benchmark for Social Intelligence of LLMs

📅 2026-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current evaluations of social intelligence in large language models lack a unified theoretical framework and therefore cannot support fine-grained diagnosis. To address this gap, this work draws on a literature review and multi-stage expert validation, guided by psychometric principles, to construct a social intelligence framework comprising four categories and eleven dimensions, each specified by fine-grained capability facets. Building on this framework, the authors propose NICE (Norm, Interaction, Cognition, Experience), a diagnostic benchmark of 137 items operationalized through representative Chinese contexts. Experiments across five frontier large language models and a human reference group show that the models score higher in aggregate accuracy yet share a consistent weakness in Communication, which NICE localizes to three capability facets: multi-turn communication, nonverbal communication, and synchrony.
📝 Abstract
As large language models (LLMs) are increasingly applied in social contexts such as emotional companionship and customer service, measuring their social intelligence has become critical to the quality and safety of human-AI interaction. However, existing social intelligence benchmarks lack a unified framework that organizes social abilities into a coherent structure, and therefore cannot support fine-grained diagnosis. To build the first holistic diagnostic evaluation grounded in social theory, we first construct a social intelligence framework through a literature review and multi-stage expert validation guided by psychometric principles. The resulting framework includes 4 categories and 11 dimensions, each further specified by fine-grained capability facets. Building on this framework, we introduce NICE (Norm, Interaction, Cognition, Experience), a diagnostic benchmark of 137 items operationalized through representative Chinese contexts. Across 5 frontier LLMs and a human reference group, models score higher in aggregate accuracy yet show a consistent weakness in Communication, which the framework localizes to 3 specific capability facets: multi-turn communication, nonverbal communication, and synchrony. NICE thus reframes social intelligence evaluation toward theory-grounded diagnosis of socially consequential weaknesses in LLMs.
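The diagnostic idea in the abstract, where a weakness hidden by aggregate accuracy is localized to a dimension and then to a facet, can be illustrated with a minimal sketch. It assumes each benchmark item is tagged with its category, dimension, and facet and scored as correct or incorrect. The record layout, the function names, the placeholder labels (`category-1`, `dimension-B`, `facet-B1`, ...), and every outcome in `toy_records` are hypothetical and are not taken from NICE. Treating Communication as a dimension is also an assumption.

```python
from collections import defaultdict

# Hypothetical record layout: (category, dimension, facet, correct).
CATEGORY, DIMENSION, FACET, CORRECT = 0, 1, 2, 3


def accuracy_by_level(records, level):
    """Group item-level outcomes by one framework level and return {label: accuracy}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        label = record[level]
        totals[label] += 1
        hits[label] += int(record[CORRECT])
    return {label: hits[label] / totals[label] for label in totals}


def weakest(records, level):
    """Return the label with the lowest accuracy at the given framework level."""
    scores = accuracy_by_level(records, level)
    return min(scores, key=scores.get)


# Toy outcomes for illustration only; these are NOT results from the paper.
toy_records = [
    ("category-1", "Communication", "synchrony", False),
    ("category-1", "Communication", "synchrony", True),
    ("category-1", "Communication", "multi-turn communication", True),
    ("category-1", "Communication", "nonverbal communication", False),
    ("category-2", "dimension-B", "facet-B1", True),
    ("category-2", "dimension-B", "facet-B2", True),
]

aggregate = sum(r[CORRECT] for r in toy_records) / len(toy_records)
print("aggregate accuracy:", round(aggregate, 3))
print("by dimension:", accuracy_by_level(toy_records, DIMENSION))
print("weakest dimension:", weakest(toy_records, DIMENSION))
print("weakest facet:", weakest(toy_records, FACET))
```

Running the same grouping on a model's records and on a human reference group's records, then comparing the two dictionaries label by label, is one way a facet-level gap could be surfaced even when the model's aggregate score is higher.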
Problem

Research questions and friction points this paper is trying to address.

social intelligence
large language models
diagnostic benchmark
human-AI interaction
evaluation framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

social intelligence
diagnostic benchmark
theory-grounded evaluation
large language models
psychometric validation
Yunjin Qi
Department of Psychology and Behavioral Sciences, Zhejiang University
Zhaojun Jiang
College of Artificial Intelligence, Zhejiang University
Xuan Wu
Department of Psychology and Behavioral Sciences, Zhejiang University
Hanxi Pan
Department of Psychology and Behavioral Sciences, Zhejiang University
Yixuan Wang
Human Machine Interaction Lab, Huawei Technologies Co., Ltd.
Yanfang Liu
Assistant Professor, Middle Tennessee State University
Generative Diffusion Models, Inverse Problems, Bayesian Inversion
Xiang Ji
Human Machine Interaction Lab, Huawei Technologies Co., Ltd.
Churu Yu
Department of Psychology and Behavioral Sciences, Zhejiang University
Chunyuan Zheng
Zhejiang University
Yingze Chen
Department of Psychology and Behavioral Sciences, Zhejiang University
Jie He
Department of Psychology and Behavioral Sciences, Zhejiang University; Zhejiang Key Laboratory of Neurocognitive Development and Mental Health
Liuqing Chen
ZJU-100 Young Professor at Zhejiang University
AI-driven design, HCI, HRI, AI applications, creativity
Zaifeng Gao
Department of Psychology and Behavioral Sciences, Zhejiang University; Zhejiang Key Laboratory of Neurocognitive Development and Mental Health